This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Combine DPP mov with use instructions (VOP1/2/3)
Closed · Public

Authored by vpykhtin on Oct 26 2018, 7:44 AM.


Details

Reviewers

rampitec
arsenm
tpr
b-sumner
kzhuravl

Group Reviewers

Restricted Project

Commits

rG3d9afa273f94: [AMDGPU] Combine DPP mov with use instructions (VOP1/2/3)
rL347993: [AMDGPU] Combine DPP mov with use instructions (VOP1/2/3)

Summary

The change adds DPP instruction pseudos and a pass that combines a V_MOV_B32_dpp instruction with its VALU uses, folding the mov's source in as the DPP src0 operand. If any of the use instructions cannot be combined with the mov, the whole sequence is reverted.

$old = ...
$dpp_value = V_MOV_B32_dpp $old, $vgpr_to_be_read_from_other_lane, dpp_controls..., $bound_ctrl
$res = VALU $dpp_value, ...

$res = VALU_DPP $folded_old, $dpp_value, ..., dpp_controls..., $folded_bound_ctrl

Combining rules:

$bound_ctrl is DPP_BOUND_ZERO, $old is any
$bound_ctrl is DPP_BOUND_OFF, $old is 0

-> $folded_old = undef, $folded_bound_ctrl = DPP_BOUND_ZERO

$bound_ctrl is DPP_BOUND_OFF, $old is undef

-> $folded_old = undef, $folded_bound_ctrl = DPP_BOUND_OFF

$bound_ctrl is DPP_BOUND_OFF, $old is foldable

-> $folded_old = folded value, $folded_bound_ctrl = DPP_BOUND_OFF

FMAC and MAC instructions don't have an $old operand, as they already have a tied $src2.

Combining into VOPC instructions isn't supported by this patch (to be added later).
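The combining rules above can be read as a small decision table. The following is a minimal standalone sketch of that table, not the code from the patch: the OldKind enum, the Folded struct and the foldRule function are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <optional>

// Hypothetical classification of what is known about $old.
enum class OldKind { Undef, Zero, FoldableImm, Other };

// Result of a rule: what $folded_old and $folded_bound_ctrl become.
struct Folded {
  bool OldIsUndef;    // true: $folded_old = undef, false: folded value
  bool BoundCtrlZero; // true: DPP_BOUND_ZERO, false: DPP_BOUND_OFF
};

// Returns the folded pair, or std::nullopt when no rule applies
// (in which case the combine would be abandoned).
std::optional<Folded> foldRule(bool BoundCtrlZero, OldKind Old) {
  // Rule 1: DPP_BOUND_ZERO with any $old, or DPP_BOUND_OFF with $old == 0.
  if (BoundCtrlZero || Old == OldKind::Zero)
    return Folded{true, true};
  // Rule 2: DPP_BOUND_OFF with undef $old.
  if (Old == OldKind::Undef)
    return Folded{true, false};
  // Rule 3: DPP_BOUND_OFF with a foldable $old.
  if (Old == OldKind::FoldableImm)
    return Folded{false, false};
  return std::nullopt;
}
```

For example, foldRule(false, OldKind::Zero) yields undef $old with DPP_BOUND_ZERO, while foldRule(false, OldKind::Other) yields no result.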

Diff Detail

Repository

rL LLVM

Build Status

Buildable 24501
Build 24500: arc lint + arc unit

Event Timeline

vpykhtin created this revision. Oct 26 2018, 7:44 AM

Herald added subscribers: llvm-commits, kbarton, t-tye and 7 others. Oct 26 2018, 7:44 AM

rampitec added inline comments. Oct 26 2018, 10:06 AM

lib/Target/AMDGPU/GCNDPPCombine.cpp
287	AMDGPU::NoRegister
295	AMDGPU::NoRegister
317	use_nodbg_operands()
322	There must be a check somewhere that use is not a SDWA or non-trivial OpSel, as these are incompatible with DPP. Also there must be a negative test for these.
lib/Target/AMDGPU/SIInstrInfo.cpp
5288	static?
5295	I think it should be a part of SIInstrInfo, not llvm namespace.
5332	Also a part of SIInstrInfo.
lib/Target/AMDGPU/SIInstrInfo.h
914	Doesn't it and getRegSubRegPair belong to SIRegisterInfo?

vpykhtin added inline comments. Oct 26 2018, 10:13 AM

lib/Target/AMDGPU/GCNDPPCombine.cpp
287	I wish I knew that before :-)
322	This is handled by the non-dpp -> dpp instruction map; see getDPPOp at line 131.
lib/Target/AMDGPU/SIInstrInfo.h
914	You're right, but it rather belongs to MachineRegisterInfo. I didn't put it into SIRegisterInfo so that there is no need to cast to it every time.

rampitec added inline comments. Oct 26 2018, 10:23 AM

lib/Target/AMDGPU/GCNDPPCombine.cpp
322	SDWA maybe, but you can still use it with OpSel when it is trivial (selects [lo, hi] for all operands). In any case, negative tests are needed for that.
lib/Target/AMDGPU/SIInstrInfo.h
914	They are global helpers, I do not see what to cast ;) I also agree they better fit common MRI.

vpykhtin added inline comments. Oct 26 2018, 10:45 AM

lib/Target/AMDGPU/GCNDPPCombine.cpp
322	I agree with tests, but I'm not sure I understood. There is no way to convert SDWA to DPP, right?
lib/Target/AMDGPU/SIInstrInfo.h
914	I mean MRI should be cast to SIRegisterInfo before use

rampitec added inline comments. Oct 26 2018, 11:53 AM

lib/Target/AMDGPU/GCNDPPCombine.cpp
322	SDWA not, but can be possible with opsel if it does not select a partial operand or shuffle, i.e. if it is only used as opsel because encoding requires it.
lib/Target/AMDGPU/SIInstrInfo.h
914	Ok.

Fixed per review issues.

Added checks on non-default VOP3 modifiers, with a negative MIR test for these. Src operand modifiers are checked to have only ABS or NEG (allowed by DPP), which effectively prevents other modifiers, such as non-default OpSel modifiers, from being combined. OpSel tests aren't included as there are no opsel pseudos yet.
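The modifier check described above amounts to masking out the ABS and NEG bits and requiring everything else to be zero. A minimal standalone sketch follows; the SrcMods bit values and the srcModsAllowDPP helper are assumptions made for illustration (the real constants are SISrcMods in the AMDGPU backend), and hasNoImmOrEqual here takes a plain pointer rather than a MachineInstr.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative bit assignments only; not taken from the patch.
namespace SrcMods {
constexpr int64_t NEG = 1 << 0;
constexpr int64_t ABS = 1 << 1;
constexpr int64_t OP_SEL_0 = 1 << 2;
} // namespace SrcMods

// An absent operand (nullptr) passes; otherwise the masked value must
// equal Expected.
bool hasNoImmOrEqual(const int64_t *Imm, int64_t Expected, int64_t Mask = -1) {
  if (!Imm)
    return true;
  return (*Imm & Mask) == Expected;
}

// Hypothetical helper: true if only ABS and/or NEG are set on a source.
bool srcModsAllowDPP(const int64_t *Mods) {
  const int64_t Mask = ~(SrcMods::ABS | SrcMods::NEG);
  return hasNoImmOrEqual(Mods, 0, Mask);
}
```

With these assumed values, a source carrying ABS | NEG passes, while one carrying any other bit (such as an op_sel bit) is rejected.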

vpykhtin marked 17 inline comments as done. Nov 2 2018, 7:34 AM

You do not need a special pseudo for opsel, just use VOP3 and immediate value for modifier. This is much like you did above for omod.

lib/Target/AMDGPU/GCNDPPCombine.cpp
350	AFAIR proper default value for opsel has 1 bit set (which means use high bits for high part).

arsenm added inline comments. Nov 2 2018, 10:09 AM

lib/Target/AMDGPU/GCNDPPCombine.cpp
9–10	The comments in your commit message would be more useful here
228–229	Why do you need to handle these cases? I would also use uint32_t instead of unsigned
323–324	New lines
408	Why do you need to collect a separate list of the moves in the whole function? Can you just use the dfs iterator to avoid this?

rampitec added inline comments. Nov 2 2018, 10:26 AM

lib/Target/AMDGPU/GCNDPPCombine.cpp
350	After discussion that is not the case here. You cannot process VOP3P, so you do not have op_sel_hi for which default is 1. I.e. this mask is ok, just add a test with non-zero opsel.

vpykhtin added inline comments. Nov 2 2018, 10:40 AM

lib/Target/AMDGPU/GCNDPPCombine.cpp
228–229	These cases are for the situation when bound_ctrl = 0 (result write disable), meaning that the mov result would be the value of 'old' for inactive lanes. If we know the immediate for old, we can calculate (in some cases) an old value for the VALU operation, which isn't the same as for the mov. Otherwise the combining would fail. For example:

v1 = ...
v10 = ... // other lane reg
v0 = v_mov_b32 1
v0 = v_mov_b32_dpp v10, ..., 0 // bound_ctrl == write disable
v2 = v_mul_u32_u24_e32 v0, v1

In this case we know v0 for inactive lanes would be 1 (the identity for mul). This makes it possible to use the v1 value as the result of the mul for inactive lanes:

v1 = v_mul_u32_u24_dpp v10, v1, ..., 0 // bound_ctrl == write disable
v2 = v_mov_b32 v1

Otherwise the combining isn't possible for bound_ctrl == write disable.
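The identity argument above can be checked with a small lane-level model. This is a sketch under stated assumptions, not AMDGPU semantics in full: it models a four-lane "shift right by one" DPP control where lane 0 has no source lane and therefore keeps the value of old, and it ignores the 24-bit operand truncation of v_mul_u32_u24. The names movDPP, mulDPP, unfused and fused are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int Lanes = 4;
using Vec = std::array<uint32_t, Lanes>;

// Modelled DPP mov with write disable: lane 0 keeps Old, the other lanes
// read Src from the lane to their left.
Vec movDPP(const Vec &Old, const Vec &Src) {
  Vec R = Old;
  for (int L = 1; L < Lanes; ++L)
    R[L] = Src[L - 1];
  return R;
}

// Modelled DPP mul with the same control: lane 0 keeps Old.
Vec mulDPP(const Vec &Old, const Vec &Src0, const Vec &Src1) {
  Vec R = Old;
  for (int L = 1; L < Lanes; ++L)
    R[L] = Src0[L - 1] * Src1[L];
  return R;
}

// Original sequence: old = 1, DPP mov, then an ordinary mul.
Vec unfused(const Vec &V10, const Vec &V1) {
  Vec Ones;
  Ones.fill(1);
  Vec V0 = movDPP(Ones, V10);
  Vec R{};
  for (int L = 0; L < Lanes; ++L)
    R[L] = V0[L] * V1[L];
  return R;
}

// Combined form: the mul's old operand is V1, because 1 * V1 == V1.
Vec fused(const Vec &V10, const Vec &V1) { return mulDPP(V1, V10, V1); }
```

Both forms agree on every lane in this model, which is the property the folding of the old immediate relies on.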
408	ok

fixed per review issues:

Added test on non-abs|neg modifiers,
replaced accumulation vector for DPP moves with reverse iteration on BB.

vpykhtin marked 7 inline comments as done. Nov 8 2018, 8:11 AM

LGTM

This revision is now accepted and ready to land. Nov 8 2018, 11:49 AM

Closed by commit rL347993: [AMDGPU] Combine DPP mov with use instructions (VOP1/2/3) (authored by vpykhtin). Nov 30 2018, 6:24 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path

Size

lib/

Target/

AMDGPU/

AMDGPU.h

4 lines

AMDGPU.td

4 lines

AMDGPUTargetMachine.cpp

8 lines

AsmParser/

12 lines

1 line

424 lines

32 lines

81 lines

36 lines

30 lines

69 lines

46 lines

test/

CodeGen/

AMDGPU/

dpp_combine.ll

186 lines

dpp_combine_subregs.mir

135 lines

MC/

AMDGPU/

vop_dpp.s

1 line

Diff 172353

lib/Target/AMDGPU/AMDGPU.h

	FunctionPass *createR600EmitClauseMarkers();			FunctionPass *createR600EmitClauseMarkers();
	FunctionPass *createR600ClauseMergePass();			FunctionPass *createR600ClauseMergePass();
	FunctionPass *createR600Packetizer();			FunctionPass *createR600Packetizer();
	FunctionPass *createR600ControlFlowFinalizer();			FunctionPass *createR600ControlFlowFinalizer();
	FunctionPass *createAMDGPUCFGStructurizerPass();			FunctionPass *createAMDGPUCFGStructurizerPass();
	FunctionPass *createR600ISelDag(TargetMachine *TM, CodeGenOpt::Level OptLevel);			FunctionPass *createR600ISelDag(TargetMachine *TM, CodeGenOpt::Level OptLevel);

	// SI Passes			// SI Passes
				FunctionPass *createGCNDPPCombinePass();
	FunctionPass *createSIAnnotateControlFlowPass();			FunctionPass *createSIAnnotateControlFlowPass();
	FunctionPass *createSIFoldOperandsPass();			FunctionPass *createSIFoldOperandsPass();
	FunctionPass *createSIPeepholeSDWAPass();			FunctionPass *createSIPeepholeSDWAPass();
	FunctionPass *createSILowerI1CopiesPass();			FunctionPass *createSILowerI1CopiesPass();
	FunctionPass *createSIShrinkInstructionsPass();			FunctionPass *createSIShrinkInstructionsPass();
	FunctionPass *createSILoadStoreOptimizerPass();			FunctionPass *createSILoadStoreOptimizerPass();
	FunctionPass *createSIWholeQuadModePass();			FunctionPass *createSIWholeQuadModePass();
	FunctionPass *createSIFixControlFlowLiveIntervalsPass();			FunctionPass *createSIFixControlFlowLiveIntervalsPass();

	ModulePass *createAMDGPULowerKernelAttributesPass();			ModulePass *createAMDGPULowerKernelAttributesPass();
	void initializeAMDGPULowerKernelAttributesPass(PassRegistry &);			void initializeAMDGPULowerKernelAttributesPass(PassRegistry &);
	extern char &AMDGPULowerKernelAttributesID;			extern char &AMDGPULowerKernelAttributesID;

	void initializeAMDGPURewriteOutArgumentsPass(PassRegistry &);			void initializeAMDGPURewriteOutArgumentsPass(PassRegistry &);
	extern char &AMDGPURewriteOutArgumentsID;			extern char &AMDGPURewriteOutArgumentsID;

				void initializeGCNDPPCombinePass(PassRegistry &);
				extern char &GCNDPPCombineID;

	void initializeR600ClauseMergePassPass(PassRegistry &);			void initializeR600ClauseMergePassPass(PassRegistry &);
	extern char &R600ClauseMergePassID;			extern char &R600ClauseMergePassID;

	void initializeR600ControlFlowFinalizerPass(PassRegistry &);			void initializeR600ControlFlowFinalizerPass(PassRegistry &);
	extern char &R600ControlFlowFinalizerID;			extern char &R600ControlFlowFinalizerID;

	void initializeR600ExpandSpecialInstrsPassPass(PassRegistry &);			void initializeR600ExpandSpecialInstrsPassPass(PassRegistry &);
	extern char &R600ExpandSpecialInstrsPassID;			extern char &R600ExpandSpecialInstrsPassID;

lib/Target/AMDGPU/AMDGPU.td

	//===-- AMDGPU.td - AMDGPU Tablegen files --------- tablegen --===//			//===-- AMDGPU.td - AMDGPU Tablegen files --------- tablegen --===//
	//			//
	// The LLVM Compiler Infrastructure			// The LLVM Compiler Infrastructure
	//			//
	// This file is distributed under the University of Illinois Open Source			// This file is distributed under the University of Illinois Open Source
	// License. See LICENSE.TXT for details.			// License. See LICENSE.TXT for details.
	//			//
	//===------------------------------------------------------------===//			//===------------------------------------------------------------===//

	include "llvm/TableGen/SearchableTable.td"			include "llvm/TableGen/SearchableTable.td"
	include "llvm/Target/Target.td"			include "llvm/Target/Target.td"
	include "AMDGPUFeatures.td"			include "AMDGPUFeatures.td"

				class BoolToList<bit Value> {
				list<int> ret = !if(Value, [1]<int>, []<int>);
				}

	//===------------------------------------------------------------===//			//===------------------------------------------------------------===//
	// Subtarget Features (device properties)			// Subtarget Features (device properties)
	//===------------------------------------------------------------===//			//===------------------------------------------------------------===//

	def FeatureFastFMAF32 : SubtargetFeature<"fast-fmaf",			def FeatureFastFMAF32 : SubtargetFeature<"fast-fmaf",
	"FastFMAF32",			"FastFMAF32",
	"true",			"true",
	"Assuming f32 fma is at least as fast as mul + add"			"Assuming f32 fma is at least as fast as mul + add"

lib/Target/AMDGPU/AMDGPUTargetMachine.cpp

static cl::opt<bool> EarlyInlineAll(
cl::init(false),		cl::init(false),
cl::Hidden);		cl::Hidden);

static cl::opt<bool> EnableSDWAPeephole(		static cl::opt<bool> EnableSDWAPeephole(
"amdgpu-sdwa-peephole",		"amdgpu-sdwa-peephole",
cl::desc("Enable SDWA peepholer"),		cl::desc("Enable SDWA peepholer"),
cl::init(true));		cl::init(true));

		static cl::opt<bool> EnableDPPCombine(
		"amdgpu-dpp-combine",
		cl::desc("Enable DPP combiner"),
		cl::init(false));

// Enable address space based alias analysis		// Enable address space based alias analysis
static cl::opt<bool> EnableAMDGPUAliasAnalysis("enable-amdgpu-aa", cl::Hidden,		static cl::opt<bool> EnableAMDGPUAliasAnalysis("enable-amdgpu-aa", cl::Hidden,
cl::desc("Enable AMDGPU Alias Analysis"),		cl::desc("Enable AMDGPU Alias Analysis"),
cl::init(true));		cl::init(true));

// Option to run late CFG structurizer		// Option to run late CFG structurizer
static cl::opt<bool, true> LateCFGStructurize(		static cl::opt<bool, true> LateCFGStructurize(
"amdgpu-late-structurize",		"amdgpu-late-structurize",
extern "C" void LLVMInitializeAMDGPUTarget() {
PassRegistry *PR = PassRegistry::getPassRegistry();		PassRegistry *PR = PassRegistry::getPassRegistry();
initializeR600ClauseMergePassPass(*PR);		initializeR600ClauseMergePassPass(*PR);
initializeR600ControlFlowFinalizerPass(*PR);		initializeR600ControlFlowFinalizerPass(*PR);
initializeR600PacketizerPass(*PR);		initializeR600PacketizerPass(*PR);
initializeR600ExpandSpecialInstrsPassPass(*PR);		initializeR600ExpandSpecialInstrsPassPass(*PR);
initializeR600VectorRegMergerPass(*PR);		initializeR600VectorRegMergerPass(*PR);
initializeGlobalISel(*PR);		initializeGlobalISel(*PR);
initializeAMDGPUDAGToDAGISelPass(*PR);		initializeAMDGPUDAGToDAGISelPass(*PR);
		initializeGCNDPPCombinePass(*PR);
initializeSILowerI1CopiesPass(*PR);		initializeSILowerI1CopiesPass(*PR);
initializeSIFixSGPRCopiesPass(*PR);		initializeSIFixSGPRCopiesPass(*PR);
initializeSIFixVGPRCopiesPass(*PR);		initializeSIFixVGPRCopiesPass(*PR);
initializeSIFoldOperandsPass(*PR);		initializeSIFoldOperandsPass(*PR);
initializeSIPeepholeSDWAPass(*PR);		initializeSIPeepholeSDWAPass(*PR);
initializeSIShrinkInstructionsPass(*PR);		initializeSIShrinkInstructionsPass(*PR);
initializeSIOptimizeExecMaskingPreRAPass(*PR);		initializeSIOptimizeExecMaskingPreRAPass(*PR);
initializeSILoadStoreOptimizerPass(*PR);		initializeSILoadStoreOptimizerPass(*PR);
void GCNPassConfig::addMachineSSAOptimization() {
// We want to fold operands after PeepholeOptimizer has run (or as part of		// We want to fold operands after PeepholeOptimizer has run (or as part of
// it), because it will eliminate extra copies making it easier to fold the		// it), because it will eliminate extra copies making it easier to fold the
// real source operand. We want to eliminate dead instructions after, so that		// real source operand. We want to eliminate dead instructions after, so that
// we see fewer uses of the copies. We then need to clean up the dead		// we see fewer uses of the copies. We then need to clean up the dead
// instructions leftover after the operands are folded as well.		// instructions leftover after the operands are folded as well.
//		//
// XXX - Can we get away without running DeadMachineInstructionElim again?		// XXX - Can we get away without running DeadMachineInstructionElim again?
addPass(&SIFoldOperandsID);		addPass(&SIFoldOperandsID);
		if (EnableDPPCombine)
		addPass(&GCNDPPCombineID);
addPass(&DeadMachineInstructionElimID);		addPass(&DeadMachineInstructionElimID);
addPass(&SILoadStoreOptimizerID);		addPass(&SILoadStoreOptimizerID);
if (EnableSDWAPeephole) {		if (EnableSDWAPeephole) {
addPass(&SIPeepholeSDWAID);		addPass(&SIPeepholeSDWAID);
addPass(&EarlyMachineLICMID);		addPass(&EarlyMachineLICMID);
addPass(&MachineCSEID);		addPass(&MachineCSEID);
addPass(&SIFoldOperandsID);		addPass(&SIFoldOperandsID);
addPass(&DeadMachineInstructionElimID);		addPass(&DeadMachineInstructionElimID);

lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp

void AMDGPUAsmParser::cvtDPP(MCInst &Inst, const OperandVector &Operands) {
OptionalImmIndexMap OptionalIdx;		OptionalImmIndexMap OptionalIdx;

unsigned I = 1;		unsigned I = 1;
const MCInstrDesc &Desc = MII.get(Inst.getOpcode());		const MCInstrDesc &Desc = MII.get(Inst.getOpcode());
for (unsigned J = 0; J < Desc.getNumDefs(); ++J) {		for (unsigned J = 0; J < Desc.getNumDefs(); ++J) {
((AMDGPUOperand &)*Operands[I++]).addRegOperands(Inst, 1);		((AMDGPUOperand &)*Operands[I++]).addRegOperands(Inst, 1);
}		}

// All DPP instructions with at least one source operand have a fake "old"
// source at the beginning that's tied to the dst operand. Handle it here.
if (Desc.getNumOperands() >= 2)
Inst.addOperand(Inst.getOperand(0));

for (unsigned E = Operands.size(); I != E; ++I) {		for (unsigned E = Operands.size(); I != E; ++I) {
		auto TiedTo = Desc.getOperandConstraint(Inst.getNumOperands(),
		MCOI::TIED_TO);
		if (TiedTo != -1) {
		assert((unsigned)TiedTo < Inst.getNumOperands());
		// handle tied old or src2 for MAC instructions
		Inst.addOperand(Inst.getOperand(TiedTo));
		}
AMDGPUOperand &Op = ((AMDGPUOperand &)*Operands[I]);		AMDGPUOperand &Op = ((AMDGPUOperand &)*Operands[I]);
// Add the register arguments		// Add the register arguments
if (Op.isReg() && Op.Reg.RegNo == AMDGPU::VCC) {		if (Op.isReg() && Op.Reg.RegNo == AMDGPU::VCC) {
// VOP2b (v_add_u32, v_sub_u32 ...) dpp use "vcc" token.		// VOP2b (v_add_u32, v_sub_u32 ...) dpp use "vcc" token.
// Skip it.		// Skip it.
continue;		continue;
} if (isRegOrImmWithInputMods(Desc, Inst.getNumOperands())) {		} if (isRegOrImmWithInputMods(Desc, Inst.getNumOperands())) {
Op.addRegWithFPInputModsOperands(Inst, 2);		Op.addRegWithFPInputModsOperands(Inst, 2);

lib/Target/AMDGPU/CMakeLists.txt

add_llvm_target(AMDGPUCodeGen
SIMemoryLegalizer.cpp		SIMemoryLegalizer.cpp
SIOptimizeExecMasking.cpp		SIOptimizeExecMasking.cpp
SIOptimizeExecMaskingPreRA.cpp		SIOptimizeExecMaskingPreRA.cpp
SIPeepholeSDWA.cpp		SIPeepholeSDWA.cpp
SIRegisterInfo.cpp		SIRegisterInfo.cpp
SIShrinkInstructions.cpp		SIShrinkInstructions.cpp
SIWholeQuadMode.cpp		SIWholeQuadMode.cpp
GCNILPSched.cpp		GCNILPSched.cpp
		GCNDPPCombine.cpp
)		)

add_subdirectory(AsmParser)		add_subdirectory(AsmParser)
add_subdirectory(Disassembler)		add_subdirectory(Disassembler)
add_subdirectory(InstPrinter)		add_subdirectory(InstPrinter)
add_subdirectory(MCTargetDesc)		add_subdirectory(MCTargetDesc)
add_subdirectory(TargetInfo)		add_subdirectory(TargetInfo)
add_subdirectory(Utils)		add_subdirectory(Utils)

lib/Target/AMDGPU/GCNDPPCombine.cpp

This file was added.

				//=======- GCNDPPCombine.cpp - optimization for DPP instructions ---==========//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				// This pass combines dpp moves with the using instructions
				//===----------------------------------------------------------------------===//

				#include "AMDGPU.h"
				#include "AMDGPUSubtarget.h"
				#include "SIInstrInfo.h"
				#include "MCTargetDesc/AMDGPUMCTargetDesc.h"
				#include "llvm/ADT/SmallVector.h"
				#include "llvm/ADT/Statistic.h"
				#include "llvm/CodeGen/MachineBasicBlock.h"
				#include "llvm/CodeGen/MachineFunction.h"
				#include "llvm/CodeGen/MachineFunctionPass.h"
				#include "llvm/CodeGen/MachineInstr.h"
				#include "llvm/CodeGen/MachineInstrBuilder.h"
				#include "llvm/CodeGen/MachineOperand.h"
				#include "llvm/CodeGen/MachineRegisterInfo.h"
				#include "llvm/CodeGen/TargetRegisterInfo.h"
				#include "llvm/Pass.h"
				#include <cassert>

				using namespace llvm;

				#define DEBUG_TYPE "gcn-dpp-combine"

				STATISTIC(NumDPPMovsCombined, "Number of DPP moves combined.");

				namespace {

				class GCNDPPCombine : public MachineFunctionPass {
				MachineRegisterInfo *MRI;
				const SIInstrInfo *TII;

				using RegSubRegPair = TargetInstrInfo::RegSubRegPair;

				MachineOperand *getOldOpndValue(MachineOperand &OldOpnd) const;

				RegSubRegPair foldOldOpnd(MachineInstr &OrigMI,
				RegSubRegPair OldOpndVGPR,
				MachineOperand &OldOpndValue) const;

				MachineInstr *createDPPInst(MachineInstr &OrigMI,
				MachineInstr &MovMI,
				RegSubRegPair OldOpndVGPR,
				MachineOperand *OldOpnd,
				bool BoundCtrlZero) const;

				MachineInstr *createDPPInst(MachineInstr &OrigMI,
				MachineInstr &MovMI,
				RegSubRegPair OldOpndVGPR,
				bool BoundCtrlZero) const;

				bool hasNoImmOrEqual(MachineInstr &MI,
				unsigned OpndName,
				int64_t Value,
				int64_t Mask = -1) const;

				bool combineDPPMov(MachineInstr &MI) const;

				public:
				static char ID;

				GCNDPPCombine() : MachineFunctionPass(ID) {
				initializeGCNDPPCombinePass(*PassRegistry::getPassRegistry());
				}

				bool runOnMachineFunction(MachineFunction &MF) override;

				StringRef getPassName() const override { return "GCN DPP Combine"; }

				void getAnalysisUsage(AnalysisUsage &AU) const override {
				AU.setPreservesCFG();
				MachineFunctionPass::getAnalysisUsage(AU);
				}
				};

				} // end anonymous namespace

				INITIALIZE_PASS(GCNDPPCombine, DEBUG_TYPE, "GCN DPP Combine", false, false)

				char GCNDPPCombine::ID = 0;

				char &llvm::GCNDPPCombineID = GCNDPPCombine::ID;

				FunctionPass *llvm::createGCNDPPCombinePass() {
				return new GCNDPPCombine();
				}

				static int getDPPOp(unsigned Op) {
				auto DPP32 = AMDGPU::getDPPOp32(Op);
				if (DPP32 != -1)
				return DPP32;

				auto E32 = AMDGPU::getVOPe32(Op);
				return E32 != -1 ? AMDGPU::getDPPOp32(E32) : -1;
				}

				// tracks the register operand definition and returns:
				// 1. immediate operand used to initialize the register if found
				// 2. nullptr if the register operand is undef
				// 3. the operand itself otherwise
				MachineOperand *GCNDPPCombine::getOldOpndValue(MachineOperand &OldOpnd) const {
				auto *Def = getVRegSubRegDef(getRegSubRegPair(OldOpnd), *MRI);
				if (!Def)
				return nullptr;

				switch(Def->getOpcode()) {
				default: break;
				case AMDGPU::IMPLICIT_DEF:
				return nullptr;
				case AMDGPU::COPY:
				case AMDGPU::V_MOV_B32_e32: {
				auto &Op1 = Def->getOperand(1);
				if (Op1.isImm())
				return &Op1;
				break;
				}
				}
				return &OldOpnd;
				}

				MachineInstr *GCNDPPCombine::createDPPInst(MachineInstr &OrigMI,
				MachineInstr &MovMI,
				RegSubRegPair OldOpndVGPR,
				bool BoundCtrlZero) const {
				assert(MovMI.getOpcode() == AMDGPU::V_MOV_B32_dpp);
				assert(TII->getNamedOperand(MovMI, AMDGPU::OpName::vdst)->getReg() ==
				TII->getNamedOperand(OrigMI, AMDGPU::OpName::src0)->getReg());

				auto OrigOp = OrigMI.getOpcode();
				auto DPPOp = getDPPOp(OrigOp);
				if (DPPOp == -1) {
				LLVM_DEBUG(dbgs() << " failed: no DPP opcode\n");
				return nullptr;
				}

				auto DPPInst = BuildMI(*OrigMI.getParent(), OrigMI,
				OrigMI.getDebugLoc(), TII->get(DPPOp));
				bool Fail = false;
				do {
				auto *Dst = TII->getNamedOperand(OrigMI, AMDGPU::OpName::vdst);
				assert(Dst);
				DPPInst.add(*Dst);
				int NumOperands = 1;

				const int OldIdx = AMDGPU::getNamedOperandIdx(DPPOp, AMDGPU::OpName::old);
				if (OldIdx != -1) {
				assert(OldIdx == NumOperands);
				assert(isOfRegClass(OldOpndVGPR, AMDGPU::VGPR_32RegClass, *MRI));
				DPPInst.addReg(OldOpndVGPR.Reg, 0, OldOpndVGPR.SubReg);
				++NumOperands;
				}

				if (auto *Mod0 = TII->getNamedOperand(OrigMI,
				AMDGPU::OpName::src0_modifiers)) {
				assert(NumOperands == AMDGPU::getNamedOperandIdx(DPPOp,
				AMDGPU::OpName::src0_modifiers));
				assert(0LL == (Mod0->getImm() & ~(SISrcMods::ABS | SISrcMods::NEG)));
				DPPInst.addImm(Mod0->getImm());
				++NumOperands;
				}
				auto *Src0 = TII->getNamedOperand(MovMI, AMDGPU::OpName::src0);
				assert(Src0);
				if (!TII->isOperandLegal(*DPPInst.getInstr(), NumOperands, Src0)) {
				LLVM_DEBUG(dbgs() << " failed: src0 is illegal\n");
				Fail = true;
				break;
				}
				DPPInst.add(*Src0);
				++NumOperands;

				if (auto *Mod1 = TII->getNamedOperand(OrigMI,
				AMDGPU::OpName::src1_modifiers)) {
				assert(NumOperands == AMDGPU::getNamedOperandIdx(DPPOp,
				AMDGPU::OpName::src1_modifiers));
				assert(0LL == (Mod1->getImm() & ~(SISrcMods::ABS | SISrcMods::NEG)));
				DPPInst.addImm(Mod1->getImm());
				++NumOperands;
				}
				if (auto *Src1 = TII->getNamedOperand(OrigMI, AMDGPU::OpName::src1)) {
				if (!TII->isOperandLegal(*DPPInst.getInstr(), NumOperands, Src1)) {
				LLVM_DEBUG(dbgs() << " failed: src1 is illegal\n");
				Fail = true;
				break;
				}
				DPPInst.add(*Src1);
				++NumOperands;
				}

				if (auto *Src2 = TII->getNamedOperand(OrigMI, AMDGPU::OpName::src2)) {
				if (!TII->isOperandLegal(*DPPInst.getInstr(), NumOperands, Src2)) {
				LLVM_DEBUG(dbgs() << " failed: src2 is illegal\n");
				Fail = true;
				break;
				}
				DPPInst.add(*Src2);
				}

				DPPInst.add(*TII->getNamedOperand(MovMI, AMDGPU::OpName::dpp_ctrl));
				DPPInst.add(*TII->getNamedOperand(MovMI, AMDGPU::OpName::row_mask));
				DPPInst.add(*TII->getNamedOperand(MovMI, AMDGPU::OpName::bank_mask));
				DPPInst.addImm(BoundCtrlZero ? 1 : 0);
				} while (false);

				if (Fail) {
				DPPInst.getInstr()->eraseFromParent();
				return nullptr;
				}
				LLVM_DEBUG(dbgs() << " combined: " << *DPPInst.getInstr());
				return DPPInst.getInstr();
				}

				GCNDPPCombine::RegSubRegPair
				GCNDPPCombine::foldOldOpnd(MachineInstr &OrigMI,
				RegSubRegPair OldOpndVGPR,
				MachineOperand &OldOpndValue) const {
				assert(OldOpndValue.isImm());
				switch (OrigMI.getOpcode()) {
				default: break;
				case AMDGPU::V_MAX_U32_e32:
				if (OldOpndValue.getImm() == std::numeric_limits<unsigned>::max())
				return OldOpndVGPR;
				break;
				case AMDGPU::V_MAX_I32_e32:
				if (OldOpndValue.getImm() == std::numeric_limits<int>::max())
				return OldOpndVGPR;
				break;
				case AMDGPU::V_MIN_I32_e32:
				if (OldOpndValue.getImm() == std::numeric_limits<int>::min())
				return OldOpndVGPR;
				break;

				case AMDGPU::V_MUL_I32_I24_e32:
				case AMDGPU::V_MUL_U32_U24_e32:
				if (OldOpndValue.getImm() == 1) {
				auto *Src1 = TII->getNamedOperand(OrigMI, AMDGPU::OpName::src1);
				assert(Src1 && Src1->isReg());
				return getRegSubRegPair(*Src1);
				}
				break;
				}
				return RegSubRegPair();
				}

				// Cases to combine:
				// $bound_ctrl is DPP_BOUND_ZERO, $old is any
				// $bound_ctrl is DPP_BOUND_OFF, $old is 0
				// -> $old = undef, $bound_ctrl = DPP_BOUND_ZERO

				// $bound_ctrl is DPP_BOUND_OFF, $old is undef
				// -> $old = undef, $bound_ctrl = DPP_BOUND_OFF

				// $bound_ctrl is DPP_BOUND_OFF, $old is foldable
				// -> $old = folded value, $bound_ctrl = DPP_BOUND_OFF

				MachineInstr *GCNDPPCombine::createDPPInst(MachineInstr &OrigMI,
				MachineInstr &MovMI,
				RegSubRegPair OldOpndVGPR,
				MachineOperand *OldOpndValue,
				bool BoundCtrlZero) const {
				assert(OldOpndVGPR.Reg);
				if (!BoundCtrlZero && OldOpndValue) {
				assert(OldOpndValue->isImm());
				OldOpndVGPR = foldOldOpnd(OrigMI, OldOpndVGPR, *OldOpndValue);
				if (!OldOpndVGPR.Reg) {
				LLVM_DEBUG(dbgs() << " failed: old immediate cannot be folded\n");
				return nullptr;
				}
				}
				return createDPPInst(OrigMI, MovMI, OldOpndVGPR, BoundCtrlZero);
				}

				// returns true if MI doesn't have OpndName immediate operand or the
				// operand has Value
				bool GCNDPPCombine::hasNoImmOrEqual(MachineInstr &MI, unsigned OpndName,
				int64_t Value, int64_t Mask) const {
				auto *Imm = TII->getNamedOperand(MI, OpndName);
				if (!Imm)
				return true;

				assert(Imm->isImm());
				return (Imm->getImm() & Mask) == Value;
				}

				bool GCNDPPCombine::combineDPPMov(MachineInstr &MovMI) const {
				assert(MovMI.getOpcode() == AMDGPU::V_MOV_B32_dpp);
				auto *BCZOpnd = TII->getNamedOperand(MovMI, AMDGPU::OpName::bound_ctrl);
				assert(BCZOpnd && BCZOpnd->isImm());
				bool BoundCtrlZero = 0 != BCZOpnd->getImm();

				LLVM_DEBUG(dbgs() << "\nDPP combine: " << MovMI);

				auto *OldOpnd = TII->getNamedOperand(MovMI, AMDGPU::OpName::old);
				assert(OldOpnd && OldOpnd->isReg());
				auto OldOpndVGPR = getRegSubRegPair(*OldOpnd);
				auto *OldOpndValue = getOldOpndValue(*OldOpnd);
				assert(!OldOpndValue || OldOpndValue->isImm() || OldOpndValue == OldOpnd);
				if (OldOpndValue) {
				if (BoundCtrlZero) {
				OldOpndVGPR.Reg = AMDGPU::NoRegister; // should be undef, ignore old opnd
				OldOpndValue = nullptr;
				} else {
				if (!OldOpndValue->isImm()) {
				LLVM_DEBUG(dbgs() << " failed: old operand isn't an imm or undef\n");
				return false;
				}
				if (OldOpndValue->getImm() == 0) {
				OldOpndVGPR.Reg = AMDGPU::NoRegister; // should be undef
				OldOpndValue = nullptr;
				BoundCtrlZero = true;
				}
				}
				}

				LLVM_DEBUG(dbgs() << " old=";
				if (!OldOpndValue) dbgs() << "undef";
				else dbgs() << OldOpndValue->getImm();
				dbgs() << ", bound_ctrl=" << BoundCtrlZero << '\n');

				std::vector<MachineInstr*> OrigMIs, DPPMIs;
				if (!OldOpndVGPR.Reg) { // OldOpndVGPR = undef
				OldOpndVGPR = RegSubRegPair(
				MRI->createVirtualRegister(&AMDGPU::VGPR_32RegClass));
				auto UndefInst = BuildMI(*MovMI.getParent(), MovMI, MovMI.getDebugLoc(),
				TII->get(AMDGPU::IMPLICIT_DEF), OldOpndVGPR.Reg);
				DPPMIs.push_back(UndefInst.getInstr());
				}

				OrigMIs.push_back(&MovMI);
				bool Rollback = true;
				for (auto &Use : MRI->use_nodbg_operands(
				TII->getNamedOperand(MovMI, AMDGPU::OpName::vdst)->getReg())) {
				Rollback = true;

				auto &OrigMI = *Use.getParent();
				auto OrigOp = OrigMI.getOpcode();
				if (TII->isVOP3(OrigOp)) {
				if (!TII->hasVALU32BitEncoding(OrigOp)) {
				LLVM_DEBUG(dbgs() << " failed: VOP3 hasn't e32 equivalent\n");
				break;
				}
				// check if other than abs|neg modifiers are set (opsel for example)
				const int64_t Mask = ~(SISrcMods::ABS | SISrcMods::NEG);
				rampitec (Done): AFAIR proper default value for opsel has 1 bit set (which means use high bits for high part).
				rampitec (Done): After discussion that is not the case here. You cannot process VOP3P, so you do not have op_sel_hi for which default is 1. I.e. this mask is ok, just add a test with non-zero opsel.
				if (!hasNoImmOrEqual(OrigMI, AMDGPU::OpName::src0_modifiers, 0, Mask) ||
				!hasNoImmOrEqual(OrigMI, AMDGPU::OpName::src1_modifiers, 0, Mask) ||
				!hasNoImmOrEqual(OrigMI, AMDGPU::OpName::clamp, 0) ||
				!hasNoImmOrEqual(OrigMI, AMDGPU::OpName::omod, 0)) {
				LLVM_DEBUG(dbgs() << " failed: VOP3 has non-default modifiers\n");
				break;
				}
				} else if (!TII->isVOP1(OrigOp) && !TII->isVOP2(OrigOp)) {
				LLVM_DEBUG(dbgs() << " failed: not VOP1/2/3\n");
				break;
				}

				LLVM_DEBUG(dbgs() << " combining: " << OrigMI);
				if (&Use == TII->getNamedOperand(OrigMI, AMDGPU::OpName::src0)) {
				if (auto *DPPInst = createDPPInst(OrigMI, MovMI, OldOpndVGPR,
				OldOpndValue, BoundCtrlZero)) {
				DPPMIs.push_back(DPPInst);
				Rollback = false;
				}
				} else if (OrigMI.isCommutable() &&
				&Use == TII->getNamedOperand(OrigMI, AMDGPU::OpName::src1)) {
				auto *BB = OrigMI.getParent();
				auto *NewMI = BB->getParent()->CloneMachineInstr(&OrigMI);
				BB->insert(OrigMI, NewMI);
				if (TII->commuteInstruction(*NewMI)) {
				LLVM_DEBUG(dbgs() << " commuted: " << *NewMI);
				if (auto *DPPInst = createDPPInst(*NewMI, MovMI, OldOpndVGPR,
				OldOpndValue, BoundCtrlZero)) {
				DPPMIs.push_back(DPPInst);
				Rollback = false;
				}
				} else
				LLVM_DEBUG(dbgs() << " failed: cannot be commuted\n");
				NewMI->eraseFromParent();
				} else
				LLVM_DEBUG(dbgs() << " failed: no suitable operands\n");
				if (Rollback)
				break;
				OrigMIs.push_back(&OrigMI);
				}

				for (auto *MI : *(Rollback? &DPPMIs : &OrigMIs))
				MI->eraseFromParent();

				return !Rollback;
				}
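
The use loop above is all-or-nothing: it stops at the first use that cannot be converted, and the final erase removes either the original instructions (success) or the newly built DPP instructions (rollback). A minimal standalone sketch of that pattern follows. It uses plain integers in place of MachineInstr pointers, and the names `CombineResult`, `combineAll` and `TryConvert` are hypothetical, not part of the patch.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <vector>

// Hypothetical stand-in for the outcome of combineDPPMov.
struct CombineResult {
  bool Combined;            // true only if every use was converted
  std::vector<int> ToErase; // originals on success, new items on rollback
};

// Try to convert every use; give up at the first one that fails.
inline CombineResult
combineAll(const std::vector<int> &Uses,
           const std::function<std::optional<int>(int)> &TryConvert) {
  std::vector<int> Orig, Created;
  bool Rollback = true; // stays true when there are no uses at all
  for (int Use : Uses) {
    Rollback = true;
    if (auto New = TryConvert(Use)) {
      Created.push_back(*New);
      Rollback = false;
    }
    if (Rollback)
      break;
    Orig.push_back(Use);
  }
  // Exactly one of the two lists is erased, mirroring the final loop above.
  return {!Rollback, Rollback ? Created : Orig};
}
```

With a converter that only accepts even values, `combineAll({2, 3, 4}, ...)` reports failure and hands back the single item created for `2`, so the caller can discard it and keep the originals untouched.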

				bool GCNDPPCombine::runOnMachineFunction(MachineFunction &MF) {
				auto &ST = MF.getSubtarget<GCNSubtarget>();
				if (!ST.hasDPP() || skipFunction(MF.getFunction()))
				return false;

				MRI = &MF.getRegInfo();
				TII = ST.getInstrInfo();

				assert(MRI->isSSA() && "Must be run on SSA");

				std::vector<MachineInstr*> DPPMoves;
				arsenm (Done): Why do you need to collect a separate list of the moves in the whole function? Can you just use the dfs iterator to avoid this?
				vpykhtin (Done): ok
				for (auto &MBB : MF) {
				for (auto &MI : MBB) {
				if (MI.getOpcode() == AMDGPU::V_MOV_B32_dpp)
				DPPMoves.push_back(&MI);
				}
				}

				bool Changed = false;
				for (auto *MI : DPPMoves) {
				if (combineDPPMov(*MI)) {
				Changed = true;
				++NumDPPMovsCombined;
				}
				}
				return Changed;
				}

lib/Target/AMDGPU/SIInstrInfo.h

Show First 20 Lines • Show All 902 Lines • ▼ Show 20 Lines	public:
static bool isLegalMUBUFImmOffset(unsigned Imm) {		static bool isLegalMUBUFImmOffset(unsigned Imm) {
return isUInt<12>(Imm);		return isUInt<12>(Imm);
}		}

/// \brief Return a target-specific opcode if Opcode is a pseudo instruction.		/// \brief Return a target-specific opcode if Opcode is a pseudo instruction.
/// Return -1 if the target-specific opcode for the pseudo instruction does		/// Return -1 if the target-specific opcode for the pseudo instruction does
/// not exist. If Opcode is not a pseudo instruction, this is identity.		/// not exist. If Opcode is not a pseudo instruction, this is identity.
int pseudoToMCOpcode(int Opcode) const;		int pseudoToMCOpcode(int Opcode) const;

};		};

		/// \brief Returns true if a reg:subreg pair P has a TRC class
		inline bool isOfRegClass(const TargetInstrInfo::RegSubRegPair &P,
		rampitec (Done): Don't this and getRegSubRegPair belong in SIRegisterInfo?
		vpykhtin (Done): You're right, but it rather belongs in MachineRegisterInfo. I didn't put it into SIRegisterInfo so that there is no need to cast to it every time.
		rampitec (Done): They are global helpers, I do not see what to cast ;) I also agree they better fit common MRI.
		vpykhtin (Done): I mean MRI should be cast to SIRegisterInfo before use.
		rampitec (Done): Ok.
		const TargetRegisterClass &TRC,
		MachineRegisterInfo &MRI) {
		auto *RC = MRI.getRegClass(P.Reg);
		if (!P.SubReg)
		return RC == &TRC;
		auto *TRI = MRI.getTargetRegisterInfo();
		return RC == TRI->getMatchingSuperRegClass(RC, &TRC, P.SubReg);
		}

		/// \brief Create RegSubRegPair from a register MachineOperand
		inline
		TargetInstrInfo::RegSubRegPair getRegSubRegPair(const MachineOperand &O) {
		assert(O.isReg());
		return TargetInstrInfo::RegSubRegPair(O.getReg(), O.getSubReg());
		}

		/// \brief Return the SubReg component from REG_SEQUENCE
		TargetInstrInfo::RegSubRegPair getRegSequenceSubReg(MachineInstr &MI,
		unsigned SubReg);

		/// \brief Return the defining instruction for a given reg:subreg pair
		/// skipping copy like instructions and subreg-manipulation pseudos.
		/// Following another subreg of a reg:subreg isn't supported.
		MachineInstr *getVRegSubRegDef(const TargetInstrInfo::RegSubRegPair &P,
		MachineRegisterInfo &MRI);

namespace AMDGPU {		namespace AMDGPU {

LLVM_READONLY		LLVM_READONLY
int getVOPe64(uint16_t Opcode);		int getVOPe64(uint16_t Opcode);

LLVM_READONLY		LLVM_READONLY
int getVOPe32(uint16_t Opcode);		int getVOPe32(uint16_t Opcode);

LLVM_READONLY		LLVM_READONLY
int getSDWAOp(uint16_t Opcode);		int getSDWAOp(uint16_t Opcode);

LLVM_READONLY		LLVM_READONLY
		int getDPPOp32(uint16_t Opcode);

		LLVM_READONLY
int getBasicFromSDWAOp(uint16_t Opcode);		int getBasicFromSDWAOp(uint16_t Opcode);

LLVM_READONLY		LLVM_READONLY
int getCommuteRev(uint16_t Opcode);		int getCommuteRev(uint16_t Opcode);

LLVM_READONLY		LLVM_READONLY
int getCommuteOrig(uint16_t Opcode);		int getCommuteOrig(uint16_t Opcode);

▲ Show 20 Lines • Show All 56 Lines • Show Last 20 Lines

lib/Target/AMDGPU/SIInstrInfo.cpp

Show First 20 Lines • Show All 5,278 Lines • ▼ Show 20 Lines	int SIInstrInfo::pseudoToMCOpcode(int Opcode) const {

// (uint16_t)-1 means that Opcode is a pseudo instruction that has		// (uint16_t)-1 means that Opcode is a pseudo instruction that has
// no encoding in the given subtarget generation.		// no encoding in the given subtarget generation.
if (MCOp == (uint16_t)-1)		if (MCOp == (uint16_t)-1)
return -1;		return -1;

return MCOp;		return MCOp;
}		}

		static
		rampitec (Done): static?
		TargetInstrInfo::RegSubRegPair getRegOrUndef(const MachineOperand &RegOpnd) {
		assert(RegOpnd.isReg());
		return RegOpnd.isUndef() ? TargetInstrInfo::RegSubRegPair() :
		getRegSubRegPair(RegOpnd);
		}

		TargetInstrInfo::RegSubRegPair
		rampitec (Done): I think it should be a part of SIInstrInfo, not llvm namespace.
		llvm::getRegSequenceSubReg(MachineInstr &MI, unsigned SubReg) {
		assert(MI.isRegSequence());
		for (unsigned I = 0, E = (MI.getNumOperands() - 1)/ 2; I < E; ++I)
		if (MI.getOperand(1 + 2 * I + 1).getImm() == SubReg) {
		auto &RegOp = MI.getOperand(1 + 2 * I);
		return getRegOrUndef(RegOp);
		}
		return TargetInstrInfo::RegSubRegPair();
		}
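
The index arithmetic in getRegSequenceSubReg follows from the REG_SEQUENCE operand layout: operand 0 is the result, followed by (register, sub-register index) pairs. The sketch below reproduces just that indexing on a simplified operand vector. `Operand` and `findRegSequenceInput` are hypothetical names, and returning 0 for "not found" is an assumption standing in for an empty RegSubRegPair.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified operand: either a register or an immediate.
struct Operand {
  bool IsReg;
  unsigned Value; // register number, or sub-register index for immediates
};

// Operands are laid out as [dst, reg0, subidx0, reg1, subidx1, ...].
// Returns the register paired with SubReg, or 0 if no pair matches.
inline unsigned findRegSequenceInput(const std::vector<Operand> &Ops,
                                     unsigned SubReg) {
  assert(!Ops.empty() && Ops.size() % 2 == 1 && "expected dst plus pairs");
  for (std::size_t I = 0, E = (Ops.size() - 1) / 2; I < E; ++I) {
    const Operand &Reg = Ops[1 + 2 * I];     // register of pair I
    const Operand &Idx = Ops[1 + 2 * I + 1]; // sub-register index of pair I
    if (!Idx.IsReg && Idx.Value == SubReg)
      return Reg.Value;
  }
  return 0;
}
```

For a sequence built from registers 11 and 12 at sub-register indices 1 and 2, looking up index 2 yields register 12, and looking up an index that is not present yields 0.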

		// Try to find the definition of reg:subreg in subreg-manipulation pseudos
		// Following a subreg of reg:subreg isn't supported
		static bool followSubRegDef(MachineInstr &MI,
		TargetInstrInfo::RegSubRegPair &RSR) {
		if (!RSR.SubReg)
		return false;
		switch (MI.getOpcode()) {
		default: break;
		case AMDGPU::REG_SEQUENCE:
		RSR = getRegSequenceSubReg(MI, RSR.SubReg);
		return true;
		// EXTRACT_SUBREG isn't supported as this would follow a subreg of subreg
		case AMDGPU::INSERT_SUBREG:
		if (RSR.SubReg == (unsigned)MI.getOperand(3).getImm())
		// inserted the subreg we're looking for
		RSR = getRegOrUndef(MI.getOperand(2));
		else { // the subreg in the rest of the reg
		auto R1 = getRegOrUndef(MI.getOperand(1));
		if (R1.SubReg) // subreg of subreg isn't supported
		return false;
		RSR.Reg = R1.Reg;
		}
		return true;
		}
		return false;
		}

		rampitec (Done): Also a part of SIInstrInfo.
		MachineInstr *llvm::getVRegSubRegDef(const TargetInstrInfo::RegSubRegPair &P,
		MachineRegisterInfo &MRI) {
		assert(MRI.isSSA());
		if (!TargetRegisterInfo::isVirtualRegister(P.Reg))
		return nullptr;

		auto RSR = P;
		auto *DefInst = MRI.getVRegDef(RSR.Reg);
		while (auto *MI = DefInst) {
		DefInst = nullptr;
		switch (MI->getOpcode()) {
		case AMDGPU::COPY:
		case AMDGPU::V_MOV_B32_e32: {
		auto &Op1 = MI->getOperand(1);
		if (Op1.isReg() &&
		TargetRegisterInfo::isVirtualRegister(Op1.getReg())) {
		if (Op1.isUndef())
		return nullptr;
		RSR = getRegSubRegPair(Op1);
		DefInst = MRI.getVRegDef(RSR.Reg);
		}
		break;
		}
		default:
		if (followSubRegDef(*MI, RSR)) {
		if (!RSR.Reg)
		return nullptr;
		DefInst = MRI.getVRegDef(RSR.Reg);
		}
		}
		if (!DefInst)
		return MI;
		}
		return nullptr;
		}
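
The core of getVRegSubRegDef is a walk up the SSA def chain that skips copy-like instructions and stops at the first "real" definition. A reduced sketch of that walk is shown below, with sub-registers and undef handling left out. The `Def` map, `Kind` enum and `findRealDef` are hypothetical. The sketch assumes SSA form (one definition per register, no cycles), as the assert in the original does.

```cpp
#include <cassert>
#include <unordered_map>

enum class Kind { Copy, Other };

// Hypothetical definition record for one virtual register.
struct Def {
  Kind K;
  unsigned Src; // source register for Copy; unused for Other
};

// Returns the register whose definition is the first non-copy reached from
// VReg, or the last copy if its source has no known definition.
// Returns 0 if VReg itself has no definition.
inline unsigned findRealDef(const std::unordered_map<unsigned, Def> &Defs,
                            unsigned VReg) {
  auto It = Defs.find(VReg);
  while (It != Defs.end()) {
    const Def &D = It->second;
    if (D.K != Kind::Copy)
      return It->first; // reached a non-copy definition
    auto Next = Defs.find(D.Src);
    if (Next == Defs.end())
      return It->first; // cannot look through this copy: stop here
    It = Next;
  }
  return 0;
}
```

The second early return models the case in the original where the copy source is not a virtual register: the copy itself is then reported as the defining instruction.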

lib/Target/AMDGPU/SIInstrInfo.td

Show First 20 Lines • Show All 1,616 Lines • ▼ Show 20 Lines
class getHasExt <int NumSrcArgs, ValueType DstVT = i32, ValueType Src0VT = i32,		class getHasExt <int NumSrcArgs, ValueType DstVT = i32, ValueType Src0VT = i32,
ValueType Src1VT = i32> {		ValueType Src1VT = i32> {
bit ret = !if(!eq(NumSrcArgs, 3),		bit ret = !if(!eq(NumSrcArgs, 3),
0, // NumSrcArgs == 3 - No DPP or SDWA for VOP3		0, // NumSrcArgs == 3 - No DPP or SDWA for VOP3
!if(!eq(DstVT.Size, 64),		!if(!eq(DstVT.Size, 64),
0, // 64-bit dst - No DPP or SDWA for 64-bit operands		0, // 64-bit dst - No DPP or SDWA for 64-bit operands
!if(!eq(Src0VT.Size, 64),		!if(!eq(Src0VT.Size, 64),
0, // 64-bit src0		0, // 64-bit src0
!if(!eq(Src0VT.Size, 64),		!if(!eq(Src1VT.Size, 64),
0, // 64-bit src2		0, // 64-bit src2
1		1
)		)
)		)
)		)
);		);
}		}

		class getHasDPP <int NumSrcArgs, ValueType DstVT = i32, ValueType Src0VT = i32,
		ValueType Src1VT = i32> {
		bit ret = !if(!eq(NumSrcArgs, 0), 0,
		getHasExt<NumSrcArgs, DstVT, Src0VT, Src1VT>.ret);
		}
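
Read as ordinary predicates, the two TableGen classes above (new side of the diff) say: an instruction has the extended encodings unless it takes three sources or has a 64-bit destination, src0 or src1, and it has a DPP form only if it additionally takes at least one source. The C++ rendering below is only an illustration of that logic. The function names and the use of bit widths as plain integers are assumptions for the sketch.

```cpp
#include <cassert>

// Mirrors getHasExt: no DPP/SDWA for three-source or 64-bit operand forms.
constexpr bool hasExt(int NumSrcArgs, int DstBits = 32, int Src0Bits = 32,
                      int Src1Bits = 32) {
  return NumSrcArgs != 3 && DstBits != 64 && Src0Bits != 64 && Src1Bits != 64;
}

// Mirrors getHasDPP: additionally rule out instructions with no sources.
constexpr bool hasDPP(int NumSrcArgs, int DstBits = 32, int Src0Bits = 32,
                      int Src1Bits = 32) {
  return NumSrcArgs != 0 && hasExt(NumSrcArgs, DstBits, Src0Bits, Src1Bits);
}
```

The only case where the two differ is a zero-source instruction: `hasExt(0)` is true while `hasDPP(0)` is false, which is why HasExtDPP no longer simply copies HasExt.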

class BitOr<bit a, bit b> {		class BitOr<bit a, bit b> {
bit ret = !if(a, 1, !if(b, 1, 0));		bit ret = !if(a, 1, !if(b, 1, 0));
}		}

class BitAnd<bit a, bit b> {		class BitAnd<bit a, bit b> {
bit ret = !if(a, !if(b, 1, 0), 0);		bit ret = !if(a, !if(b, 1, 0), 0);
}		}

▲ Show 20 Lines • Show All 63 Lines • ▼ Show 20 Lines	class VOPProfile <list<ValueType> _ArgVT> {
field bit HasHigh = 0;		field bit HasHigh = 0;

field bit IsPacked = isPackedType<Src0VT>.ret;		field bit IsPacked = isPackedType<Src0VT>.ret;
field bit HasOpSel = IsPacked;		field bit HasOpSel = IsPacked;
field bit HasOMod = !if(HasOpSel, 0, isFloatType<DstVT>.ret);		field bit HasOMod = !if(HasOpSel, 0, isFloatType<DstVT>.ret);
field bit HasSDWAOMod = isFloatType<DstVT>.ret;		field bit HasSDWAOMod = isFloatType<DstVT>.ret;

field bit HasExt = getHasExt<NumSrcArgs, DstVT, Src0VT, Src1VT>.ret;		field bit HasExt = getHasExt<NumSrcArgs, DstVT, Src0VT, Src1VT>.ret;
field bit HasExtDPP = HasExt;		field bit HasExtDPP = getHasDPP<NumSrcArgs, DstVT, Src0VT, Src1VT>.ret;
field bit HasExtSDWA = HasExt;		field bit HasExtSDWA = HasExt;
field bit HasExtSDWA9 = HasExt;		field bit HasExtSDWA9 = HasExt;
field int NeedPatGen = PatGenMode.NoPattern;		field int NeedPatGen = PatGenMode.NoPattern;

field Operand Src0PackedMod = !if(HasSrc0FloatMods, PackedF16InputMods, PackedI16InputMods);		field Operand Src0PackedMod = !if(HasSrc0FloatMods, PackedF16InputMods, PackedI16InputMods);
field Operand Src1PackedMod = !if(HasSrc1FloatMods, PackedF16InputMods, PackedI16InputMods);		field Operand Src1PackedMod = !if(HasSrc1FloatMods, PackedF16InputMods, PackedI16InputMods);
field Operand Src2PackedMod = !if(HasSrc2FloatMods, PackedF16InputMods, PackedI16InputMods);		field Operand Src2PackedMod = !if(HasSrc2FloatMods, PackedF16InputMods, PackedI16InputMods);

Show All 14 Lines	field dag InsVOP3P = getInsVOP3P<Src0RC64, Src1RC64, Src2RC64,
NumSrcArgs, HasClamp,		NumSrcArgs, HasClamp,
Src0PackedMod, Src1PackedMod, Src2PackedMod>.ret;		Src0PackedMod, Src1PackedMod, Src2PackedMod>.ret;
field dag InsVOP3OpSel = getInsVOP3OpSel<Src0RC64, Src1RC64, Src2RC64,		field dag InsVOP3OpSel = getInsVOP3OpSel<Src0RC64, Src1RC64, Src2RC64,
NumSrcArgs,		NumSrcArgs,
HasClamp,		HasClamp,
getOpSelMod<Src0VT>.ret,		getOpSelMod<Src0VT>.ret,
getOpSelMod<Src1VT>.ret,		getOpSelMod<Src1VT>.ret,
getOpSelMod<Src2VT>.ret>.ret;		getOpSelMod<Src2VT>.ret>.ret;
field dag InsDPP = getInsDPP<DstRCDPP, Src0DPP, Src1DPP, NumSrcArgs,		field dag InsDPP = !if(HasExtDPP,
HasModifiers, Src0ModDPP, Src1ModDPP>.ret;		getInsDPP<DstRCDPP, Src0DPP, Src1DPP, NumSrcArgs,
		HasModifiers, Src0ModDPP, Src1ModDPP>.ret,
		(ins));
field dag InsSDWA = getInsSDWA<Src0SDWA, Src1SDWA, NumSrcArgs,		field dag InsSDWA = getInsSDWA<Src0SDWA, Src1SDWA, NumSrcArgs,
HasSDWAOMod, Src0ModSDWA, Src1ModSDWA,		HasSDWAOMod, Src0ModSDWA, Src1ModSDWA,
DstVT>.ret;		DstVT>.ret;


field string Asm32 = getAsm32<HasDst, NumSrcArgs, DstVT>.ret;		field string Asm32 = getAsm32<HasDst, NumSrcArgs, DstVT>.ret;
field string Asm64 = getAsm64<HasDst, NumSrcArgs, HasIntClamp, HasModifiers, HasOMod, DstVT>.ret;		field string Asm64 = getAsm64<HasDst, NumSrcArgs, HasIntClamp, HasModifiers, HasOMod, DstVT>.ret;
field string AsmVOP3P = getAsmVOP3P<HasDst, NumSrcArgs, HasModifiers, HasClamp, DstVT>.ret;		field string AsmVOP3P = getAsmVOP3P<HasDst, NumSrcArgs, HasModifiers, HasClamp, DstVT>.ret;
field string AsmVOP3OpSel = getAsmVOP3OpSel<NumSrcArgs,		field string AsmVOP3OpSel = getAsmVOP3OpSel<NumSrcArgs,
HasClamp,		HasClamp,
HasSrc0FloatMods,		HasSrc0FloatMods,
HasSrc1FloatMods,		HasSrc1FloatMods,
HasSrc2FloatMods>.ret;		HasSrc2FloatMods>.ret;
field string AsmDPP = getAsmDPP<HasDst, NumSrcArgs, HasModifiers, DstVT>.ret;		field string AsmDPP = !if(HasExtDPP,
		getAsmDPP<HasDst, NumSrcArgs, HasModifiers, DstVT>.ret, "");
field string AsmSDWA = getAsmSDWA<HasDst, NumSrcArgs, DstVT>.ret;		field string AsmSDWA = getAsmSDWA<HasDst, NumSrcArgs, DstVT>.ret;
field string AsmSDWA9 = getAsmSDWA9<HasDst, HasSDWAOMod, NumSrcArgs, DstVT>.ret;		field string AsmSDWA9 = getAsmSDWA9<HasDst, HasSDWAOMod, NumSrcArgs, DstVT>.ret;
}		}

class VOP_NO_EXT <VOPProfile p> : VOPProfile <p.ArgVT> {		class VOP_NO_EXT <VOPProfile p> : VOPProfile <p.ArgVT> {
let HasExt = 0;		let HasExt = 0;
let HasExtDPP = 0;		let HasExtDPP = 0;
let HasExtSDWA = 0;		let HasExtSDWA = 0;
▲ Show 20 Lines • Show All 158 Lines • ▼ Show 20 Lines
def getBasicFromSDWAOp : InstrMapping {		def getBasicFromSDWAOp : InstrMapping {
let FilterClass = "VOP";		let FilterClass = "VOP";
let RowFields = ["OpName"];		let RowFields = ["OpName"];
let ColFields = ["AsmVariantName"];		let ColFields = ["AsmVariantName"];
let KeyCol = ["SDWA"];		let KeyCol = ["SDWA"];
let ValueCols = [["Default"]];		let ValueCols = [["Default"]];
}		}

		// Maps ordinary instructions to their DPP counterparts
		def getDPPOp32 : InstrMapping {
		let FilterClass = "VOP";
		let RowFields = ["OpName"];
		let ColFields = ["AsmVariantName"];
		let KeyCol = ["Default"];
		let ValueCols = [["DPP"]];
		}
		def getDPPOp64 : InstrMapping {
		let FilterClass = "VOP";
		let RowFields = ["OpName"];
		let ColFields = ["AsmVariantName"];
		let KeyCol = ["VOP3"];
		let ValueCols = [["DPP"]];
		}
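
An InstrMapping definition such as getDPPOp32 describes a table lookup: find the record for the given opcode in the key column, then return the record that shares its row fields in the value column. The generated function returns -1 when there is no counterpart. A small model of that lookup is sketched below. `InstrRecord`, `mapInstr`, the opcode numbers and the instruction names in the usage are hypothetical, and the linear search is only for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record: one instruction definition seen by the mapping.
struct InstrRecord {
  int Opcode;
  std::string OpName;         // row field
  std::string AsmVariantName; // column field
};

// Map Opcode from column KeyCol to column ValueCol within the same row.
// Returns -1 if Opcode is not in KeyCol or the row has no ValueCol entry.
inline int mapInstr(const std::vector<InstrRecord> &Records, int Opcode,
                    const std::string &KeyCol, const std::string &ValueCol) {
  for (const InstrRecord &Key : Records) {
    if (Key.Opcode != Opcode || Key.AsmVariantName != KeyCol)
      continue;
    for (const InstrRecord &Val : Records)
      if (Val.OpName == Key.OpName && Val.AsmVariantName == ValueCol)
        return Val.Opcode;
  }
  return -1;
}
```

In this model, getDPPOp32 corresponds to `mapInstr(Records, Opcode, "Default", "DPP")` and getDPPOp64 to `mapInstr(Records, Opcode, "VOP3", "DPP")`. An instruction without a `_dpp` pseudo simply maps to -1.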


// Maps an commuted opcode to its original version		// Maps an commuted opcode to its original version
def getCommuteOrig : InstrMapping {		def getCommuteOrig : InstrMapping {
let FilterClass = "Commutable_REV";		let FilterClass = "Commutable_REV";
let RowFields = ["RevOp"];		let RowFields = ["RevOp"];
let ColFields = ["IsOrig"];		let ColFields = ["IsOrig"];
let KeyCol = ["0"];		let KeyCol = ["0"];
let ValueCols = [["1"]];		let ValueCols = [["1"]];
}		}
▲ Show 20 Lines • Show All 82 Lines • Show Last 20 Lines

lib/Target/AMDGPU/VOP1Instructions.td

Show First 20 Lines • Show All 78 Lines • ▼ Show 20 Lines	class VOP1_Real <VOP1_Pseudo ps, int EncodingFamily> :
let Defs = ps.Defs;		let Defs = ps.Defs;
}		}

class VOP1_SDWA_Pseudo <string OpName, VOPProfile P, list<dag> pattern=[]> :		class VOP1_SDWA_Pseudo <string OpName, VOPProfile P, list<dag> pattern=[]> :
VOP_SDWA_Pseudo <OpName, P, pattern> {		VOP_SDWA_Pseudo <OpName, P, pattern> {
let AsmMatchConverter = "cvtSdwaVOP1";		let AsmMatchConverter = "cvtSdwaVOP1";
}		}

		class VOP1_DPP_Pseudo <string OpName, VOPProfile P, list<dag> pattern=[]> :
		VOP_DPP_Pseudo <OpName, P, pattern> {
		}

class getVOP1Pat64 <SDPatternOperator node, VOPProfile P> : LetDummies {		class getVOP1Pat64 <SDPatternOperator node, VOPProfile P> : LetDummies {
list<dag> ret =		list<dag> ret =
!if(P.HasModifiers,		!if(P.HasModifiers,
[(set P.DstVT:$vdst, (node (P.Src0VT (VOP3Mods0 P.Src0VT:$src0,		[(set P.DstVT:$vdst, (node (P.Src0VT (VOP3Mods0 P.Src0VT:$src0,
i32:$src0_modifiers,		i32:$src0_modifiers,
i1:$clamp, i32:$omod))))],		i1:$clamp, i32:$omod))))],
!if(P.HasOMod,		!if(P.HasOMod,
[(set P.DstVT:$vdst, (node (P.Src0VT (VOP3OMods P.Src0VT:$src0,		[(set P.DstVT:$vdst, (node (P.Src0VT (VOP3OMods P.Src0VT:$src0,
i1:$clamp, i32:$omod))))],		i1:$clamp, i32:$omod))))],
[(set P.DstVT:$vdst, (node P.Src0VT:$src0))]		[(set P.DstVT:$vdst, (node P.Src0VT:$src0))]
)		)
);		);
}		}

multiclass VOP1Inst <string opName, VOPProfile P,		multiclass VOP1Inst <string opName, VOPProfile P,
SDPatternOperator node = null_frag> {		SDPatternOperator node = null_frag> {
def _e32 : VOP1_Pseudo <opName, P>;		def _e32 : VOP1_Pseudo <opName, P>;
def _e64 : VOP3_Pseudo <opName, P, getVOP1Pat64<node, P>.ret>;		def _e64 : VOP3_Pseudo <opName, P, getVOP1Pat64<node, P>.ret>;
def _sdwa : VOP1_SDWA_Pseudo <opName, P>;		def _sdwa : VOP1_SDWA_Pseudo <opName, P>;
		foreach _ = BoolToList<P.HasExtDPP>.ret in
		def _dpp : VOP1_DPP_Pseudo <opName, P>;
}		}

// Special profile for instructions which have clamp		// Special profile for instructions which have clamp
// and output modifiers (but have no input modifiers)		// and output modifiers (but have no input modifiers)
class VOPProfileI2F<ValueType dstVt, ValueType srcVt> :		class VOPProfileI2F<ValueType dstVt, ValueType srcVt> :
VOPProfile<[dstVt, srcVt, untyped, untyped]> {		VOPProfile<[dstVt, srcVt, untyped, untyped]> {

let Ins64 = (ins Src0RC64:$src0, clampmod:$clamp, omod:$omod);		let Ins64 = (ins Src0RC64:$src0, clampmod:$clamp, omod:$omod);
▲ Show 20 Lines • Show All 381 Lines • ▼ Show 20 Lines
defm V_RNDNE_F64 : VOP1_Real_ci <0x19>;		defm V_RNDNE_F64 : VOP1_Real_ci <0x19>;
defm V_LOG_LEGACY_F32 : VOP1_Real_ci <0x45>;		defm V_LOG_LEGACY_F32 : VOP1_Real_ci <0x45>;
defm V_EXP_LEGACY_F32 : VOP1_Real_ci <0x46>;		defm V_EXP_LEGACY_F32 : VOP1_Real_ci <0x46>;

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// VI		// VI
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

class VOP1_DPP <bits<8> op, VOP1_Pseudo ps, VOPProfile P = ps.Pfl> :		class VOP1_DPPe <bits<8> op, VOP1_DPP_Pseudo ps, VOPProfile P = ps.Pfl> :
VOP_DPP <ps.OpName, P> {		VOP_DPPe <P> {
let Defs = ps.Defs;
let Uses = ps.Uses;
let SchedRW = ps.SchedRW;
let hasSideEffects = ps.hasSideEffects;

bits<8> vdst;		bits<8> vdst;
let Inst{8-0} = 0xfa; // dpp		let Inst{8-0} = 0xfa; // dpp
let Inst{16-9} = op;		let Inst{16-9} = op;
let Inst{24-17} = !if(P.EmitDst, vdst{7-0}, 0);		let Inst{24-17} = !if(P.EmitDst, vdst{7-0}, 0);
let Inst{31-25} = 0x3f; //encoding		let Inst{31-25} = 0x3f; //encoding
}		}
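
The four `Inst{...}` assignments in VOP1_DPPe fix the low 32 bits of the encoding: 0xfa in the src0 field marks the instruction as DPP, followed by the opcode, the destination register and the constant 0x3f in the top bits. A sketch that packs those fields is given below. It covers only the fields listed above and assumes P.EmitDst is set. The DPP control bits come from VOP_DPPe, which is not shown here, and `encodeVOP1DPPLow` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Pack the low dword of a VOP1 DPP instruction from the fields shown above.
constexpr std::uint32_t encodeVOP1DPPLow(std::uint8_t Op, std::uint8_t VDst) {
  return 0xfau                                       // Inst{8-0}:   dpp marker
         | (static_cast<std::uint32_t>(Op) << 9)     // Inst{16-9}:  opcode
         | (static_cast<std::uint32_t>(VDst) << 17)  // Inst{24-17}: vdst
         | (0x3fu << 25);                            // Inst{31-25}: encoding
}
```

For V_MOV_B32 (opcode 0x1 in the VI table below) writing v0, this gives 0x7e0002fa.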

multiclass VOP1Only_Real_vi <bits<10> op> {		multiclass VOP1Only_Real_vi <bits<10> op> {
Show All 21 Lines	multiclass VOP1_Real_vi <bits<10> op> {
def _sdwa_vi :		def _sdwa_vi :
VOP_SDWA_Real <!cast<VOP1_SDWA_Pseudo>(NAME#"_sdwa")>,		VOP_SDWA_Real <!cast<VOP1_SDWA_Pseudo>(NAME#"_sdwa")>,
VOP1_SDWAe <op{7-0}, !cast<VOP1_SDWA_Pseudo>(NAME#"_sdwa").Pfl>;		VOP1_SDWAe <op{7-0}, !cast<VOP1_SDWA_Pseudo>(NAME#"_sdwa").Pfl>;

def _sdwa_gfx9 :		def _sdwa_gfx9 :
VOP_SDWA9_Real <!cast<VOP1_SDWA_Pseudo>(NAME#"_sdwa")>,		VOP_SDWA9_Real <!cast<VOP1_SDWA_Pseudo>(NAME#"_sdwa")>,
VOP1_SDWA9Ae <op{7-0}, !cast<VOP1_SDWA_Pseudo>(NAME#"_sdwa").Pfl>;		VOP1_SDWA9Ae <op{7-0}, !cast<VOP1_SDWA_Pseudo>(NAME#"_sdwa").Pfl>;

// For now left dpp only for asm/dasm		foreach _ = BoolToList<!cast<VOP1_Pseudo>(NAME#"_e32").Pfl.HasExtDPP>.ret in
// TODO: add corresponding pseudo		def _dpp_vi :
def _dpp : VOP1_DPP<op{7-0}, !cast<VOP1_Pseudo>(NAME#"_e32")>;		VOP_DPP_Real<!cast<VOP1_DPP_Pseudo>(NAME#"_dpp"), SIEncodingFamily.VI>,
		VOP1_DPPe<op{7-0}, !cast<VOP1_DPP_Pseudo>(NAME#"_dpp")>;
}		}

defm V_NOP : VOP1_Real_vi <0x0>;		defm V_NOP : VOP1_Real_vi <0x0>;
defm V_MOV_B32 : VOP1_Real_vi <0x1>;		defm V_MOV_B32 : VOP1_Real_vi <0x1>;
defm V_CVT_I32_F64 : VOP1_Real_vi <0x3>;		defm V_CVT_I32_F64 : VOP1_Real_vi <0x3>;
defm V_CVT_F64_I32 : VOP1_Real_vi <0x4>;		defm V_CVT_F64_I32 : VOP1_Real_vi <0x4>;
defm V_CVT_F32_I32 : VOP1_Real_vi <0x5>;		defm V_CVT_F32_I32 : VOP1_Real_vi <0x5>;
defm V_CVT_F32_U32 : VOP1_Real_vi <0x6>;		defm V_CVT_F32_U32 : VOP1_Real_vi <0x6>;
▲ Show 20 Lines • Show All 154 Lines • ▼ Show 20 Lines	multiclass VOP1_Real_gfx9 <bits<10> op> {
let AssemblerPredicates = [isGFX9], DecoderNamespace = "GFX9" in {		let AssemblerPredicates = [isGFX9], DecoderNamespace = "GFX9" in {
defm NAME : VOP1_Real_e32e64_vi <op>;		defm NAME : VOP1_Real_e32e64_vi <op>;
}		}

def _sdwa_gfx9 :		def _sdwa_gfx9 :
VOP_SDWA9_Real <!cast<VOP1_SDWA_Pseudo>(NAME#"_sdwa")>,		VOP_SDWA9_Real <!cast<VOP1_SDWA_Pseudo>(NAME#"_sdwa")>,
VOP1_SDWA9Ae <op{7-0}, !cast<VOP1_SDWA_Pseudo>(NAME#"_sdwa").Pfl>;		VOP1_SDWA9Ae <op{7-0}, !cast<VOP1_SDWA_Pseudo>(NAME#"_sdwa").Pfl>;

// For now left dpp only for asm/dasm		foreach _ = BoolToList<!cast<VOP1_Pseudo>(NAME#"_e32").Pfl.HasExtDPP>.ret in
// TODO: add corresponding pseudo		def _dpp_gfx9 :
def _dpp : VOP1_DPP<op{7-0}, !cast<VOP1_Pseudo>(NAME#"_e32")>;		VOP_DPP_Real<!cast<VOP1_DPP_Pseudo>(NAME#"_dpp"), SIEncodingFamily.GFX9>,
		VOP1_DPPe<op{7-0}, !cast<VOP1_DPP_Pseudo>(NAME#"_dpp")>;

}		}

defm V_SCREEN_PARTITION_4SE_B32 : VOP1_Real_gfx9 <0x37>;		defm V_SCREEN_PARTITION_4SE_B32 : VOP1_Real_gfx9 <0x37>;

lib/Target/AMDGPU/VOP2Instructions.td

Show First 20 Lines • Show All 99 Lines • ▼ Show 20 Lines	class VOP2_Real <VOP2_Pseudo ps, int EncodingFamily> :
let Defs = ps.Defs;		let Defs = ps.Defs;
}		}

class VOP2_SDWA_Pseudo <string OpName, VOPProfile P, list<dag> pattern=[]> :		class VOP2_SDWA_Pseudo <string OpName, VOPProfile P, list<dag> pattern=[]> :
VOP_SDWA_Pseudo <OpName, P, pattern> {		VOP_SDWA_Pseudo <OpName, P, pattern> {
let AsmMatchConverter = "cvtSdwaVOP2";		let AsmMatchConverter = "cvtSdwaVOP2";
}		}

		class VOP2_DPP_Pseudo <string OpName, VOPProfile P, list<dag> pattern=[]> :
		VOP_DPP_Pseudo <OpName, P, pattern> {
		}


class getVOP2Pat64 <SDPatternOperator node, VOPProfile P> : LetDummies {		class getVOP2Pat64 <SDPatternOperator node, VOPProfile P> : LetDummies {
list<dag> ret = !if(P.HasModifiers,		list<dag> ret = !if(P.HasModifiers,
[(set P.DstVT:$vdst,		[(set P.DstVT:$vdst,
(node (P.Src0VT		(node (P.Src0VT
!if(P.HasOMod,		!if(P.HasOMod,
(VOP3Mods0 P.Src0VT:$src0, i32:$src0_modifiers, i1:$clamp, i32:$omod),		(VOP3Mods0 P.Src0VT:$src0, i32:$src0_modifiers, i1:$clamp, i32:$omod),
(VOP3Mods0 P.Src0VT:$src0, i32:$src0_modifiers, i1:$clamp))),		(VOP3Mods0 P.Src0VT:$src0, i32:$src0_modifiers, i1:$clamp))),
(P.Src1VT (VOP3Mods P.Src1VT:$src1, i32:$src1_modifiers))))],		(P.Src1VT (VOP3Mods P.Src1VT:$src1, i32:$src1_modifiers))))],
Show All 34 Lines

multiclass VOP2Inst<string opName,		multiclass VOP2Inst<string opName,
VOPProfile P,		VOPProfile P,
SDPatternOperator node = null_frag,		SDPatternOperator node = null_frag,
string revOp = opName,		string revOp = opName,
bit GFX9Renamed = 0> :		bit GFX9Renamed = 0> :
VOP2Inst_e32<opName, P, node, revOp, GFX9Renamed>,		VOP2Inst_e32<opName, P, node, revOp, GFX9Renamed>,
VOP2Inst_e64<opName, P, node, revOp, GFX9Renamed>,		VOP2Inst_e64<opName, P, node, revOp, GFX9Renamed>,
VOP2Inst_sdwa<opName, P, node, revOp, GFX9Renamed>;		VOP2Inst_sdwa<opName, P, node, revOp, GFX9Renamed> {
		let renamedInGFX9 = GFX9Renamed in {
		foreach _ = BoolToList<P.HasExtDPP>.ret in
		def _dpp : VOP2_DPP_Pseudo <opName, P>;
		}
		}

multiclass VOP2bInst <string opName,		multiclass VOP2bInst <string opName,
VOPProfile P,		VOPProfile P,
SDPatternOperator node = null_frag,		SDPatternOperator node = null_frag,
string revOp = opName,		string revOp = opName,
bit GFX9Renamed = 0,		bit GFX9Renamed = 0,
bit useSGPRInput = !eq(P.NumSrcArgs, 3)> {		bit useSGPRInput = !eq(P.NumSrcArgs, 3)> {
let renamedInGFX9 = GFX9Renamed in {		let renamedInGFX9 = GFX9Renamed in {
let SchedRW = [Write32Bit, WriteSALU] in {		let SchedRW = [Write32Bit, WriteSALU] in {
let Uses = !if(useSGPRInput, [VCC, EXEC], [EXEC]), Defs = [VCC] in {		let Uses = !if(useSGPRInput, [VCC, EXEC], [EXEC]), Defs = [VCC] in {
def _e32 : VOP2_Pseudo <opName, P, VOPPatOrNull<node,P>.ret>,		def _e32 : VOP2_Pseudo <opName, P, VOPPatOrNull<node,P>.ret>,
Commutable_REV<revOp#"_e32", !eq(revOp, opName)>;		Commutable_REV<revOp#"_e32", !eq(revOp, opName)>;

def _sdwa : VOP2_SDWA_Pseudo <opName, P> {		def _sdwa : VOP2_SDWA_Pseudo <opName, P> {
let AsmMatchConverter = "cvtSdwaVOP2b";		let AsmMatchConverter = "cvtSdwaVOP2b";
}		}
		foreach _ = BoolToList<P.HasExtDPP>.ret in
		def _dpp : VOP2_DPP_Pseudo <opName, P>;
}		}

def _e64 : VOP3_Pseudo <opName, P, getVOP2Pat64<node, P>.ret>,		def _e64 : VOP3_Pseudo <opName, P, getVOP2Pat64<node, P>.ret>,
Commutable_REV<revOp#"_e64", !eq(revOp, opName)>;		Commutable_REV<revOp#"_e64", !eq(revOp, opName)>;
}		}
}		}
}		}

multiclass VOP2eInst <string opName,		multiclass VOP2eInst <string opName,
VOPProfile P,		VOPProfile P,
SDPatternOperator node = null_frag,		SDPatternOperator node = null_frag,
string revOp = opName,		string revOp = opName,
bit useSGPRInput = !eq(P.NumSrcArgs, 3)> {		bit useSGPRInput = !eq(P.NumSrcArgs, 3)> {

let SchedRW = [Write32Bit] in {		let SchedRW = [Write32Bit] in {
let Uses = !if(useSGPRInput, [VCC, EXEC], [EXEC]) in {		let Uses = !if(useSGPRInput, [VCC, EXEC], [EXEC]) in {
def _e32 : VOP2_Pseudo <opName, P>,		def _e32 : VOP2_Pseudo <opName, P>,
Commutable_REV<revOp#"_e32", !eq(revOp, opName)>;		Commutable_REV<revOp#"_e32", !eq(revOp, opName)>;

def _sdwa : VOP2_SDWA_Pseudo <opName, P> {		def _sdwa : VOP2_SDWA_Pseudo <opName, P> {
let AsmMatchConverter = "cvtSdwaVOP2b";		let AsmMatchConverter = "cvtSdwaVOP2b";
}		}

		foreach _ = BoolToList<P.HasExtDPP>.ret in
		def _dpp : VOP2_DPP_Pseudo <opName, P>;
}		}

def _e64 : VOP3_Pseudo <opName, P, getVOP2Pat64<node, P>.ret>,		def _e64 : VOP3_Pseudo <opName, P, getVOP2Pat64<node, P>.ret>,
Commutable_REV<revOp#"_e64", !eq(revOp, opName)>;		Commutable_REV<revOp#"_e64", !eq(revOp, opName)>;
}		}
}		}

class VOP_MADAK <ValueType vt> : VOPProfile <[vt, vt, vt, vt]> {		class VOP_MADAK <ValueType vt> : VOPProfile <[vt, vt, vt, vt]> {
Show All 23 Lines
def VOP_MADMK_F32 : VOP_MADMK <f32>;		def VOP_MADMK_F32 : VOP_MADMK <f32>;

// FIXME: Remove src2_modifiers. It isn't used, so is wasting memory		// FIXME: Remove src2_modifiers. It isn't used, so is wasting memory
// and processing time but it makes it easier to convert to mad.		// and processing time but it makes it easier to convert to mad.
class VOP_MAC <ValueType vt> : VOPProfile <[vt, vt, vt, vt]> {		class VOP_MAC <ValueType vt> : VOPProfile <[vt, vt, vt, vt]> {
let Ins32 = (ins Src0RC32:$src0, Src1RC32:$src1, VGPR_32:$src2);		let Ins32 = (ins Src0RC32:$src0, Src1RC32:$src1, VGPR_32:$src2);
let Ins64 = getIns64<Src0RC64, Src1RC64, RegisterOperand<VGPR_32>, 3,		let Ins64 = getIns64<Src0RC64, Src1RC64, RegisterOperand<VGPR_32>, 3,
0, HasModifiers, HasOMod, Src0Mod, Src1Mod, Src2Mod>.ret;		0, HasModifiers, HasOMod, Src0Mod, Src1Mod, Src2Mod>.ret;
let InsDPP = (ins DstRCDPP:$old,		let InsDPP = (ins Src0ModDPP:$src0_modifiers, Src0DPP:$src0,
Src0ModDPP:$src0_modifiers, Src0DPP:$src0,
Src1ModDPP:$src1_modifiers, Src1DPP:$src1,		Src1ModDPP:$src1_modifiers, Src1DPP:$src1,
		VGPR_32:$src2, // stub argument
dpp_ctrl:$dpp_ctrl, row_mask:$row_mask,		dpp_ctrl:$dpp_ctrl, row_mask:$row_mask,
bank_mask:$bank_mask, bound_ctrl:$bound_ctrl);		bank_mask:$bank_mask, bound_ctrl:$bound_ctrl);

let InsSDWA = (ins Src0ModSDWA:$src0_modifiers, Src0SDWA:$src0,		let InsSDWA = (ins Src0ModSDWA:$src0_modifiers, Src0SDWA:$src0,
Src1ModSDWA:$src1_modifiers, Src1SDWA:$src1,		Src1ModSDWA:$src1_modifiers, Src1SDWA:$src1,
VGPR_32:$src2, // stub argument		VGPR_32:$src2, // stub argument
clampmod:$clamp, omod:$omod,		clampmod:$clamp, omod:$omod,
dst_sel:$dst_sel, dst_unused:$dst_unused,		dst_sel:$dst_sel, dst_unused:$dst_unused,
▲ Show 20 Lines • Show All 526 Lines • ▼ Show 20 Lines
defm V_CVT_PK_U16_U32 : VOP2_Real_e32e64_si <0x30>;		defm V_CVT_PK_U16_U32 : VOP2_Real_e32e64_si <0x30>;
defm V_CVT_PK_I16_I32 : VOP2_Real_e32e64_si <0x31>;		defm V_CVT_PK_I16_I32 : VOP2_Real_e32e64_si <0x31>;


//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// VI		// VI
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

class VOP2_DPP <bits<6> op, VOP2_Pseudo ps, string OpName = ps.OpName, VOPProfile P = ps.Pfl> :		class VOP2_DPPe <bits<6> op, VOP2_DPP_Pseudo ps, VOPProfile P = ps.Pfl> :
VOP_DPP <OpName, P> {		VOP_DPPe <P> {
let Defs = ps.Defs;
let Uses = ps.Uses;
let SchedRW = ps.SchedRW;
let hasSideEffects = ps.hasSideEffects;

bits<8> vdst;		bits<8> vdst;
bits<8> src1;		bits<8> src1;
let Inst{8-0} = 0xfa; //dpp		let Inst{8-0} = 0xfa; //dpp
let Inst{16-9} = !if(P.HasSrc1, src1{7-0}, 0);		let Inst{16-9} = !if(P.HasSrc1, src1{7-0}, 0);
let Inst{24-17} = !if(P.EmitDst, vdst{7-0}, 0);		let Inst{24-17} = !if(P.EmitDst, vdst{7-0}, 0);
let Inst{30-25} = op;		let Inst{30-25} = op;
let Inst{31} = 0x0; //encoding		let Inst{31} = 0x0; //encoding
}		}
▲ Show 20 Lines • Show All 64 Lines • ▼ Show 20 Lines	VOP3be_vi <{0, 1, 0, 0, op{5-0}}, !cast<VOP3_Pseudo>(OpName#"_e64").Pfl> {
let DecoderNamespace = "VI";		let DecoderNamespace = "VI";
}		}
def _sdwa_vi :		def _sdwa_vi :
VOP_SDWA_Real <!cast<VOP2_SDWA_Pseudo>(OpName#"_sdwa")>,		VOP_SDWA_Real <!cast<VOP2_SDWA_Pseudo>(OpName#"_sdwa")>,
VOP2_SDWAe <op{5-0}, !cast<VOP2_SDWA_Pseudo>(OpName#"_sdwa").Pfl> {		VOP2_SDWAe <op{5-0}, !cast<VOP2_SDWA_Pseudo>(OpName#"_sdwa").Pfl> {
VOP2_SDWA_Pseudo ps = !cast<VOP2_SDWA_Pseudo>(OpName#"_sdwa");		VOP2_SDWA_Pseudo ps = !cast<VOP2_SDWA_Pseudo>(OpName#"_sdwa");
let AsmString = AsmName # ps.AsmOperands;		let AsmString = AsmName # ps.AsmOperands;
}		}
def _dpp :		foreach _ = BoolToList<!cast<VOP2_Pseudo>(OpName#"_e32").Pfl.HasExtDPP>.ret in
VOP2_DPP<op, !cast<VOP2_Pseudo>(OpName#"_e32"), AsmName>;		def _dpp_vi :
		VOP_DPP_Real<!cast<VOP2_DPP_Pseudo>(OpName#"_dpp"), SIEncodingFamily.VI>,
		VOP2_DPPe<op, !cast<VOP2_DPP_Pseudo>(OpName#"_dpp")> {
		VOP2_DPP_Pseudo ps = !cast<VOP2_DPP_Pseudo>(OpName#"_dpp");
		let AsmString = AsmName # ps.AsmOperands;
		}
}		}
}		}

let AssemblerPredicates = [isGFX9] in {		let AssemblerPredicates = [isGFX9] in {

multiclass VOP2be_Real_e32e64_gfx9 <bits<6> op, string OpName, string AsmName> {		multiclass VOP2be_Real_e32e64_gfx9 <bits<6> op, string OpName, string AsmName> {
def _e32_gfx9 :		def _e32_gfx9 :
VOP2_Real<!cast<VOP2_Pseudo>(OpName#"_e32"), SIEncodingFamily.GFX9>,		VOP2_Real<!cast<VOP2_Pseudo>(OpName#"_e32"), SIEncodingFamily.GFX9>,
VOP3be_vi <{0, 1, 0, 0, op{5-0}}, !cast<VOP3_Pseudo>(OpName#"_e64").Pfl> {
let DecoderNamespace = "GFX9";		let DecoderNamespace = "GFX9";
}		}
def _sdwa_gfx9 :		def _sdwa_gfx9 :
VOP_SDWA9_Real <!cast<VOP2_SDWA_Pseudo>(OpName#"_sdwa")>,		VOP_SDWA9_Real <!cast<VOP2_SDWA_Pseudo>(OpName#"_sdwa")>,
VOP2_SDWA9Ae <op{5-0}, !cast<VOP2_SDWA_Pseudo>(OpName#"_sdwa").Pfl> {		VOP2_SDWA9Ae <op{5-0}, !cast<VOP2_SDWA_Pseudo>(OpName#"_sdwa").Pfl> {
VOP2_SDWA_Pseudo ps = !cast<VOP2_SDWA_Pseudo>(OpName#"_sdwa");		VOP2_SDWA_Pseudo ps = !cast<VOP2_SDWA_Pseudo>(OpName#"_sdwa");
let AsmString = AsmName # ps.AsmOperands;		let AsmString = AsmName # ps.AsmOperands;
}		}
		foreach _ = BoolToList<!cast<VOP2_Pseudo>(OpName#"_e32").Pfl.HasExtDPP>.ret in
def _dpp_gfx9 :		def _dpp_gfx9 :
VOP2_DPP<op, !cast<VOP2_Pseudo>(OpName#"_e32"), AsmName> {		VOP_DPP_Real<!cast<VOP2_DPP_Pseudo>(OpName#"_dpp"), SIEncodingFamily.GFX9>,
		VOP2_DPPe<op, !cast<VOP2_DPP_Pseudo>(OpName#"_dpp")> {
		VOP2_DPP_Pseudo ps = !cast<VOP2_DPP_Pseudo>(OpName#"_dpp");
		let AsmString = AsmName # ps.AsmOperands;
let DecoderNamespace = "SDWA9";		let DecoderNamespace = "SDWA9";
}		}
}		}

multiclass VOP2_Real_e32e64_gfx9 <bits<6> op> {		multiclass VOP2_Real_e32e64_gfx9 <bits<6> op> {
def _e32_gfx9 :		def _e32_gfx9 :
VOP2_Real<!cast<VOP2_Pseudo>(NAME#"_e32"), SIEncodingFamily.GFX9>,		VOP2_Real<!cast<VOP2_Pseudo>(NAME#"_e32"), SIEncodingFamily.GFX9>,
VOP2e<op{5-0}, !cast<VOP2_Pseudo>(NAME#"_e32").Pfl>{		VOP2e<op{5-0}, !cast<VOP2_Pseudo>(NAME#"_e32").Pfl>{
let DecoderNamespace = "GFX9";		let DecoderNamespace = "GFX9";
}		}
def _e64_gfx9 :		def _e64_gfx9 :
VOP3_Real<!cast<VOP3_Pseudo>(NAME#"_e64"), SIEncodingFamily.GFX9>,		VOP3_Real<!cast<VOP3_Pseudo>(NAME#"_e64"), SIEncodingFamily.GFX9>,
VOP3e_vi <{0, 1, 0, 0, op{5-0}}, !cast<VOP3_Pseudo>(NAME#"_e64").Pfl> {		VOP3e_vi <{0, 1, 0, 0, op{5-0}}, !cast<VOP3_Pseudo>(NAME#"_e64").Pfl> {
let DecoderNamespace = "GFX9";		let DecoderNamespace = "GFX9";
}		}
def _sdwa_gfx9 :		def _sdwa_gfx9 :
VOP_SDWA9_Real <!cast<VOP2_SDWA_Pseudo>(NAME#"_sdwa")>,		VOP_SDWA9_Real <!cast<VOP2_SDWA_Pseudo>(NAME#"_sdwa")>,
VOP2_SDWA9Ae <op{5-0}, !cast<VOP2_SDWA_Pseudo>(NAME#"_sdwa").Pfl> {		VOP2_SDWA9Ae <op{5-0}, !cast<VOP2_SDWA_Pseudo>(NAME#"_sdwa").Pfl> {
}		}
		foreach _ = BoolToList<!cast<VOP2_Pseudo>(NAME#"_e32").Pfl.HasExtDPP>.ret in
def _dpp_gfx9 :		def _dpp_gfx9 :
VOP2_DPP<op, !cast<VOP2_Pseudo>(NAME#"_e32")> {		VOP_DPP_Real<!cast<VOP2_DPP_Pseudo>(NAME#"_dpp"), SIEncodingFamily.GFX9>,
		VOP2_DPPe<op, !cast<VOP2_DPP_Pseudo>(NAME#"_dpp")> {
let DecoderNamespace = "SDWA9";		let DecoderNamespace = "SDWA9";
}		}
}		}

} // AssemblerPredicates = [isGFX9]		} // AssemblerPredicates = [isGFX9]

multiclass VOP2_Real_e32e64_vi <bits<6> op> :		multiclass VOP2_Real_e32e64_vi <bits<6> op> :
Base_VOP2_Real_e32e64_vi<op>, VOP2_SDWA_Real<op>, VOP2_SDWA9_Real<op> {		Base_VOP2_Real_e32e64_vi<op>, VOP2_SDWA_Real<op>, VOP2_SDWA9_Real<op> {
// For now left dpp only for asm/dasm
// TODO: add corresponding pseudo		foreach _ = BoolToList<!cast<VOP2_Pseudo>(NAME#"_e32").Pfl.HasExtDPP>.ret in
def _dpp : VOP2_DPP<op, !cast<VOP2_Pseudo>(NAME#"_e32")>;		def _dpp_vi :
		VOP_DPP_Real<!cast<VOP2_DPP_Pseudo>(NAME#"_dpp"), SIEncodingFamily.VI>,
		VOP2_DPPe<op, !cast<VOP2_DPP_Pseudo>(NAME#"_dpp")>;
}		}

defm V_CNDMASK_B32 : VOP2_Real_e32e64_vi <0x0>;		defm V_CNDMASK_B32 : VOP2_Real_e32e64_vi <0x0>;
defm V_ADD_F32 : VOP2_Real_e32e64_vi <0x1>;		defm V_ADD_F32 : VOP2_Real_e32e64_vi <0x1>;
defm V_SUB_F32 : VOP2_Real_e32e64_vi <0x2>;		defm V_SUB_F32 : VOP2_Real_e32e64_vi <0x2>;
defm V_SUBREV_F32 : VOP2_Real_e32e64_vi <0x3>;		defm V_SUBREV_F32 : VOP2_Real_e32e64_vi <0x3>;
defm V_MUL_LEGACY_F32 : VOP2_Real_e32e64_vi <0x4>;		defm V_MUL_LEGACY_F32 : VOP2_Real_e32e64_vi <0x4>;
defm V_MUL_F32 : VOP2_Real_e32e64_vi <0x5>;		defm V_MUL_F32 : VOP2_Real_e32e64_vi <0x5>;
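
As a cross-check of the VOP2_DPPe bit layout in this file, the low 32-bit word can be packed by hand. This is a minimal C++ sketch, not code from the patch: packVop2DppLow and its parameter names are hypothetical, and the P.HasSrc1 / P.EmitDst conditionals are left out for brevity. Only the fields visible in the class are modelled (0xfa in bits 8-0, src1 in 16-9, vdst in 24-17, op in 30-25, bit 31 clear).

```cpp
#include <cstdint>

// Hypothetical helper: packs the low dword of a VOP2 DPP instruction,
// mirroring the Inst{...} assignments in VOP2_DPPe.
constexpr std::uint32_t packVop2DppLow(std::uint32_t op, std::uint32_t vdst,
                                       std::uint32_t src1) {
  return 0xfau                     // Inst{8-0}: DPP marker
         | ((src1 & 0xffu) << 9)   // Inst{16-9}: src1
         | ((vdst & 0xffu) << 17)  // Inst{24-17}: vdst
         | ((op & 0x3fu) << 25);   // Inst{30-25}: opcode; Inst{31} stays 0
}

// V_ADD_F32 is opcode 0x1 in the VI defm list; with vdst = v0 and src1 = v0
// the word is 0x020000fa, i.e. little-endian bytes fa 00 00 02.
static_assert(packVop2DppLow(0x1, 0, 0) == 0x020000fau,
              "low dword of v_add_f32_dpp v0, v0, v0");
```

The bytes fa 00 00 02 agree with the first four bytes of the v_add_f32_dpp encoding checked in test/MC/AMDGPU/vop_dpp.s.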

lib/Target/AMDGPU/VOPInstructions.td

class VOP_DPPe<VOPProfile P> : Enc64 {
let Inst{52} = !if(P.HasSrc0Mods, src0_modifiers{0}, 0); // src0_neg		let Inst{52} = !if(P.HasSrc0Mods, src0_modifiers{0}, 0); // src0_neg
let Inst{53} = !if(P.HasSrc0Mods, src0_modifiers{1}, 0); // src0_abs		let Inst{53} = !if(P.HasSrc0Mods, src0_modifiers{1}, 0); // src0_abs
let Inst{54} = !if(P.HasSrc1Mods, src1_modifiers{0}, 0); // src1_neg		let Inst{54} = !if(P.HasSrc1Mods, src1_modifiers{0}, 0); // src1_neg
let Inst{55} = !if(P.HasSrc1Mods, src1_modifiers{1}, 0); // src1_abs		let Inst{55} = !if(P.HasSrc1Mods, src1_modifiers{1}, 0); // src1_abs
let Inst{59-56} = bank_mask;		let Inst{59-56} = bank_mask;
let Inst{63-60} = row_mask;		let Inst{63-60} = row_mask;
}		}
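
The tail of VOP_DPPe shown above fixes the top twelve bits of the 64-bit encoding. The following C++ sketch packs just those bits. It is illustrative only: packDppTop12 is a hypothetical name, the P.HasSrc0Mods / P.HasSrc1Mods conditionals are omitted, and the lower fields of the class are not modelled because they are elided from this hunk.

```cpp
#include <cstdint>

// Hypothetical helper: returns Inst{63-52} as a 12-bit value
// (bit 0 of the result corresponds to Inst{52}).
constexpr std::uint32_t packDppTop12(bool src0Neg, bool src0Abs, bool src1Neg,
                                     bool src1Abs, std::uint32_t bankMask,
                                     std::uint32_t rowMask) {
  return (src0Neg ? 1u : 0u)            // Inst{52}: src0_modifiers{0}
         | ((src0Abs ? 1u : 0u) << 1)   // Inst{53}: src0_modifiers{1}
         | ((src1Neg ? 1u : 0u) << 2)   // Inst{54}: src1_modifiers{0}
         | ((src1Abs ? 1u : 0u) << 3)   // Inst{55}: src1_modifiers{1}
         | ((bankMask & 0xfu) << 4)     // Inst{59-56}
         | ((rowMask & 0xfu) << 8);     // Inst{63-60}
}

// v_add_f32 v0, |v0|, -v0 ... row_mask:0xa bank_mask:0x1
// has abs on src0 and neg on src1.
static_assert(packDppTop12(false, true, true, false, 0x1, 0xa) == 0xa16u,
              "top 12 bits of the v_add_f32_dpp test encoding");
```

The value 0xa16 matches the last byte (0xa1) and the high nibble of the second-to-last byte (0x69) in the v_add_f32_dpp encoding from test/MC/AMDGPU/vop_dpp.s.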

class VOP_DPP <string OpName, VOPProfile P> :		class VOP_DPP_Pseudo <string OpName, VOPProfile P, list<dag> pattern=[]> :
InstSI <P.OutsDPP, P.InsDPP, OpName#P.AsmDPP, []>,		InstSI <P.OutsDPP, P.InsDPP, OpName#P.AsmDPP, pattern>,
VOP_DPPe<P> {		VOP <OpName>,
		SIMCInstr <OpName#"_dpp", SIEncodingFamily.NONE>,
		MnemonicAlias <OpName#"_dpp", OpName> {

		let isPseudo = 1;
		let isCodeGenOnly = 1;

let mayLoad = 0;		let mayLoad = 0;
let mayStore = 0;		let mayStore = 0;
let hasSideEffects = 0;		let hasSideEffects = 0;
let UseNamedOperandTable = 1;		let UseNamedOperandTable = 1;

let VALU = 1;		let VALU = 1;
let DPP = 1;		let DPP = 1;
let Size = 8;		let Size = 8;
		let Uses = [EXEC];
		let isConvergent = 1;

		string Mnemonic = OpName;
		string AsmOperands = P.AsmDPP;

let AsmMatchConverter = !if(!eq(P.HasModifiers,1), "cvtDPP", "");		let AsmMatchConverter = !if(!eq(P.HasModifiers,1), "cvtDPP", "");
let SubtargetPredicate = HasDPP;		let SubtargetPredicate = HasDPP;
let AssemblerPredicate = !if(P.HasExtDPP, HasDPP, DisableInst);		let AssemblerPredicate = !if(P.HasExtDPP, HasDPP, DisableInst);
let AsmVariantName = !if(P.HasExtDPP, AMDGPUAsmVariants.DPP,		let AsmVariantName = !if(P.HasExtDPP, AMDGPUAsmVariants.DPP,
AMDGPUAsmVariants.Disable);		AMDGPUAsmVariants.Disable);
let Constraints = !if(P.NumSrcArgs, "$old = $vdst", "");		let Constraints = !if(P.NumSrcArgs, "$old = $vdst", "");
let DisableEncoding = !if(P.NumSrcArgs, "$old", "");		let DisableEncoding = !if(P.NumSrcArgs, "$old", "");
let DecoderNamespace = "DPP";		let DecoderNamespace = "DPP";

		VOPProfile Pfl = P;
		}

		class VOP_DPP_Real <VOP_DPP_Pseudo ps, int EncodingFamily> :
		InstSI <ps.OutOperandList, ps.InOperandList, ps.Mnemonic # ps.AsmOperands, []>,
		SIMCInstr <ps.PseudoInstr, EncodingFamily> {

		let isPseudo = 0;
		let isCodeGenOnly = 0;

		let Defs = ps.Defs;
		let Uses = ps.Uses;
		let SchedRW = ps.SchedRW;
		let hasSideEffects = ps.hasSideEffects;

		let Constraints = ps.Constraints;
		let DisableEncoding = ps.DisableEncoding;

		// Copy relevant pseudo op flags
		let isConvergent = ps.isConvergent;
		let SubtargetPredicate = ps.SubtargetPredicate;
		let AssemblerPredicate = ps.AssemblerPredicate;
		let AsmMatchConverter = ps.AsmMatchConverter;
		let AsmVariantName = ps.AsmVariantName;
		let UseNamedOperandTable = ps.UseNamedOperandTable;
		let DecoderNamespace = ps.DecoderNamespace;
		let Constraints = ps.Constraints;
		let DisableEncoding = ps.DisableEncoding;
		let TSFlags = ps.TSFlags;
}		}

class getNumNodeArgs<SDPatternOperator Op> {		class getNumNodeArgs<SDPatternOperator Op> {
SDNode N = !cast<SDNode>(Op);		SDNode N = !cast<SDNode>(Op);
SDTypeProfile TP = N.TypeProfile;		SDTypeProfile TP = N.TypeProfile;
int ret = TP.NumOperands;		int ret = TP.NumOperands;
}		}


test/CodeGen/AMDGPU/dpp_combine.ll

This file was added.

				; RUN: llc -march=amdgcn -mcpu=tonga -amdgpu-dpp-combine -verify-machineinstrs < %s | FileCheck %s

				; VOP2 with literal cannot be combined
				; CHECK-LABEL: {{^}}dpp_combine_i32_literal:
				; CHECK: v_mov_b32_dpp [[OLD:v[0-9]+]], {{v[0-9]+}} quad_perm:[1,0,0,0] row_mask:0x2 bank_mask:0x1 bound_ctrl:0
				; CHECK: v_add_u32_e32 {{v[0-9]+}}, vcc, 42, [[OLD]]
				define amdgpu_kernel void @dpp_combine_i32_literal(i32 addrspace(1)* %out, i32 %in) {
				%dpp = call i32 @llvm.amdgcn.update.dpp.i32(i32 undef, i32 %in, i32 1, i32 2, i32 1, i1 1) #0
				%res = add nsw i32 %dpp, 42
				store i32 %res, i32 addrspace(1)* %out
				ret void
				}

				; CHECK-LABEL: {{^}}dpp_combine_i32_bz:
				; CHECK: v_add_u32_dpp {{v[0-9]+}}, vcc, {{v[0-9]+}}, v0 quad_perm:[1,0,0,0] row_mask:0x1 bank_mask:0x1 bound_ctrl:0
				define amdgpu_kernel void @dpp_combine_i32_bz(i32 addrspace(1)* %out, i32 %in) {
				%x = tail call i32 @llvm.amdgcn.workitem.id.x()
				%dpp = call i32 @llvm.amdgcn.update.dpp.i32(i32 undef, i32 %in, i32 1, i32 1, i32 1, i1 1) #0
				%res = add nsw i32 %dpp, %x
				store i32 %res, i32 addrspace(1)* %out
				ret void
				}

				; CHECK-LABEL: {{^}}dpp_combine_i32_boff_undef:
				; CHECK: v_add_u32_dpp {{v[0-9]+}}, vcc, {{v[0-9]+}}, v0 quad_perm:[1,0,0,0] row_mask:0x1 bank_mask:0x1
				define amdgpu_kernel void @dpp_combine_i32_boff_undef(i32 addrspace(1)* %out, i32 %in) {
				%x = tail call i32 @llvm.amdgcn.workitem.id.x()
				%dpp = call i32 @llvm.amdgcn.update.dpp.i32(i32 undef, i32 %in, i32 1, i32 1, i32 1, i1 0) #0
				%res = add nsw i32 %dpp, %x
				store i32 %res, i32 addrspace(1)* %out
				ret void
				}

				; CHECK-LABEL: {{^}}dpp_combine_i32_boff_0:
				; CHECK: v_add_u32_dpp {{v[0-9]+}}, vcc, {{v[0-9]+}}, v0 quad_perm:[1,0,0,0] row_mask:0x1 bank_mask:0x1 bound_ctrl:0
				define amdgpu_kernel void @dpp_combine_i32_boff_0(i32 addrspace(1)* %out, i32 %in) {
				%x = tail call i32 @llvm.amdgcn.workitem.id.x()
				%dpp = call i32 @llvm.amdgcn.update.dpp.i32(i32 0, i32 %in, i32 1, i32 1, i32 1, i1 0) #0
				%res = add nsw i32 %dpp, %x
				store i32 %res, i32 addrspace(1)* %out
				ret void
				}

				; CHECK-LABEL: {{^}}dpp_combine_i32_boff_max:
				; CHECK: v_bfrev_b32_e32 [[OLD:v[0-9]+]], -2
				; CHECK: v_max_i32_dpp [[OLD]], {{v[0-9]+}}, v0 quad_perm:[1,0,0,0] row_mask:0x1 bank_mask:0x1
				define amdgpu_kernel void @dpp_combine_i32_boff_max(i32 addrspace(1)* %out, i32 %in) {
				%x = tail call i32 @llvm.amdgcn.workitem.id.x()
				%dpp = call i32 @llvm.amdgcn.update.dpp.i32(i32 2147483647, i32 %in, i32 1, i32 1, i32 1, i1 0) #0
				%cmp = icmp sge i32 %dpp, %x
				%res = select i1 %cmp, i32 %dpp, i32 %x
				store i32 %res, i32 addrspace(1)* %out
				ret void
				}

				; CHECK-LABEL: {{^}}dpp_combine_i32_boff_min:
				; CHECK: v_bfrev_b32_e32 [[OLD:v[0-9]+]], 1
				; CHECK: v_min_i32_dpp [[OLD]], {{v[0-9]+}}, v0 quad_perm:[1,0,0,0] row_mask:0x1 bank_mask:0x1
				define amdgpu_kernel void @dpp_combine_i32_boff_min(i32 addrspace(1)* %out, i32 %in) {
				%x = tail call i32 @llvm.amdgcn.workitem.id.x()
				%dpp = call i32 @llvm.amdgcn.update.dpp.i32(i32 -2147483648, i32 %in, i32 1, i32 1, i32 1, i1 0) #0
				%cmp = icmp sle i32 %dpp, %x
				%res = select i1 %cmp, i32 %dpp, i32 %x
				store i32 %res, i32 addrspace(1)* %out
				ret void
				}

				; CHECK-LABEL: {{^}}dpp_combine_i32_boff_mul:
				; CHECK: v_mul_i32_i24_dpp v0, v3, v0 quad_perm:[1,0,0,0] row_mask:0x1 bank_mask:0x1
				define amdgpu_kernel void @dpp_combine_i32_boff_mul(i32 addrspace(1)* %out, i32 %in) {
				%x = tail call i32 @llvm.amdgcn.workitem.id.x()
				%dpp = call i32 @llvm.amdgcn.update.dpp.i32(i32 1, i32 %in, i32 1, i32 1, i32 1, i1 0) #0

				%dpp.shl = shl i32 %dpp, 8
				%dpp.24 = ashr i32 %dpp.shl, 8
				%x.shl = shl i32 %x, 8
				%x.24 = ashr i32 %x.shl, 8
				%res = mul i32 %dpp.24, %x.24
				store i32 %res, i32 addrspace(1)* %out
				ret void
				}

				; CHECK-LABEL: {{^}}dpp_combine_i32_commute:
				; CHECK: v_subrev_u32_dpp {{v[0-9]+}}, vcc, {{v[0-9]+}}, v0 quad_perm:[2,0,0,0] row_mask:0x1 bank_mask:0x1 bound_ctrl:0
				define amdgpu_kernel void @dpp_combine_i32_commute(i32 addrspace(1)* %out, i32 %in) {
				%x = tail call i32 @llvm.amdgcn.workitem.id.x()
				%dpp = call i32 @llvm.amdgcn.update.dpp.i32(i32 undef, i32 %in, i32 2, i32 1, i32 1, i1 1) #0
				%res = sub nsw i32 %x, %dpp
				store i32 %res, i32 addrspace(1)* %out
				ret void
				}

				; CHECK-LABEL: {{^}}dpp_combine_f32:
				; CHECK: v_add_f32_dpp {{v[0-9]+}}, {{v[0-9]+}}, v0 quad_perm:[3,0,0,0] row_mask:0x1 bank_mask:0x1 bound_ctrl:0
				define amdgpu_kernel void @dpp_combine_f32(i32 addrspace(1)* %out, i32 %in) {
				%x = tail call i32 @llvm.amdgcn.workitem.id.x()

				%dpp = call i32 @llvm.amdgcn.update.dpp.i32(i32 undef, i32 %in, i32 3, i32 1, i32 1, i1 1) #0
				%dpp.f32 = bitcast i32 %dpp to float
				%x.f32 = bitcast i32 %x to float
				%res.f32 = fadd float %x.f32, %dpp.f32
				%res = bitcast float %res.f32 to i32
				store i32 %res, i32 addrspace(1)* %out
				ret void
				}

				; CHECK-LABEL: {{^}}dpp_combine_test_f32_mods:
				; CHECK: v_mul_f32_dpp {{v[0-9]+}}, |{{v[0-9]+}}|, -v0 quad_perm:[0,1,0,0] row_mask:0x1 bank_mask:0x1 bound_ctrl:0
				define amdgpu_kernel void @dpp_combine_test_f32_mods(i32 addrspace(1)* %out, i32 %in) {
				%x = tail call i32 @llvm.amdgcn.workitem.id.x()

				%dpp = call i32 @llvm.amdgcn.update.dpp.i32(i32 undef, i32 %in, i32 4, i32 1, i32 1, i1 1) #0

				%x.f32 = bitcast i32 %x to float
				%x.f32.neg = fsub float -0.000000e+00, %x.f32

				%dpp.f32 = bitcast i32 %dpp to float
				%dpp.f32.cmp = fcmp fast olt float %dpp.f32, 0.000000e+00
				%dpp.f32.sign = select i1 %dpp.f32.cmp, float -1.000000e+00, float 1.000000e+00
				%dpp.f32.abs = fmul fast float %dpp.f32, %dpp.f32.sign

				%res.f32 = fmul float %x.f32.neg, %dpp.f32.abs
				%res = bitcast float %res.f32 to i32
				store i32 %res, i32 addrspace(1)* %out
				ret void
				}

				; CHECK-LABEL: {{^}}dpp_combine_mac:
				; CHECK: v_mac_f32_dpp v0, {{v[0-9]+}}, v1 quad_perm:[1,0,0,0] row_mask:0x1 bank_mask:0x1 bound_ctrl:0
				define amdgpu_kernel void @dpp_combine_mac(float addrspace(1)* %out, i32 %in) {
				%x = tail call i32 @llvm.amdgcn.workitem.id.x()
				%y = tail call i32 @llvm.amdgcn.workitem.id.y()
				%dpp = call i32 @llvm.amdgcn.update.dpp.i32(i32 undef, i32 %in, i32 1, i32 1, i32 1, i1 1) #0
				%dpp.f32 = bitcast i32 %dpp to float
				%x.f32 = bitcast i32 %x to float
				%y.f32 = bitcast i32 %y to float

				%mult = fmul float %dpp.f32, %y.f32
				%res = fadd float %mult, %x.f32
				store float %res, float addrspace(1)* %out
				ret void
				}

				; CHECK-LABEL: {{^}}dpp_combine_sequence:
				define amdgpu_kernel void @dpp_combine_sequence(i32 addrspace(1)* %out, i32 %in, i1 %cmp) {
				%x = tail call i32 @llvm.amdgcn.workitem.id.x()
				%dpp = call i32 @llvm.amdgcn.update.dpp.i32(i32 undef, i32 %in, i32 1, i32 1, i32 1, i1 1) #0
				br i1 %cmp, label %bb1, label %bb2
				bb1:
				; CHECK: v_add_u32_dpp {{v[0-9]+}}, vcc, {{v[0-9]+}}, v0 quad_perm:[1,0,0,0] row_mask:0x1 bank_mask:0x1 bound_ctrl:0
				%resadd = add nsw i32 %dpp, %x
				br label %bb3
				bb2:
				; CHECK: v_subrev_u32_dpp {{v[0-9]+}}, vcc, {{v[0-9]+}}, v0 quad_perm:[1,0,0,0] row_mask:0x1 bank_mask:0x1 bound_ctrl:0
				%ressub = sub nsw i32 %x, %dpp
				br label %bb3
				bb3:
				%res = phi i32 [%resadd, %bb1], [%ressub, %bb2]
				store i32 %res, i32 addrspace(1)* %out
				ret void
				}

				; CHECK-LABEL: {{^}}dpp_combine_sequence_negative:
				; CHECK: v_mov_b32_dpp v1, v1 quad_perm:[1,0,0,0] row_mask:0x1 bank_mask:0x1 bound_ctrl:0
				define amdgpu_kernel void @dpp_combine_sequence_negative(i32 addrspace(1)* %out, i32 %in, i1 %cmp) {
				%x = tail call i32 @llvm.amdgcn.workitem.id.x()
				%dpp = call i32 @llvm.amdgcn.update.dpp.i32(i32 undef, i32 %in, i32 1, i32 1, i32 1, i1 1) #0
				br i1 %cmp, label %bb1, label %bb2
				bb1:
				%resadd = add nsw i32 %dpp, %x
				br label %bb3
				bb2:
				%ressub = sub nsw i32 2, %dpp ; break seq
				br label %bb3
				bb3:
				%res = phi i32 [%resadd, %bb1], [%ressub, %bb2]
				store i32 %res, i32 addrspace(1)* %out
				ret void
				}

				declare i32 @llvm.amdgcn.workitem.id.x()
				declare i32 @llvm.amdgcn.workitem.id.y()
				declare i32 @llvm.amdgcn.update.dpp.i32(i32, i32, i32, i32, i32, i1) #0

				attributes #0 = { nounwind readnone convergent }
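
The bz, boff_undef and boff_0 tests above exercise the combining rules listed in the revision summary. A compact way to read those rules is as a small decision function. The C++ sketch below is not the pass implementation: every type and function name is hypothetical, "foldable" is treated as an opaque classification, and rejecting any other $old value under DPP_BOUND_OFF is an assumption drawn from the rules covering only the listed cases.

```cpp
// Hypothetical model of the rules from the revision summary.
enum class BoundCtrl { Off, Zero };
enum class OldKind { Undef, Zero, Foldable, Other };
enum class FoldedOld { Undef, FoldedValue };

struct Folded {
  bool combinable;      // false when no rule applies
  FoldedOld old;        // meaningful only if combinable
  BoundCtrl boundCtrl;  // meaningful only if combinable
};

Folded foldOldAndBoundCtrl(BoundCtrl boundCtrl, OldKind old) {
  // DPP_BOUND_ZERO with any $old: $old is never read.
  if (boundCtrl == BoundCtrl::Zero)
    return Folded{true, FoldedOld::Undef, BoundCtrl::Zero};

  switch (old) {
  case OldKind::Zero:      // DPP_BOUND_OFF, $old is 0
    return Folded{true, FoldedOld::Undef, BoundCtrl::Zero};
  case OldKind::Undef:     // DPP_BOUND_OFF, $old is undef
    return Folded{true, FoldedOld::Undef, BoundCtrl::Off};
  case OldKind::Foldable:  // DPP_BOUND_OFF, $old is foldable
    return Folded{true, FoldedOld::FoldedValue, BoundCtrl::Off};
  case OldKind::Other:     // assumption: not covered by any rule
    break;
  }
  return Folded{false, FoldedOld::Undef, BoundCtrl::Off};
}
```

Read against the CHECK lines: dpp_combine_i32_bz and dpp_combine_i32_boff_0 both end with bound_ctrl:0, while dpp_combine_i32_boff_undef prints no bound_ctrl.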

test/CodeGen/AMDGPU/dpp_combine_subregs.mir

This file was added.

				# RUN: llc -march=amdgcn -mcpu=tonga -run-pass=gcn-dpp-combine -o - %s | FileCheck %s

				# test if $old definition is correctly tracked through subreg manipulation pseudos

				---
				# CHECK-LABEL: name: mul_old_subreg
				# CHECK: %7:vgpr_32 = V_MUL_I32_I24_dpp %0.sub1, %1, %0.sub1, 1, 1, 1, 0, implicit $exec

				name: mul_old_subreg
				tracksRegLiveness: true
				registers:
				- { id: 0, class: vreg_64 }
				- { id: 1, class: vgpr_32 }
				- { id: 2, class: vgpr_32 }
				- { id: 3, class: vgpr_32 }
				- { id: 4, class: vreg_64 }
				- { id: 5, class: vreg_64 }
				- { id: 6, class: vgpr_32 }
				- { id: 7, class: vgpr_32 }

				liveins:
				- { reg: '$vgpr0', virtual-reg: '%0' }
				- { reg: '$vgpr1', virtual-reg: '%1' }
				body: |
				bb.0:
				liveins: $vgpr0, $vgpr1

				%0:vreg_64 = COPY $vgpr0
				%1:vgpr_32 = COPY $vgpr1
				%2:vgpr_32 = V_MOV_B32_e32 1, implicit $exec
				%3:vgpr_32 = V_MOV_B32_e32 42, implicit $exec
				%4 = REG_SEQUENCE %2, %subreg.sub0, %3, %subreg.sub1
				%5 = INSERT_SUBREG %4, %1, %subreg.sub1 ; %5.sub0 is taken from %4
				%6:vgpr_32 = V_MOV_B32_dpp %5.sub0, %1, 1, 1, 1, 0, implicit $exec
				%7:vgpr_32 = V_MUL_I32_I24_e32 %6, %0.sub1, implicit $exec
				...

				# CHECK-LABEL: name: add_old_subreg
				# CHECK: [[OLD:\%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
				# CHECK: %5:vgpr_32 = V_ADD_U32_dpp [[OLD]], %1, %0.sub1, 1, 1, 1, 1, implicit $exec

				name: add_old_subreg
				tracksRegLiveness: true
				registers:
				- { id: 0, class: vreg_64 }
				- { id: 1, class: vgpr_32 }
				- { id: 2, class: vgpr_32 }
				- { id: 3, class: vreg_64 }
				- { id: 4, class: vgpr_32 }
				- { id: 5, class: vgpr_32 }

				liveins:
				- { reg: '$vgpr0', virtual-reg: '%0' }
				- { reg: '$vgpr1', virtual-reg: '%1' }
				body: |
				bb.0:
				liveins: $vgpr0, $vgpr1

				%0:vreg_64 = COPY $vgpr0
				%1:vgpr_32 = COPY $vgpr1
				%2:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
				%3:vreg_64 = INSERT_SUBREG %0, %2, %subreg.sub1 ; %3.sub1 is inserted
				%4:vgpr_32 = V_MOV_B32_dpp %3.sub1, %1, 1, 1, 1, 0, implicit $exec
				%5:vgpr_32 = V_ADD_U32_e32 %4, %0.sub1, implicit $exec
				...

				# CHECK-LABEL: name: add_old_subreg_undef
				# CHECK: %5:vgpr_32 = V_ADD_U32_dpp %3.sub1, %1, %0.sub1, 1, 1, 1, 0, implicit $exec

				name: add_old_subreg_undef
				tracksRegLiveness: true
				registers:
				- { id: 0, class: vreg_64 }
				- { id: 1, class: vgpr_32 }
				- { id: 2, class: vgpr_32 }
				- { id: 3, class: vreg_64 }
				- { id: 4, class: vgpr_32 }
				- { id: 5, class: vgpr_32 }

				liveins:
				- { reg: '$vgpr0', virtual-reg: '%0' }
				- { reg: '$vgpr1', virtual-reg: '%1' }
				body: |
				bb.0:
				liveins: $vgpr0, $vgpr1

				%0:vreg_64 = COPY $vgpr0
				%1:vgpr_32 = COPY $vgpr1
				%2:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
				%3:vreg_64 = REG_SEQUENCE %2, %subreg.sub0 ; %3.sub1 is undef
				%4:vgpr_32 = V_MOV_B32_dpp %3.sub1, %1, 1, 1, 1, 0, implicit $exec
				%5:vgpr_32 = V_ADD_U32_e32 %4, %0.sub1, implicit $exec
				...

				# CHECK-LABEL: name: add_f32_e64
				# CHECK: %3:vgpr_32 = V_MOV_B32_dpp undef %2, %1, 1, 1, 1, 1, implicit $exec
				# CHECK: %4:vgpr_32 = V_ADD_F32_e64 0, %3, 0, %0, 0, 1, implicit $exec
				# CHECK: %6:vgpr_32 = V_ADD_F32_dpp %2, 0, %1, 0, %0, 1, 1, 1, 1, implicit $exec
				# CHECK: %7:vgpr_32 = V_ADD_F32_dpp %2, 1, %1, 2, %0, 1, 1, 1, 1, implicit $exec

				name: add_f32_e64
				tracksRegLiveness: true
				registers:
				- { id: 0, class: vgpr_32 }
				- { id: 1, class: vgpr_32 }
				- { id: 2, class: vgpr_32 }
				- { id: 3, class: vgpr_32 }
				- { id: 4, class: vgpr_32 }
				- { id: 5, class: vgpr_32 }
				- { id: 6, class: vgpr_32 }
				- { id: 7, class: vgpr_32 }

				liveins:
				- { reg: '$vgpr0', virtual-reg: '%0' }
				- { reg: '$vgpr1', virtual-reg: '%1' }
				body: |
				bb.0:
				liveins: $vgpr0, $vgpr1

				%0:vgpr_32 = COPY $vgpr0
				%1:vgpr_32 = COPY $vgpr1
				%2:vgpr_32 = IMPLICIT_DEF
				%3:vgpr_32 = V_MOV_B32_dpp undef %2, %1, 1, 1, 1, 1, implicit $exec

				; this shouldn't be combined as omod is set
				%4:vgpr_32 = V_ADD_F32_e64 0, %3, 0, %0, 0, 1, implicit $exec

				%5:vgpr_32 = V_MOV_B32_dpp undef %2, %1, 1, 1, 1, 1, implicit $exec

				; this should be combined as all modifiers are default
				%6:vgpr_32 = V_ADD_F32_e64 0, %5, 0, %0, 0, 0, implicit $exec

				; this should be combined as modifiers other than abs|neg are default
				%7:vgpr_32 = V_ADD_F32_e64 1, %5, 2, %0, 0, 0, implicit $exec
				...
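
The add_f32_e64 case above checks a modifier condition: a VOP3 use is combined only when everything except abs/neg is at its default. The C++ sketch below expresses that predicate. It is illustrative only: the function name is hypothetical, the e64 operand order (src0_modifiers, src0, src1_modifiers, src1, clamp, omod) is assumed from the MIR in the test, and the bit values follow VOP_DPPe, where modifier bit 0 is neg and bit 1 is abs.

```cpp
#include <cstdint>

constexpr std::uint32_t kNeg = 1u << 0;  // src*_modifiers{0}
constexpr std::uint32_t kAbs = 1u << 1;  // src*_modifiers{1}

// Hypothetical predicate: true when a VOP3 use carries only modifiers
// that the DPP form can also express (neg and abs), with clamp and
// omod left at their defaults.
constexpr bool modifiersAllowDpp(std::uint32_t src0Mods,
                                 std::uint32_t src1Mods,
                                 std::uint32_t clamp, std::uint32_t omod) {
  return ((src0Mods | src1Mods) & ~(kNeg | kAbs)) == 0 && clamp == 0 &&
         omod == 0;
}

// %4 = V_ADD_F32_e64 0, %3, 0, %0, 0, 1 : omod is set, stays uncombined
static_assert(!modifiersAllowDpp(0, 0, 0, 1), "omod blocks combining");
// %6 = V_ADD_F32_e64 0, %5, 0, %0, 0, 0 : all defaults, combined
static_assert(modifiersAllowDpp(0, 0, 0, 0), "defaults combine");
// %7 = V_ADD_F32_e64 1, %5, 2, %0, 0, 0 : neg on src0, abs on src1, combined
static_assert(modifiersAllowDpp(1, 2, 0, 0), "neg/abs combine");
```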

test/MC/AMDGPU/vop_dpp.s

	// VI9: v_add_f32_dpp v0, |v0|, -v0 row_shl:1 row_mask:0xa bank_mask:0x1 bound_ctrl:0 ; encoding: [0xfa,0x00,0x00,0x02,0x00,0x01,0x69,0xa1]			// VI9: v_add_f32_dpp v0, |v0|, -v0 row_shl:1 row_mask:0xa bank_mask:0x1 bound_ctrl:0 ; encoding: [0xfa,0x00,0x00,0x02,0x00,0x01,0x69,0xa1]
	v_add_f32 v0, |v0|, -v0 row_shl:1 row_mask:0xa bank_mask:0x1 bound_ctrl:0			v_add_f32 v0, |v0|, -v0 row_shl:1 row_mask:0xa bank_mask:0x1 bound_ctrl:0

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// Check VOP1 opcodes			// Check VOP1 opcodes
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	// NOSICI: error:			// NOSICI: error:
	// VI9: v_nop row_shl:1 row_mask:0xa bank_mask:0x1 bound_ctrl:0 ; encoding: [0xfa,0x00,0x00,0x7e,0x00,0x01,0x09,0xa1]
	v_nop row_shl:1 row_mask:0xa bank_mask:0x1 bound_ctrl:0			v_nop row_shl:1 row_mask:0xa bank_mask:0x1 bound_ctrl:0

	// NOSICI: error:			// NOSICI: error:
	// VI9: v_cvt_u32_f32_dpp v0, v0 row_shl:1 row_mask:0xa bank_mask:0x1 bound_ctrl:0 ; encoding: [0xfa,0x0e,0x00,0x7e,0x00,0x01,0x09,0xa1]			// VI9: v_cvt_u32_f32_dpp v0, v0 row_shl:1 row_mask:0xa bank_mask:0x1 bound_ctrl:0 ; encoding: [0xfa,0x0e,0x00,0x7e,0x00,0x01,0x09,0xa1]
	v_cvt_u32_f32 v0, v0 row_shl:1 row_mask:0xa bank_mask:0x1 bound_ctrl:0			v_cvt_u32_f32 v0, v0 row_shl:1 row_mask:0xa bank_mask:0x1 bound_ctrl:0

	// NOSICI: error:			// NOSICI: error:
	// VI9: v_fract_f32_dpp v0, v0 row_shl:1 row_mask:0xa bank_mask:0x1 bound_ctrl:0 ; encoding: [0xfa,0x36,0x00,0x7e,0x00,0x01,0x09,0xa1]			// VI9: v_fract_f32_dpp v0, v0 row_shl:1 row_mask:0xa bank_mask:0x1 bound_ctrl:0 ; encoding: [0xfa,0x36,0x00,0x7e,0x00,0x01,0x09,0xa1]

Revision Contents

Diff 172353

lib/Target/AMDGPU/AMDGPU.h

lib/Target/AMDGPU/AMDGPU.td

lib/Target/AMDGPU/AMDGPUTargetMachine.cpp

lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp

lib/Target/AMDGPU/CMakeLists.txt

lib/Target/AMDGPU/GCNDPPCombine.cpp

lib/Target/AMDGPU/SIInstrInfo.h

lib/Target/AMDGPU/SIInstrInfo.cpp

lib/Target/AMDGPU/SIInstrInfo.td

lib/Target/AMDGPU/VOP1Instructions.td

lib/Target/AMDGPU/VOP2Instructions.td

lib/Target/AMDGPU/VOPInstructions.td

test/CodeGen/AMDGPU/dpp_combine.ll

test/CodeGen/AMDGPU/dpp_combine_subregs.mir

test/MC/AMDGPU/vop_dpp.s
