This is an archive of the discontinued LLVM Phabricator instance.

Differential D13586

AMDGPU: shrink the size of BUFFER_LOAD_FORMAT_XYZW when possible
AbandonedPublic

Authored by nhaehnle on Oct 9 2015, 4:26 AM.

Details

Reviewers

• tstellarAMD

Summary

This reduces register pressure and helps avoid some unnecessary waits
as well.

Diff Detail

Event Timeline

nhaehnle updated this revision to Diff 36937. Oct 9 2015, 4:26 AM

nhaehnle retitled this revision to AMDGPU: shrink the size of BUFFER_LOAD_FORMAT_XYZW when possible.

nhaehnle updated this object.

nhaehnle added a reviewer: • tstellarAMD.

Herald added a subscriber: arsenm. Oct 9 2015, 4:26 AM

You also need to add lit tests for this. Take a look in test/CodeGen/AMDGPU for examples.

lib/Target/AMDGPU/AMDGPUInstrInfo.cpp
326–327	Coding style: braces go on the same line as the function signature.
lib/Target/AMDGPU/SIISelLowering.cpp
2098–2099	Coding style.
2128–2131	Coding style for the brace, and also function arguments should go on the same line. You should review http://llvm.org/docs/CodingStandards.html. You can also run clang-format on the file to get the correct style, but I'm sure there are other errors in the file, so this will probably update more than just your code.
lib/Target/AMDGPU/SIInstrInfo.td
2787–2788	You only need to create the mapping for the pseudo instructions, so I think this means you can drop isPseudo and Subtarget from the list.
lib/Target/AMDGPU/SIInstructions.td
921	Interesting, I did not know it was possible to inherit from both multiclasses and classes, but as mentioned above, I think the pseudo instructions should inherit from ResizableLoad instead.

I think this should be done on the IR intrinsics in instcombine. Is there some advantage to doing it in the DAG? We don't produce formatted loads for any other reason

Matt, can you give me some hints as to how to go about this? Keep in mind that things are icky when reducing to a single f32, because we need to bypass the EXTRACT_SUBREG.

Do you think adjustWritemask (for texture sampling) should be moved analogously? There may be a reason why that is done where it's done.

lib/Target/AMDGPU/SIISelLowering.cpp
2128–2131	Heh. I actually read those standards, and produced what I feel is the most readable code following the rules spelled out in there (e.g., the braces thing is not actually in there). Function arguments on the same line is difficult and counter-productive for readability with the 80 columns limit (and also not in that document), etc. Anyway, thanks for telling me about clang-format (yet another thing not mentioned in that document...) - I'll run that later and see what it has to say.
lib/Target/AMDGPU/SIInstrInfo.td
2787–2788	True, I only need them for the pseudo instructions, but how can I tell TableGen about that? The issue is that the _si and _vi variants are generated in the inner-most defm, and I attach the ResizableLoad class outside, so the _si and _vi variants are added to the InstrMapping table even though I don't need them, and dropping isPseudo and Subtarget leads to a legitimate error about multiple matches in the same mapping table row. The thing is, it _would_ be nicer to add only the pseudos to the mapping table, but I ultimately gave up on that because I don't know how to add the ResizableLoad class only to the pseudo instructions that are loads. The defm at which the split into pseudo, si and vi happens is used by all MUBUF instructions, including stores and atomics. If you have a cleaner way to do it, I'd be delighted, but at some point I just gave up fighting TableGen and went with what you see here.
lib/Target/AMDGPU/SIInstructions.td
921	See the above. I see no way of doing that other than duplicating MUBUF_m into MUBUF_m_load, MUBUF_m_store, and MUBUF_m_atomic (or something like that). And even then, there's still the issue of what to do about the other BUFFER_LOAD_ instructions that also go via MUBUF_Load_Helper.

Hmm.... conceptually, the issue with TableGen is that we have multiple dimensions in which instructions are created (pseudo/si/vi, offen/idxen/both, different types of load/store/atomic), and we want to filter out the rows in the InstrMapping based on _two_ (or more?) of those dimensions, which is something that InstrMapping does not support. So perhaps the correct way is to extend what InstrMapping can do - perhaps allow filtering rows based on values instead of just based on an attached class.

What do you think?
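The row-uniqueness problem described above can be modelled outside TableGen. The following is only a sketch under stated assumptions: `Variant`, `sampleVariants`, `pseudosOnly` and `countConflicts` are hypothetical names, the instruction names are illustrative, and the offen/idxen dimension is ignored. It shows that keying rows by Group alone produces clashes once the _si and _vi variants are in the table, that adding isPseudo and Subtarget to the key removes them, and that a value-based filter (pseudos only) would make the shorter key sufficient.

```cpp
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical record for one generated instruction variant; the field names
// only mirror the RowFields/ColFields discussed in the review.
struct Variant {
  std::string Name;
  std::string Group; // e.g. "buffer_load_format"
  bool IsPseudo;
  int Subtarget; // -1 = NONE (pseudo), 0 = SI, 1 = VI
  int Size;      // number of dwords loaded
};

// Builds pseudo, _si and _vi variants for sizes 1..4 of one load group.
// The names are illustrative, not the real generated record names.
std::vector<Variant> sampleVariants() {
  const char *Suffix[] = {"X", "XY", "XYZ", "XYZW"};
  std::vector<Variant> Vs;
  for (int Size = 1; Size <= 4; ++Size) {
    std::string Base = std::string("BUFFER_LOAD_FORMAT_") + Suffix[Size - 1];
    Vs.push_back({Base, "buffer_load_format", true, -1, Size});
    Vs.push_back({Base + "_si", "buffer_load_format", false, 0, Size});
    Vs.push_back({Base + "_vi", "buffer_load_format", false, 1, Size});
  }
  return Vs;
}

// Value-based row filter: keep only the pseudo instructions.
std::vector<Variant> pseudosOnly(const std::vector<Variant> &Vs) {
  std::vector<Variant> Out;
  for (const Variant &V : Vs)
    if (V.IsPseudo)
      Out.push_back(V);
  return Out;
}

// Counts table cells (row key plus Size column) claimed by more than one
// variant. With FullKey the row key is (Group, IsPseudo, Subtarget);
// otherwise it is Group alone.
int countConflicts(const std::vector<Variant> &Vs, bool FullKey) {
  std::map<std::tuple<std::string, bool, int, int>, int> Cells;
  int Conflicts = 0;
  for (const Variant &V : Vs) {
    auto Key = std::make_tuple(V.Group, FullKey ? V.IsPseudo : false,
                               FullKey ? V.Subtarget : 0, V.Size);
    if (++Cells[Key] == 2)
      ++Conflicts;
  }
  return Conflicts;
}
```

In this model, `countConflicts(sampleVariants(), false)` reports one clash per size, while both `countConflicts(sampleVariants(), true)` and `countConflicts(pseudosOnly(sampleVariants()), false)` report none.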

I still need to discuss with Matt the best way to implement this.

lib/Target/AMDGPU/SIISelLowering.cpp
2128–2131	Ok, you are right, those things aren't in the document; I think clang-format may be the definitive coding style now. Let's just keep things consistent with the rest of the file, which means braces on the same line and, when a line is too long, arguments on the next line aligned with the arguments on the first line.
lib/Target/AMDGPU/SIInstructions.td
921	I think it will work if you have the MUBUF_Pseudo class inherit from ResizableLoad.
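For reference, the layout requested in the SIISelLowering.cpp comment above (opening brace on the signature line, continuation arguments aligned with the first argument) looks roughly like this. The function is hypothetical and exists only to show the formatting.

```cpp
// Hypothetical function, used only to illustrate the layout: the opening
// brace stays on the signature line, and arguments that do not fit in 80
// columns continue on the next line, aligned with the first argument.
static int sumFourChannels(int ChannelX, int ChannelY, int ChannelZ,
                           int ChannelW) {
  return ChannelX + ChannelY + ChannelZ + ChannelW;
}
```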

In D13586#263967, @nhaehnle wrote:

Matt, can you give me some hints as to how to go about this? Keep in mind that things are icky when reducing to a single f32, because we need to bypass the EXTRACT_SUBREG.

That's why I think doing it in instcombine is easier. The set of possible users is shrunk, so you don't need to worry about as many of the variants of copy. You just need to see if you only have constant extractelement and shufflevector users
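The user check described here can be sketched without any LLVM types. This is a minimal model under stated assumptions, not the pass itself: `VectorUser` and `requiredChannels` are hypothetical names, and each user is reduced to the list of lanes it reads from the loaded vector. The load can only shrink when every user is a constant extract or shuffle, and the new size is the highest lane read plus one.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one user of the loaded vector; not an LLVM type.
struct VectorUser {
  enum Kind { ConstantExtract, ConstantShuffle, Other };
  Kind UserKind;
  // Lanes of the loaded vector that this user reads (constant indices only).
  std::vector<unsigned> Lanes;
};

// Returns the number of leading channels the load has to produce, or -1 when
// the load must keep its full size (unknown user, out-of-range lane, or no
// lane read at all).
int requiredChannels(const std::vector<VectorUser> &Users,
                     unsigned NumChannels) {
  if (NumChannels == 0 || NumChannels > 32)
    return -1;
  uint32_t Mask = 0;
  for (const VectorUser &U : Users) {
    if (U.UserKind == VectorUser::Other)
      return -1; // cannot reason about this user; be conservative
    for (unsigned Lane : U.Lanes) {
      if (Lane >= NumChannels)
        return -1;
      Mask |= uint32_t(1) << Lane;
    }
  }
  if (Mask == 0)
    return -1; // nothing is read; leave the dead load to other passes
  // Only a prefix of the channels can be dropped, so the size is determined
  // by the highest lane that is read.
  int Highest = 0;
  while ((Mask >> Highest) > 1)
    ++Highest;
  return Highest + 1;
}
```

For example, users that extract lanes 0 and 2 of a four-channel load still need three channels, because only trailing channels can be dropped.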

In D13586#263706, @arsenm wrote:

I think this should be done on the IR intrinsics in instcombine. Is there some advantage to doing it in the DAG? We don't produce formatted loads for any other reason

Do you mean LLVM instcombine or a new AMDGPU specific instcombine?

In D13586#264121, @tstellarAMD wrote:

Do you mean LLVM instcombine or a new AMDGPU specific instcombine?

It can be done in LLVM instcombine. The intrinsics would have to be moved to include/llvm/IR/IntrinsicsAMDGPU.td (and ideally changed to use the amdgcn prefix and clean any other mistakes in how these are currently defined). There are already a few intrinsics handled there.

In D13586#264122, @arsenm wrote:

It can be done in LLVM instcombine. The intrinsics would have to be moved to include/llvm/IR/IntrinsicsAMDGPU.td (and ideally changed to use the amdgcn prefix and clean any other mistakes in how these are currently defined). There are already a few intrinsics handled there.

I thought about it some more and I think it's probably a bit much for instcombine. It should probably go in a new AMDGPU CodeGenPrepare-like IR pass.

Just to clarify, adjustWritemask (for MIMG) should then be similarly moved, right?

nhaehnle added inline comments. Oct 12 2015, 6:55 AM

lib/Target/AMDGPU/SIInstructions.td
921	Sorry, missed that. That would cause buffer_store and _atomic instructions to also inherit from ResizableLoad, if I'm reading the code right.

nhaehnle abandoned this revision. Feb 21 2018, 6:55 AM

Herald added subscribers: t-tye, tpr, dstuttard and 3 others. Feb 21 2018, 6:55 AM

Revision Contents

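Path                                     Size
lib/Target/AMDGPU/AMDGPUInstrInfo.h      2 lines
lib/Target/AMDGPU/AMDGPUInstrInfo.cpp    13 lines
lib/Target/AMDGPU/SIISelLowering.h       3 lines
lib/Target/AMDGPU/SIISelLowering.cpp     97 lines
lib/Target/AMDGPU/SIInstrInfo.td         13 lines
lib/Target/AMDGPU/SIInstructions.td      8 lines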

Diff 36937

lib/Target/AMDGPU/AMDGPUInstrInfo.h

[...] //===---------------------------------------------------------------------===//
   virtual MachineInstr *buildMovInstr(MachineBasicBlock *MBB,
                                       MachineBasicBlock::iterator I,
                                       unsigned DstReg, unsigned SrcReg) const = 0;

   /// \brief Given a MIMG \p Opcode that writes all 4 channels, return the
   /// equivalent opcode that writes \p Channels Channels.
   int getMaskedMIMGOp(uint16_t Opcode, unsigned Channels) const;

+  /// Return the equivalent opcode that loads \p Size dwords.
+  int getResizedLoadOp(uint16_t Opcode, unsigned Size) const;
 };

 namespace AMDGPU {
   LLVM_READONLY
   int16_t getNamedOperandIdx(uint16_t Opcode, uint16_t NamedIndex);
 } // End namespace AMDGPU

 } // End llvm namespace

 #define AMDGPU_FLAG_REGISTER_LOAD (UINT64_C(1) << 63)
 #define AMDGPU_FLAG_REGISTER_STORE (UINT64_C(1) << 62)

 #endif

lib/Target/AMDGPU/AMDGPUInstrInfo.cpp

[...] int AMDGPUInstrInfo::getMaskedMIMGOp(uint16_t Opcode, unsigned Channels) const {
   switch (Channels) {
   default: return Opcode;
   case 1: return AMDGPU::getMaskedMIMGOp(Opcode, AMDGPU::Channels_1);
   case 2: return AMDGPU::getMaskedMIMGOp(Opcode, AMDGPU::Channels_2);
   case 3: return AMDGPU::getMaskedMIMGOp(Opcode, AMDGPU::Channels_3);
   }
 }

+int AMDGPUInstrInfo::getResizedLoadOp(uint16_t Opcode, unsigned int Size) const
+{
+  AMDGPU::Size InSize;
+  switch (Size) {
+  case 1: InSize = AMDGPU::Size_1; break;
+  case 2: InSize = AMDGPU::Size_2; break;
+  case 3: InSize = AMDGPU::Size_3; break;
+  case 4: InSize = AMDGPU::Size_4; break;
+  default: return -1;
+  }
+  return AMDGPU::getResizedLoadOp(Opcode, InSize);
+}
+
 // Wrapper for Tablegen'd function. enum Subtarget is not defined in any
 // header files, so we need to wrap it in a function that takes unsigned
 // instead.
 namespace llvm {
 namespace AMDGPU {
 static int getMCOpcode(uint16_t Opcode, unsigned Gen) {
   return getMCOpcodeGen(Opcode, (enum Subtarget)Gen);
 }
[...]

lib/Target/AMDGPU/SIISelLowering.h

[...] class SITargetLowering : public AMDGPUTargetLowering {
   SDValue LowerFDIV64(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerFDIV(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerINT_TO_FP(SDValue Op, SelectionDAG &DAG, bool Signed) const;
   SDValue LowerSTORE(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerTrig(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerBRCOND(SDValue Op, SelectionDAG &DAG) const;

   void adjustWritemask(MachineSDNode *&N, SelectionDAG &DAG) const;
+  void adjustLoadSize(MachineSDNode *&N, SelectionDAG &DAG) const;
+
+  LaneBitmask findUsedLanes(SDNode *N) const;

   SDValue performUCharToFloatCombine(SDNode *N,
                                      DAGCombinerInfo &DCI) const;
   SDValue performSHLPtrCombine(SDNode *N,
                                unsigned AS,
                                DAGCombinerInfo &DCI) const;
   SDValue performAndCombine(SDNode *N, DAGCombinerInfo &DCI) const;
   SDValue performOrCombine(SDNode *N, DAGCombinerInfo &DCI) const;
[...]

lib/Target/AMDGPU/SIISelLowering.cpp

[...]
 #include "llvm/ADT/BitVector.h"
 #include "llvm/CodeGen/CallingConvLower.h"
 #include "llvm/CodeGen/MachineInstrBuilder.h"
 #include "llvm/CodeGen/MachineRegisterInfo.h"
 #include "llvm/CodeGen/SelectionDAG.h"
 #include "llvm/IR/Function.h"
 #include "llvm/ADT/SmallString.h"

+#define DEBUG_TYPE "si-isel-lowering"
+
 using namespace llvm;

 SITargetLowering::SITargetLowering(TargetMachine &TM,
                                    const AMDGPUSubtarget &STI)
     : AMDGPUTargetLowering(TM, STI) {
   addRegisterClass(MVT::i1, &AMDGPU::VReg_1RegClass);
   addRegisterClass(MVT::i64, &AMDGPU::SReg_64RegClass);

[...] for (unsigned i = 0, Idx = AMDGPU::sub0; i < 4; ++i) {
       default: break;
       case AMDGPU::sub0: Idx = AMDGPU::sub1; break;
       case AMDGPU::sub1: Idx = AMDGPU::sub2; break;
       case AMDGPU::sub2: Idx = AMDGPU::sub3; break;
     }
   }
 }

+/// Returns bitmask of the lanes of the vector result of \p N that are actually
+/// used. Early out is automatically taken when the highest lane is used; ~0 is
+/// returned in this case.
+LaneBitmask SITargetLowering::findUsedLanes(SDNode *N) const
+{
+  assert(N->getNumValues() == 1);
+  const unsigned NumElements = N->getValueType(0).getVectorNumElements();
+  const LaneBitmask HighestMask = (LaneBitmask)1 << (NumElements - 1);
+  LaneBitmask Mask = 0;
+
+  for (SDNode *use : N->uses()) {
+    if (!use->isMachineOpcode()) {
+      DEBUG(dbgs() << "findUsedLanes: non-machine opcode\n");
+      return ~(LaneBitmask)0;
+    }
+
+    if (use->getMachineOpcode() != TargetOpcode::EXTRACT_SUBREG) {
+      DEBUG(dbgs() << "findUsedLanes: unsupported opcode "
+                   << use->getMachineOpcode());
+      return ~(LaneBitmask)0;
+    }
+
+    Mask |= Subtarget->getRegisterInfo()->getSubRegIndexLaneMask(
+        use->getConstantOperandVal(1));
+    if (Mask & HighestMask)
+      return ~(LaneBitmask)0;
+  }
+
+  return Mask;
+}

+/// Reduce size of a load instruction if only a prefix of returned channels
+/// are used. Currently only used for BUFFER_LOAD_FORMAT_XYZW.
+void SITargetLowering::adjustLoadSize(
+    MachineSDNode *&N,
+    SelectionDAG &DAG) const
+{
+  EVT OriginalType = N->getValueType(0);
+
+  if (!OriginalType.isVector())
+    return;
+
+  const LaneBitmask Mask = findUsedLanes(N);
+  if (Mask == ~(LaneBitmask)0)
+    return;
+  if (!Mask) {
+    DEBUG(dbgs() << "adjustLoadSize: dead load has not been eliminated\n");
+    return;
+  }
+
+  const unsigned RequiredSize = findLastSet(Mask) + 1;
+
+  const SIInstrInfo *TII =
+      static_cast<const SIInstrInfo *>(Subtarget->getInstrInfo());
+  const unsigned NewOpcode = TII->getResizedLoadOp(
+      N->getMachineOpcode(), RequiredSize);
+
+  // Make a temporary copy of operands to avoid problems with in-place mutation.
+  std::vector<SDValue> Ops;
+  Ops.insert(Ops.end(), N->op_begin(), N->op_end());
+
+  if (RequiredSize > 1) {
+    // We do not adjust the type of the node here, because MachineValueTypes
+    // do not support v3f32 types properly. The correct machine register class
+    // will eventually be selected based on the opcode after the
+    // MachineInstruction is built.
+    N = static_cast<MachineSDNode *>(DAG.SelectNodeTo(
+        N, NewOpcode, OriginalType, Ops));
+  } else {
+    // Bypass EXTRACT_SUBREG instructions here, because trying to deal with
+    // v1xx types is a headache.
+    const EVT NewType = OriginalType.getVectorElementType();
+    N = static_cast<MachineSDNode *>(DAG.SelectNodeTo(
+        N, NewOpcode, NewType, Ops));
+
+    for (SDNode *use : N->uses()) {
+      assert(use->getMachineOpcode() == TargetOpcode::EXTRACT_SUBREG);
+
+      DAG.ReplaceAllUsesWith(use, N);
+    }
+  }
+}

 static bool isFrameIndexOp(SDValue Op) {
   if (Op.getOpcode() == ISD::AssertZext)
     Op = Op.getOperand(0);

   return isa<FrameIndexSDNode>(Op);
 }

 /// \brief Legalize target independent instructions (e.g. INSERT_SUBREG)
[...]
 }

 /// \brief Fold the instructions after selecting them.
 SDNode *SITargetLowering::PostISelFolding(MachineSDNode *Node,
                                           SelectionDAG &DAG) const {
   const SIInstrInfo *TII =
       static_cast<const SIInstrInfo *>(Subtarget->getInstrInfo());

-  if (TII->isMIMG(Node->getMachineOpcode()))
-    adjustWritemask(Node, DAG);
+  const unsigned opcode = Node->getMachineOpcode();

-  if (Node->getMachineOpcode() == AMDGPU::INSERT_SUBREG ||
-      Node->getMachineOpcode() == AMDGPU::REG_SEQUENCE) {
+  if (TII->isMIMG(opcode)) {
+    adjustWritemask(Node, DAG);
+  } else if (TII->isMUBUF(opcode)) {
+    if (TII->getResizedLoadOp(opcode, 1) >= 0)
+      adjustLoadSize(Node, DAG);
+  } else if (opcode == AMDGPU::INSERT_SUBREG ||
+             opcode == AMDGPU::REG_SEQUENCE) {
     legalizeTargetIndependentNode(Node, DAG);
     return Node;
   }
   return Node;
 }

 /// \brief Assign the register class depending on the number of
 /// bits set in the writemask
[...]

lib/Target/AMDGPU/SIInstrInfo.td

[...]
 // Execpt for the NONE field, this must be kept in sync with the SISubtarget enum
 // in AMDGPUInstrInfo.cpp
 def SISubtarget {
   int NONE = -1;
   int SI = 0;
   int VI = 1;
 }

+class ResizableLoad<string group, int size> {
+  string Group = group;
+  int Size = size;
+}
+
 //===----------------------------------------------------------------------===//
 // SI DAG Nodes
 //===----------------------------------------------------------------------===//

 def SIload_constant : SDNode<"AMDGPUISD::LOAD_CONSTANT",
   SDTypeProfile<1, 2, [SDTCisVT<0, f32>, SDTCisVT<1, v4i32>, SDTCisVT<2, i32>]>,
   [SDNPMayLoad, SDNPMemOperand]
 >;
[...]
 def getMaskedMIMGOp : InstrMapping {
   let FilterClass = "MIMG_Mask";
   let RowFields = ["Op"];
   let ColFields = ["Channels"];
   let KeyCol = ["4"];
   let ValueCols = [["1"], ["2"], ["3"] ];
 }

+def getResizedLoadOp : InstrMapping {
+  let FilterClass = "ResizableLoad";
+  let RowFields = ["Group", "idxen", "offen", "vaddr", "isPseudo", "Subtarget"];
+  let ColFields = ["Size"];
+  let KeyCol = ["4"];
+  let ValueCols = [["1"], ["2"], ["3"], ["4"]];
+}
+
 // Maps an commuted opcode to its original version
 def getCommuteOrig : InstrMapping {
   let FilterClass = "VOP2_REV";
   let RowFields = ["RevOp"];
   let ColFields = ["IsOrig"];
   let KeyCol = ["0"];
   let ValueCols = [["1"]];
 }
[...]

lib/Target/AMDGPU/SIInstructions.td

[...]
 defm DS_MAX_SRC2_F64 : DS_1A <0xd3, "ds_max_src2_f64">;

 //===----------------------------------------------------------------------===//
 // MUBUF Instructions
 //===----------------------------------------------------------------------===//

 defm BUFFER_LOAD_FORMAT_X : MUBUF_Load_Helper <
   mubuf<0x00>, "buffer_load_format_x", VGPR_32
->;
+>, ResizableLoad <"buffer_load_format", 1>;
 defm BUFFER_LOAD_FORMAT_XY : MUBUF_Load_Helper <
   mubuf<0x01>, "buffer_load_format_xy", VReg_64
->;
+>, ResizableLoad <"buffer_load_format", 2>;
 defm BUFFER_LOAD_FORMAT_XYZ : MUBUF_Load_Helper <
   mubuf<0x02>, "buffer_load_format_xyz", VReg_96
->;
+>, ResizableLoad <"buffer_load_format", 3>;
 defm BUFFER_LOAD_FORMAT_XYZW : MUBUF_Load_Helper <
   mubuf<0x03>, "buffer_load_format_xyzw", VReg_128
->;
+>, ResizableLoad <"buffer_load_format", 4>;
 defm BUFFER_STORE_FORMAT_X : MUBUF_Store_Helper <
   mubuf<0x04>, "buffer_store_format_x", VGPR_32
 >;
 defm BUFFER_STORE_FORMAT_XY : MUBUF_Store_Helper <
   mubuf<0x05>, "buffer_store_format_xy", VReg_64
 >;
 defm BUFFER_STORE_FORMAT_XYZ : MUBUF_Store_Helper <
   mubuf<0x06>, "buffer_store_format_xyz", VReg_96
[...]