This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Turn D16 for MIMG instructions into a regular operand
Closed, Public

Authored by nhaehnle on May 27 2018, 1:55 PM.

Details

Reviewers

arsenm
rampitec
kzhuravl
dp
rtaylor
dstuttard
artem.tamazov

Commits

rGf26743190181: AMDGPU: Turn D16 for MIMG instructions into a regular operand
rL335222: AMDGPU: Turn D16 for MIMG instructions into a regular operand

Summary

This allows us to reduce the number of different machine instruction
opcodes, which reduces the table sizes and helps flatten the TableGen
multiclass hierarchies.

We can do this because for each hardware MIMG opcode, we have a full set
of IMAGE_xxx_Vn_Vm machine instructions for all required sizes of vdata
and vaddr registers. Instead of having separate D16 machine instructions,
a packed D16 instruction loading e.g. 4 components can simply use the
same V2 opcode variant that non-D16 instructions use.
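
To make the register-count reasoning concrete, here is a minimal standalone sketch (not code from this patch) of how many 32-bit VGPRs the vdata operand of a MIMG load needs. It mirrors the arithmetic in validateMIMGDataSize; the name vdataDwords and its boolean parameters are hypothetical stand-ins for the operand and subtarget queries.

```cpp
#include <bitset>

// Hypothetical helper for illustration only; not part of the patch.
// Returns the number of 32-bit registers needed for the vdata operand.
unsigned vdataDwords(unsigned DMask, bool D16, bool PackedD16, bool TFE) {
  DMask &= 0xf;
  if (DMask == 0)
    DMask = 1; // a zero dmask is treated like one enabled component
  unsigned Components = static_cast<unsigned>(std::bitset<4>(DMask).count());
  if (D16 && PackedD16)
    Components = (Components + 1) / 2; // two 16-bit components per dword
  return Components + (TFE ? 1u : 0u); // TFE adds one more dword
}
```

With dmask 0xf, a packed D16 load needs 2 dwords, which matches the V2 variant, while the same load without d16 needs 4 and matches V4.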

We still require a TSFlag for D16 buffer instructions, because the
D16-ness of buffer instructions is part of the opcode. Renaming the flag
should help avoid future confusion.
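
As an illustration of why the flag remains for buffer instructions, the sketch below models the encoding-family adjustment from AMDGPUInstrInfo.cpp with plain integers. The enum, the flag's bit position, and the function name are assumptions made for this example; only the shape of the condition follows the patch.

```cpp
#include <cstdint>

// Hypothetical stand-ins for illustration; the real flag is
// SIInstrFlags::D16Buf and the real families are in SIEncodingFamily.
enum class EncodingFamily { Base, GFX80 };
constexpr std::uint64_t D16BufFlag = std::uint64_t{1} << 0; // bit position is made up

// D16 buffer instructions have their own opcodes, so the flag on the opcode
// (not an operand) decides whether the GFX80 encoding is selected.
EncodingFamily adjustEncodingFamily(std::uint64_t TSFlags,
                                    bool HasUnpackedD16VMem,
                                    EncodingFamily Current) {
  if (HasUnpackedD16VMem && (TSFlags & D16BufFlag) != 0)
    return EncodingFamily::GFX80;
  return Current;
}
```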

The one non-obvious code change is that for gather4 instructions, the
disassembler can no longer automatically decide whether to use a V2 or
a V4 variant. The existing logic which chooses the correct variant for
other MIMG instructions is extended to cover gather4 as well.
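
A rough sketch of the gather4 case, based on the disassembler change in this diff: gather4 always returns four components regardless of the dmask popcount, so only packed D16 changes the register count. The function name is hypothetical.

```cpp
// Hypothetical helper for illustration only; not part of the patch.
// gather4 ignores the dmask popcount and always yields four components.
unsigned gather4VDataDwords(bool D16, bool PackedD16) {
  unsigned Components = 4;
  if (D16 && PackedD16)
    Components = (Components + 1) / 2; // packed 16-bit halves share a dword
  return Components; // 2 corresponds to a V2 variant, 4 to a V4 variant
}
```

Because the d16 bit is now an operand rather than part of the machine opcode, the decoder has to inspect it after decoding to make this choice.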

As a bonus, some of the assembler error messages are now more helpful
(e.g., complaining about a wrong data size instead of a non-existing
instruction).

While we're at it, delete a whole bunch of dead legacy TableGen code.

Change-Id: I89b02c2841c06f95e662541433e597f5d4553978

Diff Detail

Repository

rL LLVM

Build Status

Buildable 18645
Build 18645: arc lint + arc unit

Event Timeline

nhaehnle created this revision. May 27 2018, 1:55 PM

Herald added subscribers: t-tye, tpr, dstuttard and 2 others. May 27 2018, 1:55 PM

Harbormaster completed remote builds in B18645: Diff 148771. May 27 2018, 1:55 PM

nhaehnle added a parent revision: D47433: AMDGPU: Make various NamedOperands upper case. May 27 2018, 1:55 PM

nhaehnle added a reviewer: dstuttard. May 29 2018, 2:37 AM

I was thinking of actually moving the opposite direction for these. For modeling partial register updates, I think having operands for all of these is unmanageable. The problem is worse for ALU instructions with the zero high bit control bit. I think the same issues apply here. What is the behavior for the high bits if only 1 or 3 components are enabled?

In D47434#1114406, @arsenm wrote:

I was thinking of actually moving the opposite direction for these. For modeling partial register updates, I think having operands for all of these is unmanageable. The problem is worse for ALU instructions with the zero high bit control bit. I think the same issues apply here. What is the behavior for the high bits if only 1 or 3 components are enabled?

IIRC the high bits are preserved except when ECC is enabled.

Okay, so modeling partial register updates. That would require us to add a $vdst_orig tied operand, right? Is there anything else, or any reason why we couldn't just do that unconditionally (but as an undef use), whether D16 or not?

In D47434#1114811, @nhaehnle wrote:

IIRC the high bits are preserved except when ECC is enabled.

Okay, so modeling partial register updates. That would require us to add a $vdst_orig tied operand, right? Is there anything else, or any reason why we couldn't just do that unconditionally (but as an undef use), whether D16 or not?

Actually, can we have a VGPR32_LO/HI as the vdata register class for modeling the fact that the upper bits are preserved?

Adding Dmitry as he is more fluent in this domain.

In D47434#1114816, @nhaehnle wrote:

Actually, can we have a VGPR32_LO/HI as the vdata register class for modeling the fact that the upper bits are preserved?

I'm not sure. I've only looked at this a little bit before but it certainly needs experimentation. My guess is the tied operand will be necessary

In D47434#1116075, @arsenm wrote:

I'm not sure. I've only looked at this a little bit before but it certainly needs experimentation. My guess is the tied operand will be necessary

Okay. Let's assume we don't get half-sized register classes and we do need the tied operand. The question at hand is whether D16 instructions need to be separate from non-D16 instructions. I would prefer them to be not separate, hence this patch. I think it should be possible to always have tied operands, but with undef uses in the non-D16 case. What do you think?

Looks good.

Regarding partial register updates: would there be any performance benefit from supporting this feature for MIMG? I.e. cannot we just ignore this feature and handle upper bits as undefined?

This revision is now accepted and ready to land. Jun 1 2018, 5:30 AM

In D47434#1118807, @dp wrote:

Regarding partial register updates: would there be any performance benefit from supporting this feature for MIMG? I.e. cannot we just ignore this feature and handle upper bits as undefined?

I believe that's effectively what we're doing today.

nhaehnle added a child revision: D48011: AMDGPU: Pass AMDGPUSampleVariant to MIMG_{Sampler,Gather}(_WQM). Jun 11 2018, 4:52 AM

I'm assuming that I can submit this as-is soon. I'm holding off for now so that I can submit all MIMG-related changes in this stack at once.

Closed by commit rL335222: AMDGPU: Turn D16 for MIMG instructions into a regular operand (authored by nha). Jun 21 2018, 6:40 AM

This revision was automatically updated to reflect the committed changes.

ruiling mentioned this in D140537: AMDGPU/SIInsertWait: Skip dummy tied source. Dec 22 2022, 4:24 AM

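Revision Contents

Path                                                        Size
lib/Target/AMDGPU/AMDGPUInstrInfo.cpp                       3 lines
lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp             31 lines
lib/Target/AMDGPU/BUFInstructions.td                        8 lines
lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp       14 lines
lib/Target/AMDGPU/InstPrinter/AMDGPUInstPrinter.h           2 lines
lib/Target/AMDGPU/InstPrinter/AMDGPUInstPrinter.cpp         5 lines
(path not captured)                                         575 lines
(path not captured)                                         4 lines
(path not captured)                                         11 lines
(path not captured)                                         10 lines
(path not captured)                                         8 lines
(path not captured)                                         18 lines
lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h                    3 lines
test/CodeGen/AMDGPU/coalescer-subreg-join.mir               4 lines
test/MC/AMDGPU/mimg.s                                       17 lines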

Diff 148771

lib/Target/AMDGPU/AMDGPUInstrInfo.cpp

[103 unchanged lines hidden]
@@ if ((get(Opcode).TSFlags & SIInstrFlags::renamedInGFX9) != 0 &&
     Gen = SIEncodingFamily::GFX9;

   if (get(Opcode).TSFlags & SIInstrFlags::SDWA)
     Gen = ST.getGeneration() == AMDGPUSubtarget::GFX9 ? SIEncodingFamily::SDWA9
                                                       : SIEncodingFamily::SDWA;
   // Adjust the encoding family to GFX80 for D16 buffer instructions when the
   // subtarget has UnpackedD16VMem feature.
   // TODO: remove this when we discard GFX80 encoding.
-  if (ST.hasUnpackedD16VMem() && (get(Opcode).TSFlags & SIInstrFlags::D16)
-      && !(get(Opcode).TSFlags & SIInstrFlags::MIMG))
+  if (ST.hasUnpackedD16VMem() && (get(Opcode).TSFlags & SIInstrFlags::D16Buf))
     Gen = SIEncodingFamily::GFX80;

   int MCOp = AMDGPU::getMCOpcode(Opcode, Gen);

   // -1 means that Opcode is already a native instruction.
   if (MCOp == -1)
     return Opcode;

[28 unchanged lines hidden]

lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp

[2,295 unchanged lines hidden]
 bool AMDGPUAsmParser::validateMIMGDataSize(const MCInst &Inst) {

   const unsigned Opc = Inst.getOpcode();
   const MCInstrDesc &Desc = MII.get(Opc);

   if ((Desc.TSFlags & SIInstrFlags::MIMG) == 0)
     return true;

-  // Gather4 instructions do not need validation: dst size is hardcoded.
-  if (Desc.TSFlags & SIInstrFlags::Gather4)
-    return true;
-
   int VDataIdx = AMDGPU::getNamedOperandIdx(Opc, AMDGPU::OpName::vdata);
   int DMaskIdx = AMDGPU::getNamedOperandIdx(Opc, AMDGPU::OpName::dmask);
   int TFEIdx = AMDGPU::getNamedOperandIdx(Opc, AMDGPU::OpName::tfe);

   assert(VDataIdx != -1);
   assert(DMaskIdx != -1);
   assert(TFEIdx != -1);

   unsigned VDataSize = AMDGPU::getRegOperandSize(getMRI(), Desc, VDataIdx);
   unsigned TFESize = Inst.getOperand(TFEIdx).getImm()? 1 : 0;
   unsigned DMask = Inst.getOperand(DMaskIdx).getImm() & 0xf;
   if (DMask == 0)
     DMask = 1;

-  unsigned DataSize = countPopulation(DMask);
-  if ((Desc.TSFlags & SIInstrFlags::D16) != 0 && hasPackedD16()) {
-    DataSize = (DataSize + 1) / 2;
+  unsigned DataSize =
+    (Desc.TSFlags & SIInstrFlags::Gather4) ? 4 : countPopulation(DMask);
+  if (hasPackedD16()) {
+    int D16Idx = AMDGPU::getNamedOperandIdx(Opc, AMDGPU::OpName::d16);
+    if (D16Idx >= 0 && Inst.getOperand(D16Idx).getImm())
+      DataSize = (DataSize + 1) / 2;
   }

   return (VDataSize / 4) == DataSize + TFESize;
 }

 bool AMDGPUAsmParser::validateMIMGAtomicDMask(const MCInst &Inst) {

   const unsigned Opc = Inst.getOpcode();
[51 unchanged lines hidden]

 bool AMDGPUAsmParser::validateMIMGD16(const MCInst &Inst) {

   const unsigned Opc = Inst.getOpcode();
   const MCInstrDesc &Desc = MII.get(Opc);

   if ((Desc.TSFlags & SIInstrFlags::MIMG) == 0)
     return true;
-  if ((Desc.TSFlags & SIInstrFlags::D16) == 0)
-    return true;

-  return !isCI() && !isSI();
+  int D16Idx = AMDGPU::getNamedOperandIdx(Opc, AMDGPU::OpName::d16);
+  if (D16Idx >= 0 && Inst.getOperand(D16Idx).getImm()) {
+    if (isCI() || isSI())
+      return false;
+  }
+
+  return true;
 }

 bool AMDGPUAsmParser::validateInstruction(const MCInst &Inst,
                                           const SMLoc &IDLoc) {
   if (!validateConstantBusLimitations(Inst)) {
     Error(IDLoc,
       "invalid operand (violates constant bus restrictions)");
     return false;
[1,852 unchanged lines hidden]
@@ void AMDGPUAsmParser::cvtMIMG(MCInst &Inst, const OperandVector &Operands,
   addOptionalImmOperand(Inst, Operands, OptionalIdx, AMDGPUOperand::ImmTyDMask);
   addOptionalImmOperand(Inst, Operands, OptionalIdx, AMDGPUOperand::ImmTyUNorm);
   addOptionalImmOperand(Inst, Operands, OptionalIdx, AMDGPUOperand::ImmTyGLC);
   addOptionalImmOperand(Inst, Operands, OptionalIdx, AMDGPUOperand::ImmTySLC);
   addOptionalImmOperand(Inst, Operands, OptionalIdx, AMDGPUOperand::ImmTyR128);
   addOptionalImmOperand(Inst, Operands, OptionalIdx, AMDGPUOperand::ImmTyTFE);
   addOptionalImmOperand(Inst, Operands, OptionalIdx, AMDGPUOperand::ImmTyLWE);
   addOptionalImmOperand(Inst, Operands, OptionalIdx, AMDGPUOperand::ImmTyDA);
+  addOptionalImmOperand(Inst, Operands, OptionalIdx, AMDGPUOperand::ImmTyD16);
 }

 void AMDGPUAsmParser::cvtMIMGAtomic(MCInst &Inst, const OperandVector &Operands) {
   cvtMIMG(Inst, Operands, true);
 }

 AMDGPUOperand::Ptr AMDGPUAsmParser::defaultDMask() const {
   return AMDGPUOperand::CreateImm(this, 0, SMLoc(), AMDGPUOperand::ImmTyDMask);
[10 unchanged lines hidden]
 AMDGPUOperand::Ptr AMDGPUAsmParser::defaultR128() const {
   return AMDGPUOperand::CreateImm(this, 0, SMLoc(), AMDGPUOperand::ImmTyR128);
 }

 AMDGPUOperand::Ptr AMDGPUAsmParser::defaultLWE() const {
   return AMDGPUOperand::CreateImm(this, 0, SMLoc(), AMDGPUOperand::ImmTyLWE);
 }

+AMDGPUOperand::Ptr AMDGPUAsmParser::defaultD16() const {
+  return AMDGPUOperand::CreateImm(this, 0, SMLoc(), AMDGPUOperand::ImmTyD16);
+}
+
 //===----------------------------------------------------------------------===//
 // smrd
 //===----------------------------------------------------------------------===//

 bool AMDGPUOperand::isSMRDOffset8() const {
   return isImm() && isUInt<8>(getImm());
 }

[86 unchanged lines hidden]
@@ static const OptionalOperand AMDGPUOptionalOperandTable[] = {
   {"d16", AMDGPUOperand::ImmTyD16, true, nullptr},
   {"high", AMDGPUOperand::ImmTyHigh, true, nullptr},
   {"clamp", AMDGPUOperand::ImmTyClampSI, true, nullptr},
   {"omod", AMDGPUOperand::ImmTyOModSI, false, ConvertOmodMul},
   {"unorm", AMDGPUOperand::ImmTyUNorm, true, nullptr},
   {"da", AMDGPUOperand::ImmTyDA, true, nullptr},
   {"r128", AMDGPUOperand::ImmTyR128, true, nullptr},
   {"lwe", AMDGPUOperand::ImmTyLWE, true, nullptr},
+  {"d16", AMDGPUOperand::ImmTyD16, true, nullptr},
   {"dmask", AMDGPUOperand::ImmTyDMask, false, nullptr},
   {"row_mask", AMDGPUOperand::ImmTyDppRowMask, false, nullptr},
   {"bank_mask", AMDGPUOperand::ImmTyDppBankMask, false, nullptr},
   {"bound_ctrl", AMDGPUOperand::ImmTyDppBoundCtrl, false, ConvertBoundCtrl},
   {"dst_sel", AMDGPUOperand::ImmTySdwaDstSel, false, nullptr},
   {"src0_sel", AMDGPUOperand::ImmTySdwaSrc0Sel, false, nullptr},
   {"src1_sel", AMDGPUOperand::ImmTySdwaSrc1Sel, false, nullptr},
   {"dst_unused", AMDGPUOperand::ImmTySdwaDstUnused, false, nullptr},
[689 unchanged lines hidden]
@@ unsigned AMDGPUAsmParser::validateTargetOperandClass(MCParsedAsmOperand &Op,
   case MCK_addr64:
     return Operand.isAddr64() ? Match_Success : Match_InvalidOperand;
   case MCK_gds:
     return Operand.isGDS() ? Match_Success : Match_InvalidOperand;
   case MCK_lds:
     return Operand.isLDS() ? Match_Success : Match_InvalidOperand;
   case MCK_glc:
     return Operand.isGLC() ? Match_Success : Match_InvalidOperand;
-  case MCK_d16:
-    return Operand.isD16() ? Match_Success : Match_InvalidOperand;
   case MCK_idxen:
     return Operand.isIdxen() ? Match_Success : Match_InvalidOperand;
   case MCK_offen:
     return Operand.isOffen() ? Match_Success : Match_InvalidOperand;
   case MCK_SSrcB32:
     // When operands have expression values, they will return true for isToken,
     // because it is not possible to distinguish between a token and an
     // expression at parse time. MatchInstructionImpl() will always try to
[20 unchanged lines hidden]

lib/Target/AMDGPU/BUFInstructions.td

[714 unchanged lines hidden]
 >;
 defm BUFFER_STORE_FORMAT_XYZ : MUBUF_Pseudo_Stores <
   "buffer_store_format_xyz", VReg_96
 >;
 defm BUFFER_STORE_FORMAT_XYZW : MUBUF_Pseudo_Stores <
   "buffer_store_format_xyzw", VReg_128
 >;

-let SubtargetPredicate = HasUnpackedD16VMem, D16 = 1 in {
+let SubtargetPredicate = HasUnpackedD16VMem, D16Buf = 1 in {
   defm BUFFER_LOAD_FORMAT_D16_X_gfx80 : MUBUF_Pseudo_Loads <
     "buffer_load_format_d16_x", VGPR_32
   >;
   defm BUFFER_LOAD_FORMAT_D16_XY_gfx80 : MUBUF_Pseudo_Loads <
     "buffer_load_format_d16_xy", VReg_64
   >;
   defm BUFFER_LOAD_FORMAT_D16_XYZ_gfx80 : MUBUF_Pseudo_Loads <
     "buffer_load_format_d16_xyz", VReg_96
[10 unchanged lines hidden]
@@ let SubtargetPredicate = HasUnpackedD16VMem, D16Buf = 1 in {
   defm BUFFER_STORE_FORMAT_D16_XYZ_gfx80 : MUBUF_Pseudo_Stores <
     "buffer_store_format_d16_xyz", VReg_96
   >;
   defm BUFFER_STORE_FORMAT_D16_XYZW_gfx80 : MUBUF_Pseudo_Stores <
     "buffer_store_format_d16_xyzw", VReg_128
   >;
 } // End HasUnpackedD16VMem.

-let SubtargetPredicate = HasPackedD16VMem, D16 = 1 in {
+let SubtargetPredicate = HasPackedD16VMem, D16Buf = 1 in {
   defm BUFFER_LOAD_FORMAT_D16_X : MUBUF_Pseudo_Loads <
     "buffer_load_format_d16_x", VGPR_32
   >;
   defm BUFFER_LOAD_FORMAT_D16_XY : MUBUF_Pseudo_Loads <
     "buffer_load_format_d16_xy", VGPR_32
   >;
   defm BUFFER_LOAD_FORMAT_D16_XYZ : MUBUF_Pseudo_Loads <
     "buffer_load_format_d16_xyz", VReg_64
[210 unchanged lines hidden]
 defm TBUFFER_LOAD_FORMAT_XY : MTBUF_Pseudo_Loads <"tbuffer_load_format_xy", VReg_64>;
 defm TBUFFER_LOAD_FORMAT_XYZ : MTBUF_Pseudo_Loads <"tbuffer_load_format_xyz", VReg_128>;
 defm TBUFFER_LOAD_FORMAT_XYZW : MTBUF_Pseudo_Loads <"tbuffer_load_format_xyzw", VReg_128>;
 defm TBUFFER_STORE_FORMAT_X : MTBUF_Pseudo_Stores <"tbuffer_store_format_x", VGPR_32>;
 defm TBUFFER_STORE_FORMAT_XY : MTBUF_Pseudo_Stores <"tbuffer_store_format_xy", VReg_64>;
 defm TBUFFER_STORE_FORMAT_XYZ : MTBUF_Pseudo_Stores <"tbuffer_store_format_xyz", VReg_128>;
 defm TBUFFER_STORE_FORMAT_XYZW : MTBUF_Pseudo_Stores <"tbuffer_store_format_xyzw", VReg_128>;

-let SubtargetPredicate = HasUnpackedD16VMem, D16 = 1 in {
+let SubtargetPredicate = HasUnpackedD16VMem, D16Buf = 1 in {
   defm TBUFFER_LOAD_FORMAT_D16_X_gfx80 : MTBUF_Pseudo_Loads <"tbuffer_load_format_d16_x", VGPR_32>;
   defm TBUFFER_LOAD_FORMAT_D16_XY_gfx80 : MTBUF_Pseudo_Loads <"tbuffer_load_format_d16_xy", VReg_64>;
   defm TBUFFER_LOAD_FORMAT_D16_XYZ_gfx80 : MTBUF_Pseudo_Loads <"tbuffer_load_format_d16_xyz", VReg_96>;
   defm TBUFFER_LOAD_FORMAT_D16_XYZW_gfx80 : MTBUF_Pseudo_Loads <"tbuffer_load_format_d16_xyzw", VReg_128>;
   defm TBUFFER_STORE_FORMAT_D16_X_gfx80 : MTBUF_Pseudo_Stores <"tbuffer_store_format_d16_x", VGPR_32>;
   defm TBUFFER_STORE_FORMAT_D16_XY_gfx80 : MTBUF_Pseudo_Stores <"tbuffer_store_format_d16_xy", VReg_64>;
   defm TBUFFER_STORE_FORMAT_D16_XYZ_gfx80 : MTBUF_Pseudo_Stores <"tbuffer_store_format_d16_xyz", VReg_96>;
   defm TBUFFER_STORE_FORMAT_D16_XYZW_gfx80 : MTBUF_Pseudo_Stores <"tbuffer_store_format_d16_xyzw", VReg_128>;
 } // End HasUnpackedD16VMem.

-let SubtargetPredicate = HasPackedD16VMem, D16 = 1 in {
+let SubtargetPredicate = HasPackedD16VMem, D16Buf = 1 in {
   defm TBUFFER_LOAD_FORMAT_D16_X : MTBUF_Pseudo_Loads <"tbuffer_load_format_d16_x", VGPR_32>;
   defm TBUFFER_LOAD_FORMAT_D16_XY : MTBUF_Pseudo_Loads <"tbuffer_load_format_d16_xy", VGPR_32>;
   defm TBUFFER_LOAD_FORMAT_D16_XYZ : MTBUF_Pseudo_Loads <"tbuffer_load_format_d16_xyz", VReg_64>;
   defm TBUFFER_LOAD_FORMAT_D16_XYZW : MTBUF_Pseudo_Loads <"tbuffer_load_format_d16_xyzw", VReg_64>;
   defm TBUFFER_STORE_FORMAT_D16_X : MTBUF_Pseudo_Stores <"tbuffer_store_format_d16_x", VGPR_32>;
   defm TBUFFER_STORE_FORMAT_D16_XY : MTBUF_Pseudo_Stores <"tbuffer_store_format_d16_xy", VGPR_32>;
   defm TBUFFER_STORE_FORMAT_D16_XYZ : MTBUF_Pseudo_Stores <"tbuffer_store_format_d16_xyz", VReg_64>;
   defm TBUFFER_STORE_FORMAT_D16_XYZW : MTBUF_Pseudo_Stores <"tbuffer_store_format_d16_xyzw", VReg_64>;
[1,083 unchanged lines hidden]

lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp

[283 unchanged lines hidden]
@@ DecodeStatus AMDGPUDisassembler::convertSDWAInst(MCInst &MI) const {
   return MCDisassembler::Success;
 }

 // Note that MIMG format provides no information about VADDR size.
 // Consequently, decoded instructions always show address
 // as if it has 1 dword, which could be not really so.
 DecodeStatus AMDGPUDisassembler::convertMIMGInst(MCInst &MI) const {

-  if (MCII->get(MI.getOpcode()).TSFlags & SIInstrFlags::Gather4) {
-    return MCDisassembler::Success;
-  }
-
   int VDstIdx = AMDGPU::getNamedOperandIdx(MI.getOpcode(),
                                            AMDGPU::OpName::vdst);

   int VDataIdx = AMDGPU::getNamedOperandIdx(MI.getOpcode(),
                                             AMDGPU::OpName::vdata);

   int DMaskIdx = AMDGPU::getNamedOperandIdx(MI.getOpcode(),
                                             AMDGPU::OpName::dmask);

   int TFEIdx = AMDGPU::getNamedOperandIdx(MI.getOpcode(),
                                           AMDGPU::OpName::tfe);
+  int D16Idx = AMDGPU::getNamedOperandIdx(MI.getOpcode(),
+                                          AMDGPU::OpName::d16);

   assert(VDataIdx != -1);
   assert(DMaskIdx != -1);
   assert(TFEIdx != -1);

   bool IsAtomic = (VDstIdx != -1);
+  bool IsGather4 = MCII->get(MI.getOpcode()).TSFlags & SIInstrFlags::Gather4;

   unsigned DMask = MI.getOperand(DMaskIdx).getImm() & 0xf;
   if (DMask == 0)
     return MCDisassembler::Success;

-  unsigned DstSize = countPopulation(DMask);
+  unsigned DstSize = IsGather4 ? 4 : countPopulation(DMask);
   if (DstSize == 1)
     return MCDisassembler::Success;

-  bool D16 = MCII->get(MI.getOpcode()).TSFlags & SIInstrFlags::D16;
+  bool D16 = D16Idx >= 0 && MI.getOperand(D16Idx).getImm();
   if (D16 && AMDGPU::hasPackedD16(STI)) {
     DstSize = (DstSize + 1) / 2;
   }

   // FIXME: Add tfe support
   if (MI.getOperand(TFEIdx).getImm())
     return MCDisassembler::Success;

   int NewOpcode = -1;

   if (IsAtomic) {
     if (DMask == 0x1 || DMask == 0x3 || DMask == 0xF) {
       NewOpcode = AMDGPU::getMaskedMIMGAtomicOp(*MCII, MI.getOpcode(), DstSize);
     }
     if (NewOpcode == -1) return MCDisassembler::Success;
+  } else if (IsGather4) {
+    if (D16 && AMDGPU::hasPackedD16(STI))
+      NewOpcode = AMDGPU::getMIMGGatherOpPackedD16(MI.getOpcode());
   } else {
     NewOpcode = AMDGPU::getMaskedMIMGOp(*MCII, MI.getOpcode(), DstSize);
     assert(NewOpcode != -1 && "could not find matching mimg channel instruction");
   }

   auto RCID = MCII->get(NewOpcode).OpInfo[VDataIdx].RegClass;

   // Get first subregister of VData
[599 unchanged lines hidden]

lib/Target/AMDGPU/InstPrinter/AMDGPUInstPrinter.h

Show First 20 Lines • Show All 78 Lines • ▼ Show 20 Lines	private:
void printUNorm(const MCInst *MI, unsigned OpNo, const MCSubtargetInfo &STI,		void printUNorm(const MCInst *MI, unsigned OpNo, const MCSubtargetInfo &STI,
raw_ostream &O);		raw_ostream &O);
void printDA(const MCInst *MI, unsigned OpNo, const MCSubtargetInfo &STI,		void printDA(const MCInst *MI, unsigned OpNo, const MCSubtargetInfo &STI,
raw_ostream &O);		raw_ostream &O);
void printR128(const MCInst *MI, unsigned OpNo, const MCSubtargetInfo &STI,		void printR128(const MCInst *MI, unsigned OpNo, const MCSubtargetInfo &STI,
raw_ostream &O);		raw_ostream &O);
void printLWE(const MCInst *MI, unsigned OpNo,		void printLWE(const MCInst *MI, unsigned OpNo,
const MCSubtargetInfo &STI, raw_ostream &O);		const MCSubtargetInfo &STI, raw_ostream &O);
		void printD16(const MCInst *MI, unsigned OpNo,
		const MCSubtargetInfo &STI, raw_ostream &O);
void printExpCompr(const MCInst *MI, unsigned OpNo,		void printExpCompr(const MCInst *MI, unsigned OpNo,
const MCSubtargetInfo &STI, raw_ostream &O);		const MCSubtargetInfo &STI, raw_ostream &O);
void printExpVM(const MCInst *MI, unsigned OpNo,		void printExpVM(const MCInst *MI, unsigned OpNo,
const MCSubtargetInfo &STI, raw_ostream &O);		const MCSubtargetInfo &STI, raw_ostream &O);
void printDFMT(const MCInst *MI, unsigned OpNo,		void printDFMT(const MCInst *MI, unsigned OpNo,
const MCSubtargetInfo &STI, raw_ostream &O);		const MCSubtargetInfo &STI, raw_ostream &O);
void printNFMT(const MCInst *MI, unsigned OpNo,		void printNFMT(const MCInst *MI, unsigned OpNo,
const MCSubtargetInfo &STI, raw_ostream &O);		const MCSubtargetInfo &STI, raw_ostream &O);

lib/Target/AMDGPU/InstPrinter/AMDGPUInstPrinter.cpp

void AMDGPUInstPrinter::printR128(const MCInst *MI, unsigned OpNo,
printNamedBit(MI, OpNo, O, "r128");		printNamedBit(MI, OpNo, O, "r128");
}		}

void AMDGPUInstPrinter::printLWE(const MCInst *MI, unsigned OpNo,		void AMDGPUInstPrinter::printLWE(const MCInst *MI, unsigned OpNo,
const MCSubtargetInfo &STI, raw_ostream &O) {		const MCSubtargetInfo &STI, raw_ostream &O) {
printNamedBit(MI, OpNo, O, "lwe");		printNamedBit(MI, OpNo, O, "lwe");
}		}

		void AMDGPUInstPrinter::printD16(const MCInst *MI, unsigned OpNo,
		const MCSubtargetInfo &STI, raw_ostream &O) {
		printNamedBit(MI, OpNo, O, "d16");
		}

void AMDGPUInstPrinter::printExpCompr(const MCInst *MI, unsigned OpNo,		void AMDGPUInstPrinter::printExpCompr(const MCInst *MI, unsigned OpNo,
const MCSubtargetInfo &STI,		const MCSubtargetInfo &STI,
raw_ostream &O) {		raw_ostream &O) {
if (MI->getOperand(OpNo).getImm())		if (MI->getOperand(OpNo).getImm())
O << " compr";		O << " compr";
}		}

void AMDGPUInstPrinter::printExpVM(const MCInst *MI, unsigned OpNo,		void AMDGPUInstPrinter::printExpVM(const MCInst *MI, unsigned OpNo,

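The new printD16 follows the same pattern as printR128 and printLWE: it delegates to printNamedBit. The body of printNamedBit is not shown in this diff, so the sketch below assumes it prints a leading space and the bit name only when the operand's immediate is nonzero. `printNamedBitSketch` and `renderModifiers` are hypothetical stand-ins that take plain values instead of an MCInst.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Simplified stand-in for printNamedBit (assumed behavior): emit
// " <name>" when the immediate is nonzero, and nothing otherwise.
void printNamedBitSketch(long long Imm, std::ostream &O,
                         const std::string &BitName) {
  if (Imm)
    O << ' ' << BitName;
}

// Hypothetical helper that renders the trailing "$lwe$d16" part of an
// asm string by printing each modifier operand in order.
std::string renderModifiers(bool LWE, bool D16) {
  std::ostringstream OS;
  printNamedBitSketch(LWE, OS, "lwe");
  printNamedBitSketch(D16, OS, "d16");
  return OS.str();
}
```

This would explain why the asm strings in MIMGInstructions.td change from a literal `" d16"` suffix to a bare `"$d16"`: once D16 is an operand, the printer supplies the separating space itself.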
lib/Target/AMDGPU/MIMGInstructions.td

class MIMG_Mask <string op, int channels> {
int Channels = channels;		int Channels = channels;
}		}

class MIMG_Atomic_Size <string op, bit is32Bit> {		class MIMG_Atomic_Size <string op, bit is32Bit> {
string Op = op;		string Op = op;
int AtomicSize = !if(is32Bit, 1, 2);		int AtomicSize = !if(is32Bit, 1, 2);
}		}

		class MIMG_Gather_Size <string op, int channels> {
		string Op = op;
		int Channels = channels;
		}

class mimg <bits<7> si, bits<7> vi = si> {		class mimg <bits<7> si, bits<7> vi = si> {
field bits<7> SI = si;		field bits<7> SI = si;
field bits<7> VI = vi;		field bits<7> VI = vi;
}		}

class MIMG_Helper <dag outs, dag ins, string asm,		class MIMG_Helper <dag outs, dag ins, string asm,
string dns=""> : MIMG<outs, ins, asm,[]> {		string dns=""> : MIMG<outs, ins, asm,[]> {
let mayLoad = 1;		let mayLoad = 1;
let mayStore = 0;		let mayStore = 0;
let hasPostISelHook = 1;		let hasPostISelHook = 1;
let DecoderNamespace = dns;		let DecoderNamespace = dns;
let isAsmParserOnly = !if(!eq(dns,""), 1, 0);		let isAsmParserOnly = !if(!eq(dns,""), 1, 0);
let AsmMatchConverter = "cvtMIMG";		let AsmMatchConverter = "cvtMIMG";
let usesCustomInserter = 1;		let usesCustomInserter = 1;
let SchedRW = [WriteVMEM];		let SchedRW = [WriteVMEM];
}		}

class MIMG_NoSampler_Helper <bits<7> op, string asm,		class MIMG_NoSampler_Helper <bits<7> op, string asm,
RegisterClass dst_rc,		RegisterClass dst_rc,
RegisterClass addr_rc,		RegisterClass addr_rc,
bit d16_bit=0,		bit has_d16,
string dns=""> : MIMG_Helper <		string dns="">
(outs dst_rc:$vdata),		: MIMG_Helper <(outs dst_rc:$vdata),
(ins addr_rc:$vaddr, SReg_256:$srsrc,		!con((ins addr_rc:$vaddr, SReg_256:$srsrc,
DMask:$dmask, UNorm:$unorm, GLC:$glc, SLC:$slc,		DMask:$dmask, UNorm:$unorm, GLC:$glc, SLC:$slc,
R128:$r128, TFE:$tfe, LWE:$lwe, DA:$da),		R128:$r128, TFE:$tfe, LWE:$lwe, DA:$da),
asm#" $vdata, $vaddr, $srsrc$dmask$unorm$glc$slc$r128$tfe$lwe$da"#!if(d16_bit, " d16", ""),		!if(has_d16, (ins D16:$d16), (ins))),
dns>, MIMGe<op> {		asm#" $vdata, $vaddr, $srsrc$dmask$unorm$glc$slc$r128$tfe$lwe$da"
		#!if(has_d16, "$d16", ""),
		dns>,
		MIMGe<op> {
let ssamp = 0;		let ssamp = 0;
let D16 = d16;
}

multiclass MIMG_NoSampler_Src_Helper_Helper <bits<7> op, string asm,		let HasD16 = has_d16;
RegisterClass dst_rc,		let d16 = !if(HasD16, ?, 0);
int channels, bit d16_bit,
string suffix> {
def NAME # _V1 # suffix : MIMG_NoSampler_Helper <op, asm, dst_rc, VGPR_32, d16_bit,
!if(!eq(channels, 1), "AMDGPU", "")>,
MIMG_Mask<asm#"_V1"#suffix, channels>;
def NAME # _V2 # suffix : MIMG_NoSampler_Helper <op, asm, dst_rc, VReg_64, d16_bit>,
MIMG_Mask<asm#"_V2"#suffix, channels>;
def NAME # _V3 # suffix : MIMG_NoSampler_Helper <op, asm, dst_rc, VReg_96, d16_bit>,
MIMG_Mask<asm#"_V3"#suffix, channels>;
def NAME # _V4 # suffix : MIMG_NoSampler_Helper <op, asm, dst_rc, VReg_128, d16_bit>,
MIMG_Mask<asm#"_V4"#suffix, channels>;
}		}

multiclass MIMG_NoSampler_Src_Helper <bits<7> op, string asm,		multiclass MIMG_NoSampler_Src_Helper <bits<7> op, string asm,
RegisterClass dst_rc,		RegisterClass dst_rc,
int channels> {		int channels, bit has_d16> {
defm NAME : MIMG_NoSampler_Src_Helper_Helper <op, asm, dst_rc, channels, 0, "">;		def NAME # _V1 : MIMG_NoSampler_Helper <op, asm, dst_rc, VGPR_32, has_d16,
		!if(!eq(channels, 1), "AMDGPU", "")>,
let d16 = 1 in {		MIMG_Mask<asm#"_V1", channels>;
let SubtargetPredicate = HasPackedD16VMem in {		def NAME # _V2 : MIMG_NoSampler_Helper <op, asm, dst_rc, VReg_64, has_d16>,
defm NAME : MIMG_NoSampler_Src_Helper_Helper <op, asm, dst_rc, channels, 1, "_D16">;		MIMG_Mask<asm#"_V2", channels>;
} // End HasPackedD16VMem.		def NAME # _V3 : MIMG_NoSampler_Helper <op, asm, dst_rc, VReg_96, has_d16>,
		MIMG_Mask<asm#"_V3", channels>;
let SubtargetPredicate = HasUnpackedD16VMem, DecoderNamespace = "GFX80_UNPACKED" in {		def NAME # _V4 : MIMG_NoSampler_Helper <op, asm, dst_rc, VReg_128, has_d16>,
defm NAME : MIMG_NoSampler_Src_Helper_Helper <op, asm, dst_rc, channels, 1, "_D16_gfx80">;		MIMG_Mask<asm#"_V4", channels>;
} // End HasUnpackedD16VMem.
} // End d16 = 1.
}

multiclass MIMG_NoSampler <bits<7> op, string asm> {
defm _V1 : MIMG_NoSampler_Src_Helper <op, asm, VGPR_32, 1>;
defm _V2 : MIMG_NoSampler_Src_Helper <op, asm, VReg_64, 2>;
defm _V3 : MIMG_NoSampler_Src_Helper <op, asm, VReg_96, 3>;
defm _V4 : MIMG_NoSampler_Src_Helper <op, asm, VReg_128, 4>;
}		}

multiclass MIMG_PckNoSampler <bits<7> op, string asm> {		multiclass MIMG_NoSampler <bits<7> op, string asm, bit has_d16> {
defm NAME # _V1 : MIMG_NoSampler_Src_Helper_Helper <op, asm, VGPR_32, 1, 0, "">;		defm _V1 : MIMG_NoSampler_Src_Helper <op, asm, VGPR_32, 1, has_d16>;
defm NAME # _V2 : MIMG_NoSampler_Src_Helper_Helper <op, asm, VReg_64, 2, 0, "">;		defm _V2 : MIMG_NoSampler_Src_Helper <op, asm, VReg_64, 2, has_d16>;
defm NAME # _V3 : MIMG_NoSampler_Src_Helper_Helper <op, asm, VReg_96, 3, 0, "">;		defm _V3 : MIMG_NoSampler_Src_Helper <op, asm, VReg_96, 3, has_d16>;
defm NAME # _V4 : MIMG_NoSampler_Src_Helper_Helper <op, asm, VReg_128, 4, 0, "">;		defm _V4 : MIMG_NoSampler_Src_Helper <op, asm, VReg_128, 4, has_d16>;
}		}

class MIMG_Store_Helper <bits<7> op, string asm,		class MIMG_Store_Helper <bits<7> op, string asm,
RegisterClass data_rc,		RegisterClass data_rc,
RegisterClass addr_rc,		RegisterClass addr_rc,
bit d16_bit=0,		bit has_d16,
string dns = ""> : MIMG_Helper <		string dns = "">
(outs),		: MIMG_Helper <(outs),
(ins data_rc:$vdata, addr_rc:$vaddr, SReg_256:$srsrc,		!con((ins data_rc:$vdata, addr_rc:$vaddr, SReg_256:$srsrc,
DMask:$dmask, UNorm:$unorm, GLC:$glc, SLC:$slc,		DMask:$dmask, UNorm:$unorm, GLC:$glc, SLC:$slc,
R128:$r128, TFE:$tfe, LWE:$lwe, DA:$da),		R128:$r128, TFE:$tfe, LWE:$lwe, DA:$da),
asm#" $vdata, $vaddr, $srsrc$dmask$unorm$glc$slc$r128$tfe$lwe$da"#!if(d16_bit, " d16", ""), dns>, MIMGe<op> {		!if(has_d16, (ins D16:$d16), (ins))),
		asm#" $vdata, $vaddr, $srsrc$dmask$unorm$glc$slc$r128$tfe$lwe$da"
		#!if(has_d16, "$d16", ""),
		dns>,
		MIMGe<op> {
let ssamp = 0;		let ssamp = 0;
let mayLoad = 0;		let mayLoad = 0;
let mayStore = 1;		let mayStore = 1;
let hasSideEffects = 0;		let hasSideEffects = 0;
let hasPostISelHook = 0;		let hasPostISelHook = 0;
let DisableWQM = 1;		let DisableWQM = 1;
let D16 = d16;
}

multiclass MIMG_Store_Addr_Helper_Helper <bits<7> op, string asm,		let HasD16 = has_d16;
RegisterClass data_rc,		let d16 = !if(HasD16, ?, 0);
int channels, bit d16_bit,
string suffix> {
def NAME # _V1 # suffix : MIMG_Store_Helper <op, asm, data_rc, VGPR_32, d16_bit,
!if(!eq(channels, 1), "AMDGPU", "")>,
MIMG_Mask<asm#"_V1"#suffix, channels>;
def NAME # _V2 # suffix : MIMG_Store_Helper <op, asm, data_rc, VReg_64, d16_bit>,
MIMG_Mask<asm#"_V2"#suffix, channels>;
def NAME # _V3 # suffix : MIMG_Store_Helper <op, asm, data_rc, VReg_96, d16_bit>,
MIMG_Mask<asm#"_V3"#suffix, channels>;
def NAME # _V4 # suffix : MIMG_Store_Helper <op, asm, data_rc, VReg_128, d16_bit>,
MIMG_Mask<asm#"_V4"#suffix, channels>;
}		}

multiclass MIMG_Store_Addr_Helper <bits<7> op, string asm,		multiclass MIMG_Store_Addr_Helper <bits<7> op, string asm,
RegisterClass data_rc,		RegisterClass data_rc,
int channels> {		int channels, bit has_d16> {
defm NAME : MIMG_Store_Addr_Helper_Helper <op, asm, data_rc, channels, 0, "">;		def NAME # _V1 : MIMG_Store_Helper <op, asm, data_rc, VGPR_32, has_d16,
		!if(!eq(channels, 1), "AMDGPU", "")>,
let d16 = 1 in {		MIMG_Mask<asm#"_V1", channels>;
let SubtargetPredicate = HasPackedD16VMem in {		def NAME # _V2 : MIMG_Store_Helper <op, asm, data_rc, VReg_64, has_d16>,
defm NAME : MIMG_Store_Addr_Helper_Helper <op, asm, data_rc, channels, 1, "_D16">;		MIMG_Mask<asm#"_V2", channels>;
} // End HasPackedD16VMem.		def NAME # _V3 : MIMG_Store_Helper <op, asm, data_rc, VReg_96, has_d16>,
		MIMG_Mask<asm#"_V3", channels>;
let SubtargetPredicate = HasUnpackedD16VMem, DecoderNamespace = "GFX80_UNPACKED" in {		def NAME # _V4 : MIMG_Store_Helper <op, asm, data_rc, VReg_128, has_d16>,
defm NAME : MIMG_Store_Addr_Helper_Helper <op, asm, data_rc, channels, 1, "_D16_gfx80">;		MIMG_Mask<asm#"_V4", channels>;
} // End HasUnpackedD16VMem.
} // End d16 = 1.
}

multiclass MIMG_Store <bits<7> op, string asm> {
defm _V1 : MIMG_Store_Addr_Helper <op, asm, VGPR_32, 1>;
defm _V2 : MIMG_Store_Addr_Helper <op, asm, VReg_64, 2>;
defm _V3 : MIMG_Store_Addr_Helper <op, asm, VReg_96, 3>;
defm _V4 : MIMG_Store_Addr_Helper <op, asm, VReg_128, 4>;
}		}

multiclass MIMG_PckStore <bits<7> op, string asm> {		multiclass MIMG_Store <bits<7> op, string asm, bit has_d16> {
defm NAME # _V1 : MIMG_Store_Addr_Helper_Helper <op, asm, VGPR_32, 1, 0, "">;		defm _V1 : MIMG_Store_Addr_Helper <op, asm, VGPR_32, 1, has_d16>;
defm NAME # _V2 : MIMG_Store_Addr_Helper_Helper <op, asm, VReg_64, 2, 0, "">;		defm _V2 : MIMG_Store_Addr_Helper <op, asm, VReg_64, 2, has_d16>;
defm NAME # _V3 : MIMG_Store_Addr_Helper_Helper <op, asm, VReg_96, 3, 0, "">;		defm _V3 : MIMG_Store_Addr_Helper <op, asm, VReg_96, 3, has_d16>;
defm NAME # _V4 : MIMG_Store_Addr_Helper_Helper <op, asm, VReg_128, 4, 0, "">;		defm _V4 : MIMG_Store_Addr_Helper <op, asm, VReg_128, 4, has_d16>;
}		}

class MIMG_Atomic_Helper <string asm, RegisterClass data_rc,		class MIMG_Atomic_Helper <string asm, RegisterClass data_rc,
RegisterClass addr_rc, string dns="",		RegisterClass addr_rc, string dns="",
bit enableDasm = 0> : MIMG_Helper <		bit enableDasm = 0> : MIMG_Helper <
(outs data_rc:$vdst),		(outs data_rc:$vdst),
(ins data_rc:$vdata, addr_rc:$vaddr, SReg_256:$srsrc,		(ins data_rc:$vdata, addr_rc:$vaddr, SReg_256:$srsrc,
DMask:$dmask, UNorm:$unorm, GLC:$glc, SLC:$slc,		DMask:$dmask, UNorm:$unorm, GLC:$glc, SLC:$slc,
R128:$r128, TFE:$tfe, LWE:$lwe, DA:$da),		R128:$r128, TFE:$tfe, LWE:$lwe, DA:$da),
asm#" $vdst, $vaddr, $srsrc$dmask$unorm$glc$slc$r128$tfe$lwe$da",		asm#" $vdst, $vaddr, $srsrc$dmask$unorm$glc$slc$r128$tfe$lwe$da",
!if(enableDasm, dns, "")> {		!if(enableDasm, dns, "")> {
let mayLoad = 1;		let mayLoad = 1;
let mayStore = 1;		let mayStore = 1;
let hasSideEffects = 1; // FIXME: Remove this		let hasSideEffects = 1; // FIXME: Remove this
let hasPostISelHook = 0;		let hasPostISelHook = 0;
let DisableWQM = 1;		let DisableWQM = 1;
let Constraints = "$vdst = $vdata";		let Constraints = "$vdst = $vdata";
let AsmMatchConverter = "cvtMIMGAtomic";		let AsmMatchConverter = "cvtMIMGAtomic";
}		}

class MIMG_Atomic_Real_si<mimg op, string name, string asm,		class MIMG_Atomic_Real_si<mimg op, string name, string asm,
RegisterClass data_rc, RegisterClass addr_rc, bit enableDasm> :		RegisterClass data_rc, RegisterClass addr_rc,
MIMG_Atomic_Helper<asm, data_rc, addr_rc, "SICI", enableDasm>,		bit enableDasm>
		: MIMG_Atomic_Helper<asm, data_rc, addr_rc, "SICI", enableDasm>,
SIMCInstr<name, SIEncodingFamily.SI>,		SIMCInstr<name, SIEncodingFamily.SI>,
MIMGe<op.SI> {		MIMGe<op.SI> {
let isCodeGenOnly = 0;		let isCodeGenOnly = 0;
let AssemblerPredicates = [isSICI];		let AssemblerPredicates = [isSICI];
let DisableDecoder = DisableSIDecoder;		let DisableDecoder = DisableSIDecoder;
		let d16 = 0;
}		}

class MIMG_Atomic_Real_vi<mimg op, string name, string asm,		class MIMG_Atomic_Real_vi<mimg op, string name, string asm,
RegisterClass data_rc, RegisterClass addr_rc, bit enableDasm> :		RegisterClass data_rc, RegisterClass addr_rc,
MIMG_Atomic_Helper<asm, data_rc, addr_rc, "VI", enableDasm>,		bit enableDasm>
		: MIMG_Atomic_Helper<asm, data_rc, addr_rc, "VI", enableDasm>,
SIMCInstr<name, SIEncodingFamily.VI>,		SIMCInstr<name, SIEncodingFamily.VI>,
MIMGe<op.VI> {		MIMGe<op.VI> {
let isCodeGenOnly = 0;		let isCodeGenOnly = 0;
let AssemblerPredicates = [isVI];		let AssemblerPredicates = [isVI];
let DisableDecoder = DisableVIDecoder;		let DisableDecoder = DisableVIDecoder;
		let d16 = 0;
}		}

multiclass MIMG_Atomic_Helper_m <mimg op,		multiclass MIMG_Atomic_Helper_m <mimg op,
string name,		string name,
string asm,		string asm,
string key,		string key,
RegisterClass data_rc,		RegisterClass data_rc,
RegisterClass addr_rc,		RegisterClass addr_rc,
[...]
multiclass MIMG_Atomic <mimg op, string asm,
// Other variants are reconstructed by disassembler using dmask and tfe.		// Other variants are reconstructed by disassembler using dmask and tfe.
defm _V1 : MIMG_Atomic_Addr_Helper_m <op, asm # "_V1", asm, data_rc_32, 1, 1>;		defm _V1 : MIMG_Atomic_Addr_Helper_m <op, asm # "_V1", asm, data_rc_32, 1, 1>;
defm _V2 : MIMG_Atomic_Addr_Helper_m <op, asm # "_V2", asm, data_rc_64, 0>;		defm _V2 : MIMG_Atomic_Addr_Helper_m <op, asm # "_V2", asm, data_rc_64, 0>;
}		}

class MIMG_Sampler_Helper <bits<7> op, string asm,		class MIMG_Sampler_Helper <bits<7> op, string asm,
RegisterClass dst_rc,		RegisterClass dst_rc,
RegisterClass src_rc,		RegisterClass src_rc,
bit wqm,		bit wqm, bit has_d16,
bit d16_bit=0,		string dns="">
string dns=""> : MIMG_Helper <		: MIMG_Helper <(outs dst_rc:$vdata),
(outs dst_rc:$vdata),		!con((ins src_rc:$vaddr, SReg_256:$srsrc, SReg_128:$ssamp,
(ins src_rc:$vaddr, SReg_256:$srsrc, SReg_128:$ssamp,
DMask:$dmask, UNorm:$unorm, GLC:$glc, SLC:$slc,		DMask:$dmask, UNorm:$unorm, GLC:$glc, SLC:$slc,
R128:$r128, TFE:$tfe, LWE:$lwe, DA:$da),		R128:$r128, TFE:$tfe, LWE:$lwe, DA:$da),
asm#" $vdata, $vaddr, $srsrc, $ssamp$dmask$unorm$glc$slc$r128$tfe$lwe$da"#!if(d16_bit, " d16", ""),		!if(has_d16, (ins D16:$d16), (ins))),
dns>, MIMGe<op> {		asm#" $vdata, $vaddr, $srsrc, $ssamp$dmask$unorm$glc$slc$r128$tfe$lwe$da"
		#!if(has_d16, "$d16", ""),
		dns>,
		MIMGe<op> {
let WQM = wqm;		let WQM = wqm;
let D16 = d16;
}

multiclass MIMG_Sampler_Src_Helper_Helper <bits<7> op, string asm,		let HasD16 = has_d16;
RegisterClass dst_rc,		let d16 = !if(HasD16, ?, 0);
int channels, bit wqm,
bit d16_bit, string suffix> {
def _V1 # suffix : MIMG_Sampler_Helper <op, asm, dst_rc, VGPR_32, wqm, d16_bit,
!if(!eq(channels, 1), "AMDGPU", "")>,
MIMG_Mask<asm#"_V1"#suffix, channels>;
def _V2 # suffix : MIMG_Sampler_Helper <op, asm, dst_rc, VReg_64, wqm, d16_bit>,
MIMG_Mask<asm#"_V2"#suffix, channels>;
def _V3 # suffix : MIMG_Sampler_Helper <op, asm, dst_rc, VReg_96, wqm, d16_bit>,
MIMG_Mask<asm#"_V3"#suffix, channels>;
def _V4 # suffix : MIMG_Sampler_Helper <op, asm, dst_rc, VReg_128, wqm, d16_bit>,
MIMG_Mask<asm#"_V4"#suffix, channels>;
def _V8 # suffix : MIMG_Sampler_Helper <op, asm, dst_rc, VReg_256, wqm, d16_bit>,
MIMG_Mask<asm#"_V8"#suffix, channels>;
def _V16 # suffix : MIMG_Sampler_Helper <op, asm, dst_rc, VReg_512, wqm, d16_bit>,
MIMG_Mask<asm#"_V16"#suffix, channels>;
}		}

multiclass MIMG_Sampler_Src_Helper <bits<7> op, string asm,		multiclass MIMG_Sampler_Src_Helper <bits<7> op, string asm,
RegisterClass dst_rc,		RegisterClass dst_rc,
int channels, bit wqm> {		int channels, bit wqm, bit has_d16> {
defm "" : MIMG_Sampler_Src_Helper_Helper <op, asm, dst_rc, channels, wqm, 0, "">;		def _V1 : MIMG_Sampler_Helper <op, asm, dst_rc, VGPR_32, wqm, has_d16,
		!if(!eq(channels, 1), "AMDGPU", "")>,
let d16 = 1 in {		MIMG_Mask<asm#"_V1", channels>;
let SubtargetPredicate = HasPackedD16VMem in {		def _V2 : MIMG_Sampler_Helper <op, asm, dst_rc, VReg_64, wqm, has_d16>,
defm "" : MIMG_Sampler_Src_Helper_Helper <op, asm, dst_rc, channels, wqm, 1, "_D16">;		MIMG_Mask<asm#"_V2", channels>;
} // End HasPackedD16VMem.		def _V3 : MIMG_Sampler_Helper <op, asm, dst_rc, VReg_96, wqm, has_d16>,
		MIMG_Mask<asm#"_V3", channels>;
let SubtargetPredicate = HasUnpackedD16VMem, DecoderNamespace = "GFX80_UNPACKED" in {		def _V4 : MIMG_Sampler_Helper <op, asm, dst_rc, VReg_128, wqm, has_d16>,
defm "" : MIMG_Sampler_Src_Helper_Helper <op, asm, dst_rc, channels, wqm, 1, "_D16_gfx80">;		MIMG_Mask<asm#"_V4", channels>;
} // End HasUnpackedD16VMem.		def _V8 : MIMG_Sampler_Helper <op, asm, dst_rc, VReg_256, wqm, has_d16>,
} // End d16 = 1.		MIMG_Mask<asm#"_V8", channels>;
}		def _V16 : MIMG_Sampler_Helper <op, asm, dst_rc, VReg_512, wqm, has_d16>,
		MIMG_Mask<asm#"_V16", channels>;
multiclass MIMG_Sampler <bits<7> op, string asm, bit wqm=0> {		}
defm _V1 : MIMG_Sampler_Src_Helper<op, asm, VGPR_32, 1, wqm>;
defm _V2 : MIMG_Sampler_Src_Helper<op, asm, VReg_64, 2, wqm>;		multiclass MIMG_Sampler <bits<7> op, string asm, bit wqm = 0, bit has_d16 = 1> {
defm _V3 : MIMG_Sampler_Src_Helper<op, asm, VReg_96, 3, wqm>;		defm _V1 : MIMG_Sampler_Src_Helper<op, asm, VGPR_32, 1, wqm, has_d16>;
defm _V4 : MIMG_Sampler_Src_Helper<op, asm, VReg_128, 4, wqm>;		defm _V2 : MIMG_Sampler_Src_Helper<op, asm, VReg_64, 2, wqm, has_d16>;
		defm _V3 : MIMG_Sampler_Src_Helper<op, asm, VReg_96, 3, wqm, has_d16>;
		defm _V4 : MIMG_Sampler_Src_Helper<op, asm, VReg_128, 4, wqm, has_d16>;
}		}

multiclass MIMG_Sampler_WQM <bits<7> op, string asm> : MIMG_Sampler<op, asm, 1>;		multiclass MIMG_Sampler_WQM <bits<7> op, string asm> : MIMG_Sampler<op, asm, 1>;

class MIMG_Gather_Helper <bits<7> op, string asm,		class MIMG_Gather_Helper <bits<7> op, string asm,
RegisterClass dst_rc,		RegisterClass dst_rc,
RegisterClass src_rc,		RegisterClass src_rc,
bit wqm,		bit wqm,
bit d16_bit=0,		string dns="">
string dns=""> : MIMG <		: MIMG <(outs dst_rc:$vdata),
(outs dst_rc:$vdata),
(ins src_rc:$vaddr, SReg_256:$srsrc, SReg_128:$ssamp,		(ins src_rc:$vaddr, SReg_256:$srsrc, SReg_128:$ssamp,
DMask:$dmask, UNorm:$unorm, GLC:$glc, SLC:$slc,		DMask:$dmask, UNorm:$unorm, GLC:$glc, SLC:$slc,
R128:$r128, TFE:$tfe, LWE:$lwe, DA:$da),		R128:$r128, TFE:$tfe, LWE:$lwe, DA:$da, D16:$d16),
asm#" $vdata, $vaddr, $srsrc, $ssamp$dmask$unorm$glc$slc$r128$tfe$lwe$da"#!if(d16_bit, " d16", ""),		asm#" $vdata, $vaddr, $srsrc, $ssamp$dmask$unorm$glc$slc$r128$tfe$lwe$da$d16",
[]>, MIMGe<op> {		[]>,
		MIMGe<op> {
let mayLoad = 1;		let mayLoad = 1;
let mayStore = 0;		let mayStore = 0;

// DMASK was repurposed for GATHER4. 4 components are always		// DMASK was repurposed for GATHER4. 4 components are always
// returned and DMASK works like a swizzle - it selects		// returned and DMASK works like a swizzle - it selects
// the component to fetch. The only useful DMASK values are		// the component to fetch. The only useful DMASK values are
// 1=red, 2=green, 4=blue, 8=alpha. (e.g. 1 returns		// 1=red, 2=green, 4=blue, 8=alpha. (e.g. 1 returns
// (red,red,red,red) etc.) The ISA document doesn't mention		// (red,red,red,red) etc.) The ISA document doesn't mention
// this.		// this.
// Therefore, disable all code which updates DMASK by setting this:		// Therefore, disable all code which updates DMASK by setting this:
let Gather4 = 1;		let Gather4 = 1;
let hasPostISelHook = 0;		let hasPostISelHook = 0;
let WQM = wqm;		let WQM = wqm;
let D16 = d16;		let HasD16 = 1;

let DecoderNamespace = dns;		let DecoderNamespace = dns;
let isAsmParserOnly = !if(!eq(dns,""), 1, 0);		let isAsmParserOnly = !if(!eq(dns,""), 1, 0);
}		}


multiclass MIMG_Gather_Src_Helper <bits<7> op, string asm,		multiclass MIMG_Gather_Src_Helper <bits<7> op, string asm,
RegisterClass dst_rc,		RegisterClass dst_rc,
bit wqm, bit d16_bit,		int channels, bit wqm> {
string prefix,		def _V1 : MIMG_Gather_Helper <op, asm, dst_rc, VGPR_32, wqm,
string suffix> {		!if(!eq(channels, 4), "AMDGPU", "")>,
def prefix # _V1 # suffix : MIMG_Gather_Helper <op, asm, dst_rc, VGPR_32, wqm, d16_bit, "AMDGPU">;		MIMG_Gather_Size<asm#"_V1", channels>;
def prefix # _V2 # suffix : MIMG_Gather_Helper <op, asm, dst_rc, VReg_64, wqm, d16_bit>;		def _V2 : MIMG_Gather_Helper <op, asm, dst_rc, VReg_64, wqm>,
def prefix # _V3 # suffix : MIMG_Gather_Helper <op, asm, dst_rc, VReg_96, wqm, d16_bit>;		MIMG_Gather_Size<asm#"_V2", channels>;
def prefix # _V4 # suffix : MIMG_Gather_Helper <op, asm, dst_rc, VReg_128, wqm, d16_bit>;		def _V3 : MIMG_Gather_Helper <op, asm, dst_rc, VReg_96, wqm>,
def prefix # _V8 # suffix : MIMG_Gather_Helper <op, asm, dst_rc, VReg_256, wqm, d16_bit>;		MIMG_Gather_Size<asm#"_V3", channels>;
def prefix # _V16 # suffix : MIMG_Gather_Helper <op, asm, dst_rc, VReg_512, wqm, d16_bit>;		def _V4 : MIMG_Gather_Helper <op, asm, dst_rc, VReg_128, wqm>,
		MIMG_Gather_Size<asm#"_V4", channels>;
		def _V8 : MIMG_Gather_Helper <op, asm, dst_rc, VReg_256, wqm>,
		MIMG_Gather_Size<asm#"_V8", channels>;
		def _V16 : MIMG_Gather_Helper <op, asm, dst_rc, VReg_512, wqm>,
		MIMG_Gather_Size<asm#"_V16", channels>;
}		}

multiclass MIMG_Gather <bits<7> op, string asm, bit wqm=0> {		multiclass MIMG_Gather <bits<7> op, string asm, bit wqm=0> {
defm "" : MIMG_Gather_Src_Helper<op, asm, VReg_128, wqm, 0, "_V4", "">;		defm _V2 : MIMG_Gather_Src_Helper<op, asm, VReg_64, 2, wqm>; /* for packed D16 only */
		defm _V4 : MIMG_Gather_Src_Helper<op, asm, VReg_128, 4, wqm>;
let d16 = 1 in {
let AssemblerPredicate = HasPackedD16VMem in {
defm "" : MIMG_Gather_Src_Helper<op, asm, VReg_64, wqm, 1, "_V2", "_D16">;
} // End HasPackedD16VMem.

let AssemblerPredicate = HasUnpackedD16VMem, DecoderNamespace = "GFX80_UNPACKED" in {
defm "" : MIMG_Gather_Src_Helper<op, asm, VReg_128, wqm, 1, "_V4", "_D16_gfx80">;
} // End HasUnpackedD16VMem.
} // End d16 = 1.
}		}

multiclass MIMG_Gather_WQM <bits<7> op, string asm> : MIMG_Gather<op, asm, 1>;		multiclass MIMG_Gather_WQM <bits<7> op, string asm> : MIMG_Gather<op, asm, 1>;

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// MIMG Instructions		// MIMG Instructions
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
let SubtargetPredicate = isGCN in {		let SubtargetPredicate = isGCN in {
defm IMAGE_LOAD : MIMG_NoSampler <0x00000000, "image_load">;		defm IMAGE_LOAD : MIMG_NoSampler <0x00000000, "image_load", 1>;
defm IMAGE_LOAD_MIP : MIMG_NoSampler <0x00000001, "image_load_mip">;		defm IMAGE_LOAD_MIP : MIMG_NoSampler <0x00000001, "image_load_mip", 1>;
defm IMAGE_LOAD_PCK : MIMG_PckNoSampler <0x00000002, "image_load_pck">;		defm IMAGE_LOAD_PCK : MIMG_NoSampler <0x00000002, "image_load_pck", 0>;
defm IMAGE_LOAD_PCK_SGN : MIMG_PckNoSampler <0x00000003, "image_load_pck_sgn">;		defm IMAGE_LOAD_PCK_SGN : MIMG_NoSampler <0x00000003, "image_load_pck_sgn", 0>;
defm IMAGE_LOAD_MIP_PCK : MIMG_PckNoSampler <0x00000004, "image_load_mip_pck">;		defm IMAGE_LOAD_MIP_PCK : MIMG_NoSampler <0x00000004, "image_load_mip_pck", 0>;
defm IMAGE_LOAD_MIP_PCK_SGN : MIMG_PckNoSampler <0x00000005, "image_load_mip_pck_sgn">;		defm IMAGE_LOAD_MIP_PCK_SGN : MIMG_NoSampler <0x00000005, "image_load_mip_pck_sgn", 0>;
defm IMAGE_STORE : MIMG_Store <0x00000008, "image_store">;		defm IMAGE_STORE : MIMG_Store <0x00000008, "image_store", 1>;
defm IMAGE_STORE_MIP : MIMG_Store <0x00000009, "image_store_mip">;		defm IMAGE_STORE_MIP : MIMG_Store <0x00000009, "image_store_mip", 1>;
defm IMAGE_STORE_PCK : MIMG_PckStore <0x0000000a, "image_store_pck">;		defm IMAGE_STORE_PCK : MIMG_Store <0x0000000a, "image_store_pck", 0>;
defm IMAGE_STORE_MIP_PCK : MIMG_PckStore <0x0000000b, "image_store_mip_pck">;		defm IMAGE_STORE_MIP_PCK : MIMG_Store <0x0000000b, "image_store_mip_pck", 0>;

let mayLoad = 0, mayStore = 0 in {		let mayLoad = 0, mayStore = 0 in {
defm IMAGE_GET_RESINFO : MIMG_NoSampler <0x0000000e, "image_get_resinfo">;		defm IMAGE_GET_RESINFO : MIMG_NoSampler <0x0000000e, "image_get_resinfo", 0>;
}		}

defm IMAGE_ATOMIC_SWAP : MIMG_Atomic <mimg<0x0f, 0x10>, "image_atomic_swap">;		defm IMAGE_ATOMIC_SWAP : MIMG_Atomic <mimg<0x0f, 0x10>, "image_atomic_swap">;
defm IMAGE_ATOMIC_CMPSWAP : MIMG_Atomic <mimg<0x10, 0x11>, "image_atomic_cmpswap", VReg_64, VReg_128>;		defm IMAGE_ATOMIC_CMPSWAP : MIMG_Atomic <mimg<0x10, 0x11>, "image_atomic_cmpswap", VReg_64, VReg_128>;
defm IMAGE_ATOMIC_ADD : MIMG_Atomic <mimg<0x11, 0x12>, "image_atomic_add">;		defm IMAGE_ATOMIC_ADD : MIMG_Atomic <mimg<0x11, 0x12>, "image_atomic_add">;
defm IMAGE_ATOMIC_SUB : MIMG_Atomic <mimg<0x12, 0x13>, "image_atomic_sub">;		defm IMAGE_ATOMIC_SUB : MIMG_Atomic <mimg<0x12, 0x13>, "image_atomic_sub">;
//def IMAGE_ATOMIC_RSUB : MIMG_NoPattern_ <"image_atomic_rsub", 0x00000013>; -- not on VI		//def IMAGE_ATOMIC_RSUB : MIMG_NoPattern_ <"image_atomic_rsub", 0x00000013>; -- not on VI
defm IMAGE_ATOMIC_SMIN : MIMG_Atomic <mimg<0x14>, "image_atomic_smin">;		defm IMAGE_ATOMIC_SMIN : MIMG_Atomic <mimg<0x14>, "image_atomic_smin">;
[...]
defm IMAGE_GATHER4_C_O : MIMG_Gather_WQM <0x00000058, "image_gather4_c_o">;		defm IMAGE_GATHER4_C_O : MIMG_Gather_WQM <0x00000058, "image_gather4_c_o">;
defm IMAGE_GATHER4_C_CL_O : MIMG_Gather_WQM <0x00000059, "image_gather4_c_cl_o">;		defm IMAGE_GATHER4_C_CL_O : MIMG_Gather_WQM <0x00000059, "image_gather4_c_cl_o">;
defm IMAGE_GATHER4_C_L_O : MIMG_Gather <0x0000005c, "image_gather4_c_l_o">;		defm IMAGE_GATHER4_C_L_O : MIMG_Gather <0x0000005c, "image_gather4_c_l_o">;
defm IMAGE_GATHER4_C_B_O : MIMG_Gather_WQM <0x0000005d, "image_gather4_c_b_o">;		defm IMAGE_GATHER4_C_B_O : MIMG_Gather_WQM <0x0000005d, "image_gather4_c_b_o">;
defm IMAGE_GATHER4_C_B_CL_O : MIMG_Gather_WQM <0x0000005e, "image_gather4_c_b_cl_o">;		defm IMAGE_GATHER4_C_B_CL_O : MIMG_Gather_WQM <0x0000005e, "image_gather4_c_b_cl_o">;
defm IMAGE_GATHER4_C_LZ_O : MIMG_Gather <0x0000005f, "image_gather4_c_lz_o">;		defm IMAGE_GATHER4_C_LZ_O : MIMG_Gather <0x0000005f, "image_gather4_c_lz_o">;

let mayLoad = 0, mayStore = 0 in {		let mayLoad = 0, mayStore = 0 in {
defm IMAGE_GET_LOD : MIMG_Sampler_WQM <0x00000060, "image_get_lod">;		defm IMAGE_GET_LOD : MIMG_Sampler <0x00000060, "image_get_lod", 1, 0>;
}		}

defm IMAGE_SAMPLE_CD : MIMG_Sampler <0x00000068, "image_sample_cd">;		defm IMAGE_SAMPLE_CD : MIMG_Sampler <0x00000068, "image_sample_cd">;
defm IMAGE_SAMPLE_CD_CL : MIMG_Sampler <0x00000069, "image_sample_cd_cl">;		defm IMAGE_SAMPLE_CD_CL : MIMG_Sampler <0x00000069, "image_sample_cd_cl">;
defm IMAGE_SAMPLE_C_CD : MIMG_Sampler <0x0000006a, "image_sample_c_cd">;		defm IMAGE_SAMPLE_C_CD : MIMG_Sampler <0x0000006a, "image_sample_c_cd">;
defm IMAGE_SAMPLE_C_CD_CL : MIMG_Sampler <0x0000006b, "image_sample_c_cd_cl">;		defm IMAGE_SAMPLE_C_CD_CL : MIMG_Sampler <0x0000006b, "image_sample_c_cd_cl">;
defm IMAGE_SAMPLE_CD_O : MIMG_Sampler <0x0000006c, "image_sample_cd_o">;		defm IMAGE_SAMPLE_CD_O : MIMG_Sampler <0x0000006c, "image_sample_cd_o">;
defm IMAGE_SAMPLE_CD_CL_O : MIMG_Sampler <0x0000006d, "image_sample_cd_cl_o">;		defm IMAGE_SAMPLE_CD_CL_O : MIMG_Sampler <0x0000006d, "image_sample_cd_cl_o">;
[...]
!if(!eq(!size(names), 1),
makeRegSequence_Fold<		makeRegSequence_Fold<
!add(f.idx, 1),		!add(f.idx, 1),
!con((INSERT_SUBREG f.lhs),		!con((INSERT_SUBREG f.lhs),
!dag(INSERT_SUBREG, [?, !cast<SubRegIndex>("sub"#f.idx)],		!dag(INSERT_SUBREG, [?, !cast<SubRegIndex>("sub"#f.idx)],
[name, ?]))>).lhs);		[name, ?]))>).lhs);
}		}

class ImageDimPattern<AMDGPUImageDimIntrinsic I,		class ImageDimPattern<AMDGPUImageDimIntrinsic I,
string dop, ValueType dty,		string dop, ValueType dty, bit d16,
string suffix = ""> : GCNPat<(undef), (undef)> {		string suffix = ""> : GCNPat<(undef), (undef)> {
list<AMDGPUArg> AddrArgs = I.P.AddrDefaultArgs;		list<AMDGPUArg> AddrArgs = I.P.AddrDefaultArgs;
getDwordsType AddrDwords = getDwordsType<!size(AddrArgs)>;		getDwordsType AddrDwords = getDwordsType<!size(AddrArgs)>;

Instruction MI =		MIMG MI =
!cast<Instruction>(!strconcat("IMAGE_", I.P.OpMod, dop, AddrDwords.suffix, suffix));		!cast<MIMG>(!strconcat("IMAGE_", I.P.OpMod, dop, AddrDwords.suffix, suffix));

// DAG fragment to match data arguments (vdata for store/atomic, dmask		// DAG fragment to match data arguments (vdata for store/atomic, dmask
// for non-atomic).		// for non-atomic).
dag MatchDataDag =		dag MatchDataDag =
!con(!dag(I, !foreach(arg, I.P.DataArgs, dty),		!con(!dag(I, !foreach(arg, I.P.DataArgs, dty),
!foreach(arg, I.P.DataArgs, arg.Name)),		!foreach(arg, I.P.DataArgs, arg.Name)),
!if(I.P.IsAtomic, (I), (I i32:$dmask)));		!if(I.P.IsAtomic, (I), (I i32:$dmask)));

[...]
!con(GenDataDag,
!if(I.P.IsSample, (MI $sampler), (MI)),		!if(I.P.IsSample, (MI $sampler), (MI)),
GenDmask,		GenDmask,
!if(I.P.IsSample, (MI (as_i1imm $unorm)), (MI 1)),		!if(I.P.IsSample, (MI (as_i1imm $unorm)), (MI 1)),
GenGLC,		GenGLC,
(MI (bitextract_imm<1> $cachepolicy),		(MI (bitextract_imm<1> $cachepolicy),
0, /* r128 */		0, /* r128 */
0, /* tfe */		0, /* tfe */
0 /*(as_i1imm $lwe)*/,		0 /*(as_i1imm $lwe)*/,
{ I.P.Dim.DA }));		{ I.P.Dim.DA }),
		!if(MI.HasD16, (MI d16), (MI)));
let ResultInstrs = [		let ResultInstrs = [
!if(IsCmpSwap, (EXTRACT_SUBREG ImageInstruction, sub0), ImageInstruction)		!if(IsCmpSwap, (EXTRACT_SUBREG ImageInstruction, sub0), ImageInstruction)
];		];
}		}

foreach intr = !listconcat(AMDGPUImageDimIntrinsics,		foreach intr = !listconcat(AMDGPUImageDimIntrinsics,
AMDGPUImageDimGetResInfoIntrinsics) in {		AMDGPUImageDimGetResInfoIntrinsics) in {
def intr#_pat_v1 : ImageDimPattern<intr, "_V1", f32>;		def intr#_pat_v1 : ImageDimPattern<intr, "_V1", f32, 0>;
def intr#_pat_v2 : ImageDimPattern<intr, "_V2", v2f32>;		def intr#_pat_v2 : ImageDimPattern<intr, "_V2", v2f32, 0>;
def intr#_pat_v4 : ImageDimPattern<intr, "_V4", v4f32>;		def intr#_pat_v4 : ImageDimPattern<intr, "_V4", v4f32, 0>;
}		}

// v2f16 and v4f16 are used as data types to signal that D16 should be used.		// v2f16 and v4f16 are used as data types to signal that D16 should be used.
// However, they are not (always) legal types, and the SelectionDAG requires us		// However, they are not (always) legal types, and the SelectionDAG requires us
// to legalize them before running any patterns. So we legalize them by		// to legalize them before running any patterns. So we legalize them by
// converting to an int type of equal size and using an internal 'd16helper'		// converting to an int type of equal size and using an internal 'd16helper'
// intrinsic instead which signifies both the use of D16 and actually allows		// intrinsic instead which signifies both the use of D16 and actually allows
// this integer-based return type.		// this integer-based return type.
multiclass ImageDimD16Helper<AMDGPUImageDimIntrinsic I,		multiclass ImageDimD16Helper<AMDGPUImageDimIntrinsic I,
AMDGPUImageDimIntrinsic d16helper> {		AMDGPUImageDimIntrinsic d16helper> {
let SubtargetPredicate = HasUnpackedD16VMem in {		let SubtargetPredicate = HasUnpackedD16VMem in {
def _unpacked_v1 : ImageDimPattern<I, "_V1", f16, "_D16_gfx80">;		def _unpacked_v1 : ImageDimPattern<I, "_V1", f16, 1>;
def _unpacked_v2 : ImageDimPattern<d16helper, "_V2", v2i32, "_D16_gfx80">;		def _unpacked_v2 : ImageDimPattern<d16helper, "_V2", v2i32, 1>;
def _unpacked_v4 : ImageDimPattern<d16helper, "_V4", v4i32, "_D16_gfx80">;		def _unpacked_v4 : ImageDimPattern<d16helper, "_V4", v4i32, 1>;
} // End HasUnpackedD16VMem.		} // End HasUnpackedD16VMem.

let SubtargetPredicate = HasPackedD16VMem in {		let SubtargetPredicate = HasPackedD16VMem in {
def _packed_v1 : ImageDimPattern<I, "_V1", f16, "_D16">;		def _packed_v1 : ImageDimPattern<I, "_V1", f16, 1>;
def _packed_v2 : ImageDimPattern<I, "_V1", v2f16, "_D16">;		def _packed_v2 : ImageDimPattern<I, "_V1", v2f16, 1>;
def _packed_v4 : ImageDimPattern<d16helper, "_V2", v2i32, "_D16">;		def _packed_v4 : ImageDimPattern<d16helper, "_V2", v2i32, 1>;
} // End HasPackedD16VMem.		} // End HasPackedD16VMem.
}		}
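As a rough illustration of why the packed patterns above can reuse the smaller `_Vn` variants, the following C++ sketch computes how many 32-bit vdata registers a result needs. The helper names `vdataDwords` and `vdataSuffix` are hypothetical and not part of LLVM. Rounding three dwords up to `_V4` is an assumption, made because only `_V1`, `_V2` and `_V4` appear in these patterns.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: number of 32-bit vdata registers needed for
// `components` result components (1..4).
// - non-D16 and unpacked D16: one register per component
// - packed D16: two 16-bit components share one register
unsigned vdataDwords(unsigned components, bool d16, bool packedD16) {
  assert(components >= 1 && components <= 4 && "expected 1..4 components");
  if (d16 && packedD16)
    return (components + 1) / 2; // round up to whole registers
  return components;
}

// Hypothetical helper: map a register count to the vdata suffix used in
// the pattern names. Three registers are rounded up to _V4 (assumption).
std::string vdataSuffix(unsigned dwords) {
  assert(dwords >= 1 && dwords <= 4 && "expected 1..4 registers");
  if (dwords == 1)
    return "_V1";
  if (dwords == 2)
    return "_V2";
  return "_V4";
}
```

With these assumptions, a packed D16 result with four components maps to `_V2`, which matches the `_packed_v4` pattern above, while the unpacked case stays on `_V4`.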

foreach intr = AMDGPUImageDimIntrinsics in {		foreach intr = AMDGPUImageDimIntrinsics in {
def intr#_d16helper_profile : AMDGPUDimProfileCopy<intr.P> {		def intr#_d16helper_profile : AMDGPUDimProfileCopy<intr.P> {
let RetTypes = !foreach(ty, intr.P.RetTypes, llvm_any_ty);		let RetTypes = !foreach(ty, intr.P.RetTypes, llvm_any_ty);
let DataArgs = !foreach(arg, intr.P.DataArgs, AMDGPUArg<llvm_any_ty, arg.Name>);		let DataArgs = !foreach(arg, intr.P.DataArgs, AMDGPUArg<llvm_any_ty, arg.Name>);
}		}

let TargetPrefix = "SI", isTarget = 1 in		let TargetPrefix = "SI", isTarget = 1 in
def int_SI_image_d16helper_ # intr.P.OpMod # intr.P.Dim.Name :		def int_SI_image_d16helper_ # intr.P.OpMod # intr.P.Dim.Name :
AMDGPUImageDimIntrinsic<!cast<AMDGPUDimProfile>(intr#"_d16helper_profile"),		AMDGPUImageDimIntrinsic<!cast<AMDGPUDimProfile>(intr#"_d16helper_profile"),
intr.IntrProperties, intr.Properties>;		intr.IntrProperties, intr.Properties>;

defm intr#_d16 :		defm intr#_d16 :
ImageDimD16Helper<		ImageDimD16Helper<
intr, !cast<AMDGPUImageDimIntrinsic>(		intr, !cast<AMDGPUImageDimIntrinsic>(
"int_SI_image_d16helper_" # intr.P.OpMod # intr.P.Dim.Name)>;		"int_SI_image_d16helper_" # intr.P.OpMod # intr.P.Dim.Name)>;
}		}

foreach intr = AMDGPUImageDimGatherIntrinsics in {		foreach intr = AMDGPUImageDimGatherIntrinsics in {
def intr#_pat3 : ImageDimPattern<intr, "_V4", v4f32>;		def intr#_pat3 : ImageDimPattern<intr, "_V4", v4f32, 0>;

def intr#_d16helper_profile : AMDGPUDimProfileCopy<intr.P> {		def intr#_d16helper_profile : AMDGPUDimProfileCopy<intr.P> {
let RetTypes = !foreach(ty, intr.P.RetTypes, llvm_any_ty);		let RetTypes = !foreach(ty, intr.P.RetTypes, llvm_any_ty);
let DataArgs = !foreach(arg, intr.P.DataArgs, AMDGPUArg<llvm_any_ty, arg.Name>);		let DataArgs = !foreach(arg, intr.P.DataArgs, AMDGPUArg<llvm_any_ty, arg.Name>);
}		}

let TargetPrefix = "SI", isTarget = 1 in		let TargetPrefix = "SI", isTarget = 1 in
def int_SI_image_d16helper_ # intr.P.OpMod # intr.P.Dim.Name :		def int_SI_image_d16helper_ # intr.P.OpMod # intr.P.Dim.Name :
AMDGPUImageDimIntrinsic<!cast<AMDGPUDimProfile>(intr#"_d16helper_profile"),		AMDGPUImageDimIntrinsic<!cast<AMDGPUDimProfile>(intr#"_d16helper_profile"),
intr.IntrProperties, intr.Properties>;		intr.IntrProperties, intr.Properties>;

let SubtargetPredicate = HasUnpackedD16VMem in {		let SubtargetPredicate = HasUnpackedD16VMem in {
def intr#_unpacked_v4 :		def intr#_unpacked_v4 :
ImageDimPattern<!cast<AMDGPUImageDimIntrinsic>(		ImageDimPattern<!cast<AMDGPUImageDimIntrinsic>(
"int_SI_image_d16helper_" # intr.P.OpMod # intr.P.Dim.Name),		"int_SI_image_d16helper_" # intr.P.OpMod # intr.P.Dim.Name),
"_V4", v4i32, "_D16_gfx80">;		"_V4", v4i32, 1>;
} // End HasUnpackedD16VMem.		} // End HasUnpackedD16VMem.

let SubtargetPredicate = HasPackedD16VMem in {		let SubtargetPredicate = HasPackedD16VMem in {
def intr#_packed_v4 :		def intr#_packed_v4 :
ImageDimPattern<!cast<AMDGPUImageDimIntrinsic>(		ImageDimPattern<!cast<AMDGPUImageDimIntrinsic>(
"int_SI_image_d16helper_" # intr.P.OpMod # intr.P.Dim.Name),		"int_SI_image_d16helper_" # intr.P.OpMod # intr.P.Dim.Name),
"_V2", v2i32, "_D16">;		"_V2", v2i32, 1>;
} // End HasPackedD16VMem.		} // End HasPackedD16VMem.
}		}

foreach intr = AMDGPUImageDimAtomicIntrinsics in {		foreach intr = AMDGPUImageDimAtomicIntrinsics in {
def intr#_pat1 : ImageDimPattern<intr, "_V1", i32>;		def intr#_pat1 : ImageDimPattern<intr, "_V1", i32, 0>;
}		}

/******** ======================= ********/		/******** ======================= ********/
/******** Image sampling patterns ********/		/******** Image sampling patterns ********/
/******** ======================= ********/		/******** ======================= ********/

// ImageSample for amdgcn		// ImageSample for amdgcn
// TODO:		// TODO:
// 1. Handle v4i32 rsrc type (Register Class for the instruction to be SReg_128).		// 1. Handle v4i32 rsrc type (Register Class for the instruction to be SReg_128).
// 2. Add A16 support when we pass address of half type.		// 2. Add A16 support when we pass address of half type.
multiclass ImageSamplePattern<SDPatternOperator name, MIMG opcode, ValueType dt, ValueType vt> {		multiclass ImageSamplePattern<SDPatternOperator name, MIMG opcode,
		ValueType dt, ValueType vt, bit d16> {
def : GCNPat<		def : GCNPat<
(dt (name vt:$addr, v8i32:$rsrc, v4i32:$sampler, i32:$dmask, i1:$unorm, i1:$glc,		(dt (name vt:$addr, v8i32:$rsrc, v4i32:$sampler, i32:$dmask, i1:$unorm, i1:$glc,
i1:$slc, i1:$lwe, i1:$da)),		i1:$slc, i1:$lwe, i1:$da)),
(opcode $addr, $rsrc, $sampler,		!con((opcode $addr, $rsrc, $sampler, (as_i32imm $dmask), (as_i1imm $unorm),
(as_i32imm $dmask), (as_i1imm $unorm), (as_i1imm $glc), (as_i1imm $slc),		(as_i1imm $glc), (as_i1imm $slc), 0, 0, (as_i1imm $lwe),
0, 0, (as_i1imm $lwe), (as_i1imm $da))		(as_i1imm $da)),
		!if(opcode.HasD16, (opcode d16), (opcode)))
>;		>;
}		}

multiclass ImageSampleDataPatterns<SDPatternOperator name, string opcode, ValueType dt, string suffix = ""> {		multiclass ImageSampleDataPatterns<SDPatternOperator name, string opcode,
defm : ImageSamplePattern<name, !cast<MIMG>(opcode # _V1 # suffix), dt, f32>;		ValueType dt, bit d16> {
defm : ImageSamplePattern<name, !cast<MIMG>(opcode # _V2 # suffix), dt, v2f32>;		defm : ImageSamplePattern<name, !cast<MIMG>(opcode # _V1), dt, f32, d16>;
defm : ImageSamplePattern<name, !cast<MIMG>(opcode # _V4 # suffix), dt, v4f32>;		defm : ImageSamplePattern<name, !cast<MIMG>(opcode # _V2), dt, v2f32, d16>;
defm : ImageSamplePattern<name, !cast<MIMG>(opcode # _V8 # suffix), dt, v8f32>;		defm : ImageSamplePattern<name, !cast<MIMG>(opcode # _V4), dt, v4f32, d16>;
defm : ImageSamplePattern<name, !cast<MIMG>(opcode # _V16 # suffix), dt, v16f32>;		defm : ImageSamplePattern<name, !cast<MIMG>(opcode # _V8), dt, v8f32, d16>;
		defm : ImageSamplePattern<name, !cast<MIMG>(opcode # _V16), dt, v16f32, d16>;
}		}

// ImageSample patterns.		// ImageSample patterns.
multiclass ImageSamplePatterns<SDPatternOperator name, string opcode> {		multiclass ImageSamplePatterns<SDPatternOperator name, string opcode> {
defm : ImageSampleDataPatterns<name, !cast<string>(opcode # _V1), f32>;		defm : ImageSampleDataPatterns<name, !cast<string>(opcode # _V1), f32, 0>;
defm : ImageSampleDataPatterns<name, !cast<string>(opcode # _V2), v2f32>;		defm : ImageSampleDataPatterns<name, !cast<string>(opcode # _V2), v2f32, 0>;
defm : ImageSampleDataPatterns<name, !cast<string>(opcode # _V4), v4f32>;		defm : ImageSampleDataPatterns<name, !cast<string>(opcode # _V4), v4f32, 0>;

let SubtargetPredicate = HasUnpackedD16VMem in {		let SubtargetPredicate = HasUnpackedD16VMem in {
defm : ImageSampleDataPatterns<name, !cast<string>(opcode # _V1), f16, "_D16_gfx80">;		defm : ImageSampleDataPatterns<name, !cast<string>(opcode # _V1), f16, 1>;
} // End HasUnpackedD16VMem.		} // End HasUnpackedD16VMem.

let SubtargetPredicate = HasPackedD16VMem in {		let SubtargetPredicate = HasPackedD16VMem in {
defm : ImageSampleDataPatterns<name, !cast<string>(opcode # _V1), f16, "_D16">;		defm : ImageSampleDataPatterns<name, !cast<string>(opcode # _V1), f16, 1>;
defm : ImageSampleDataPatterns<name, !cast<string>(opcode # _V1), v2f16, "_D16">;		defm : ImageSampleDataPatterns<name, !cast<string>(opcode # _V1), v2f16, 1>;
} // End HasPackedD16VMem.		} // End HasPackedD16VMem.
}		}

// ImageSample alternative patterns for illegal vector half Types.		// ImageSample alternative patterns for illegal vector half Types.
multiclass ImageSampleAltPatterns<SDPatternOperator name, string opcode> {		multiclass ImageSampleAltPatterns<SDPatternOperator name, string opcode> {
let SubtargetPredicate = HasUnpackedD16VMem in {		let SubtargetPredicate = HasUnpackedD16VMem in {
defm : ImageSampleDataPatterns<name, !cast<string>(opcode # _V2), v2i32, "_D16_gfx80">;		defm : ImageSampleDataPatterns<name, !cast<string>(opcode # _V2), v2i32, 1>;
defm : ImageSampleDataPatterns<name, !cast<string>(opcode # _V4), v4i32, "_D16_gfx80">;		defm : ImageSampleDataPatterns<name, !cast<string>(opcode # _V4), v4i32, 1>;
} // End HasUnpackedD16VMem.		} // End HasUnpackedD16VMem.

let SubtargetPredicate = HasPackedD16VMem in {		let SubtargetPredicate = HasPackedD16VMem in {
defm : ImageSampleDataPatterns<name, !cast<string>(opcode # _V1), f16, "_D16">;		defm : ImageSampleDataPatterns<name, !cast<string>(opcode # _V1), f16, 1>;
defm : ImageSampleDataPatterns<name, !cast<string>(opcode # _V2), v2i32, "_D16">;		defm : ImageSampleDataPatterns<name, !cast<string>(opcode # _V2), v2i32, 1>;
} // End HasPackedD16VMem.		} // End HasPackedD16VMem.
}		}

// ImageGather4 patterns.		// ImageGather4 patterns.
multiclass ImageGather4Patterns<SDPatternOperator name, string opcode> {		multiclass ImageGather4Patterns<SDPatternOperator name, string opcode> {
defm : ImageSampleDataPatterns<name, !cast<string>(opcode # _V4), v4f32>;		defm : ImageSampleDataPatterns<name, !cast<string>(opcode # _V4), v4f32, 0>;
}		}

// ImageGather4 alternative patterns for illegal vector half Types.		// ImageGather4 alternative patterns for illegal vector half Types.
multiclass ImageGather4AltPatterns<SDPatternOperator name, string opcode> {		multiclass ImageGather4AltPatterns<SDPatternOperator name, string opcode> {
let SubtargetPredicate = HasUnpackedD16VMem in {		let SubtargetPredicate = HasUnpackedD16VMem in {
defm : ImageSampleDataPatterns<name, !cast<string>(opcode # _V4), v4i32, "_D16_gfx80">;		defm : ImageSampleDataPatterns<name, !cast<string>(opcode # _V4), v4i32, 1>;
} // End HasUnpackedD16VMem.		} // End HasUnpackedD16VMem.

let SubtargetPredicate = HasPackedD16VMem in {		let SubtargetPredicate = HasPackedD16VMem in {
defm : ImageSampleDataPatterns<name, !cast<string>(opcode # _V2), v2i32, "_D16">;		defm : ImageSampleDataPatterns<name, !cast<string>(opcode # _V2), v2i32, 1>;
} // End HasPackedD16VMem.		} // End HasPackedD16VMem.
}		}

// ImageLoad for amdgcn.		// ImageLoad for amdgcn.
multiclass ImageLoadPattern<SDPatternOperator name, MIMG opcode, ValueType dt, ValueType vt> {		multiclass ImageLoadPattern<SDPatternOperator name, MIMG opcode,
		ValueType dt, ValueType vt, bit d16> {
def : GCNPat <		def : GCNPat <
(dt (name vt:$addr, v8i32:$rsrc, i32:$dmask, i1:$glc, i1:$slc, i1:$lwe,		(dt (name vt:$addr, v8i32:$rsrc, i32:$dmask, i1:$glc, i1:$slc, i1:$lwe,
i1:$da)),		i1:$da)),
(opcode $addr, $rsrc,		!con((opcode $addr, $rsrc, (as_i32imm $dmask), 1, (as_i1imm $glc),
(as_i32imm $dmask), 1, (as_i1imm $glc), (as_i1imm $slc),		(as_i1imm $slc), 0, 0, (as_i1imm $lwe), (as_i1imm $da)),
0, 0, (as_i1imm $lwe), (as_i1imm $da))		!if(opcode.HasD16, (opcode d16), (opcode)))
>;		>;
}		}

multiclass ImageLoadDataPatterns<SDPatternOperator name, string opcode, ValueType dt, string suffix = ""> {		multiclass ImageLoadDataPatterns<SDPatternOperator name, string opcode,
defm : ImageLoadPattern<name, !cast<MIMG>(opcode # _V1 # suffix), dt, i32>;		ValueType dt, bit d16> {
defm : ImageLoadPattern<name, !cast<MIMG>(opcode # _V2 # suffix), dt, v2i32>;		defm : ImageLoadPattern<name, !cast<MIMG>(opcode # _V1), dt, i32, d16>;
defm : ImageLoadPattern<name, !cast<MIMG>(opcode # _V4 # suffix), dt, v4i32>;		defm : ImageLoadPattern<name, !cast<MIMG>(opcode # _V2), dt, v2i32, d16>;
		defm : ImageLoadPattern<name, !cast<MIMG>(opcode # _V4), dt, v4i32, d16>;
}		}

// ImageLoad patterns.		// ImageLoad patterns.
// TODO: support v3f32.		// TODO: support v3f32.
multiclass ImageLoadPatterns<SDPatternOperator name, string opcode> {		multiclass ImageLoadPatterns<SDPatternOperator name, string opcode> {
defm : ImageLoadDataPatterns<name, !cast<string>(opcode # _V1), f32>;		defm : ImageLoadDataPatterns<name, !cast<string>(opcode # _V1), f32, 0>;
defm : ImageLoadDataPatterns<name, !cast<string>(opcode # _V2), v2f32>;		defm : ImageLoadDataPatterns<name, !cast<string>(opcode # _V2), v2f32, 0>;
defm : ImageLoadDataPatterns<name, !cast<string>(opcode # _V4), v4f32>;		defm : ImageLoadDataPatterns<name, !cast<string>(opcode # _V4), v4f32, 0>;

let SubtargetPredicate = HasUnpackedD16VMem in {		let SubtargetPredicate = HasUnpackedD16VMem in {
defm : ImageLoadDataPatterns<name, !cast<string>(opcode # _V1), f16, "_D16_gfx80">;		defm : ImageLoadDataPatterns<name, !cast<string>(opcode # _V1), f16, 1>;
} // End HasUnpackedD16VMem.		} // End HasUnpackedD16VMem.

let SubtargetPredicate = HasPackedD16VMem in {		let SubtargetPredicate = HasPackedD16VMem in {
defm : ImageLoadDataPatterns<name, !cast<string>(opcode # _V1), f16, "_D16">;		defm : ImageLoadDataPatterns<name, !cast<string>(opcode # _V1), f16, 1>;
defm : ImageLoadDataPatterns<name, !cast<string>(opcode # _V1), v2f16, "_D16">;		defm : ImageLoadDataPatterns<name, !cast<string>(opcode # _V1), v2f16, 1>;
} // End HasPackedD16VMem.		} // End HasPackedD16VMem.
}		}

// ImageLoad alternative patterns for illegal vector half Types.		// ImageLoad alternative patterns for illegal vector half Types.
multiclass ImageLoadAltPatterns<SDPatternOperator name, string opcode> {		multiclass ImageLoadAltPatterns<SDPatternOperator name, string opcode> {
let SubtargetPredicate = HasUnpackedD16VMem in {		let SubtargetPredicate = HasUnpackedD16VMem in {
defm : ImageLoadDataPatterns<name, !cast<string>(opcode # _V2), v2i32, "_D16_gfx80">;		defm : ImageLoadDataPatterns<name, !cast<string>(opcode # _V2), v2i32, 1>;
defm : ImageLoadDataPatterns<name, !cast<string>(opcode # _V4), v4i32, "_D16_gfx80">;		defm : ImageLoadDataPatterns<name, !cast<string>(opcode # _V4), v4i32, 1>;
} // End HasUnPackedD16VMem.		} // End HasUnPackedD16VMem.

let SubtargetPredicate = HasPackedD16VMem in {		let SubtargetPredicate = HasPackedD16VMem in {
defm : ImageLoadDataPatterns<name, !cast<string>(opcode # _V1), f16, "_D16">;		defm : ImageLoadDataPatterns<name, !cast<string>(opcode # _V1), f16, 1>;
defm : ImageLoadDataPatterns<name, !cast<string>(opcode # _V2), v2i32, "_D16">;		defm : ImageLoadDataPatterns<name, !cast<string>(opcode # _V2), v2i32, 1>;
} // End HasPackedD16VMem.		} // End HasPackedD16VMem.
}		}

// ImageStore for amdgcn.		// ImageStore for amdgcn.
multiclass ImageStorePattern<SDPatternOperator name, MIMG opcode, ValueType dt, ValueType vt> {		multiclass ImageStorePattern<SDPatternOperator name, MIMG opcode,
		ValueType dt, ValueType vt, bit d16> {
def : GCNPat <		def : GCNPat <
(name dt:$data, vt:$addr, v8i32:$rsrc, i32:$dmask, i1:$glc, i1:$slc,		(name dt:$data, vt:$addr, v8i32:$rsrc, i32:$dmask, i1:$glc, i1:$slc,
i1:$lwe, i1:$da),		i1:$lwe, i1:$da),
(opcode $data, $addr, $rsrc,		!con((opcode $data, $addr, $rsrc, (as_i32imm $dmask), 1, (as_i1imm $glc),
(as_i32imm $dmask), 1, (as_i1imm $glc), (as_i1imm $slc),		(as_i1imm $slc), 0, 0, (as_i1imm $lwe), (as_i1imm $da)),
0, 0, (as_i1imm $lwe), (as_i1imm $da))		!if(opcode.HasD16, (opcode d16), (opcode)))
>;		>;
}		}

multiclass ImageStoreDataPatterns<SDPatternOperator name, string opcode, ValueType dt, string suffix = ""> {		multiclass ImageStoreDataPatterns<SDPatternOperator name, string opcode,
defm : ImageStorePattern<name, !cast<MIMG>(opcode # _V1 # suffix), dt, i32>;		ValueType dt, bit d16> {
defm : ImageStorePattern<name, !cast<MIMG>(opcode # _V2 # suffix), dt, v2i32>;		defm : ImageStorePattern<name, !cast<MIMG>(opcode # _V1), dt, i32, d16>;
defm : ImageStorePattern<name, !cast<MIMG>(opcode # _V4 # suffix), dt, v4i32>;		defm : ImageStorePattern<name, !cast<MIMG>(opcode # _V2), dt, v2i32, d16>;
		defm : ImageStorePattern<name, !cast<MIMG>(opcode # _V4), dt, v4i32, d16>;
}		}

// ImageStore patterns.		// ImageStore patterns.
// TODO: support v3f32.		// TODO: support v3f32.
multiclass ImageStorePatterns<SDPatternOperator name, string opcode> {		multiclass ImageStorePatterns<SDPatternOperator name, string opcode> {
defm : ImageStoreDataPatterns<name, !cast<string>(opcode # _V1), f32>;		defm : ImageStoreDataPatterns<name, !cast<string>(opcode # _V1), f32, 0>;
defm : ImageStoreDataPatterns<name, !cast<string>(opcode # _V2), v2f32>;		defm : ImageStoreDataPatterns<name, !cast<string>(opcode # _V2), v2f32, 0>;
defm : ImageStoreDataPatterns<name, !cast<string>(opcode # _V4), v4f32>;		defm : ImageStoreDataPatterns<name, !cast<string>(opcode # _V4), v4f32, 0>;

let SubtargetPredicate = HasUnpackedD16VMem in {		let SubtargetPredicate = HasUnpackedD16VMem in {
defm : ImageStoreDataPatterns<name, !cast<string>(opcode # _V1), f16, "_D16_gfx80">;		defm : ImageStoreDataPatterns<name, !cast<string>(opcode # _V1), f16, 1>;
} // End HasUnpackedD16VMem.		} // End HasUnpackedD16VMem.

let SubtargetPredicate = HasPackedD16VMem in {		let SubtargetPredicate = HasPackedD16VMem in {
defm : ImageStoreDataPatterns<name, !cast<string>(opcode # _V1), f16, "_D16">;		defm : ImageStoreDataPatterns<name, !cast<string>(opcode # _V1), f16, 1>;
defm : ImageStoreDataPatterns<name, !cast<string>(opcode # _V1), v2f16, "_D16">;		defm : ImageStoreDataPatterns<name, !cast<string>(opcode # _V1), v2f16, 1>;
} // End HasPackedD16VMem.		} // End HasPackedD16VMem.
}		}

// ImageStore alternative patterns.		// ImageStore alternative patterns.
multiclass ImageStoreAltPatterns<SDPatternOperator name, string opcode> {		multiclass ImageStoreAltPatterns<SDPatternOperator name, string opcode> {
let SubtargetPredicate = HasUnpackedD16VMem in {		let SubtargetPredicate = HasUnpackedD16VMem in {
defm : ImageStoreDataPatterns<name, !cast<string>(opcode # _V2), v2i32, "_D16_gfx80">;		defm : ImageStoreDataPatterns<name, !cast<string>(opcode # _V2), v2i32, 1>;
defm : ImageStoreDataPatterns<name, !cast<string>(opcode # _V4), v4i32, "_D16_gfx80">;		defm : ImageStoreDataPatterns<name, !cast<string>(opcode # _V4), v4i32, 1>;
} // End HasUnpackedD16VMem.		} // End HasUnpackedD16VMem.

let SubtargetPredicate = HasPackedD16VMem in {		let SubtargetPredicate = HasPackedD16VMem in {
defm : ImageStoreDataPatterns<name, !cast<string>(opcode # _V1), i32, "_D16">;		defm : ImageStoreDataPatterns<name, !cast<string>(opcode # _V1), i32, 1>;
defm : ImageStoreDataPatterns<name, !cast<string>(opcode # _V2), v2i32, "_D16">;		defm : ImageStoreDataPatterns<name, !cast<string>(opcode # _V2), v2i32, 1>;
} // End HasPackedD16VMem.		} // End HasPackedD16VMem.
}		}

// ImageAtomic for amdgcn.		// ImageAtomic for amdgcn.
class ImageAtomicPattern<SDPatternOperator name, MIMG opcode, ValueType vt> : GCNPat <		class ImageAtomicPattern<SDPatternOperator name, MIMG opcode, ValueType vt> : GCNPat <
(name i32:$vdata, vt:$addr, v8i32:$rsrc, imm:$r128, imm:$da, imm:$slc),		(name i32:$vdata, vt:$addr, v8i32:$rsrc, imm:$r128, imm:$da, imm:$slc),
(opcode $vdata, $addr, $rsrc, 1, 1, 1, (as_i1imm $slc), (as_i1imm $r128), 0, 0, (as_i1imm $da))		(opcode $vdata, $addr, $rsrc, 1, 1, 1, (as_i1imm $slc), (as_i1imm $r128), 0, 0, (as_i1imm $da))
>;		>;
▲ Show 20 Lines • Show All 203 Lines • ▼ Show 20 Lines
defm : ImageAtomicPatterns<int_amdgcn_image_atomic_umin, "IMAGE_ATOMIC_UMIN">;		defm : ImageAtomicPatterns<int_amdgcn_image_atomic_umin, "IMAGE_ATOMIC_UMIN">;
defm : ImageAtomicPatterns<int_amdgcn_image_atomic_smax, "IMAGE_ATOMIC_SMAX">;		defm : ImageAtomicPatterns<int_amdgcn_image_atomic_smax, "IMAGE_ATOMIC_SMAX">;
defm : ImageAtomicPatterns<int_amdgcn_image_atomic_umax, "IMAGE_ATOMIC_UMAX">;		defm : ImageAtomicPatterns<int_amdgcn_image_atomic_umax, "IMAGE_ATOMIC_UMAX">;
defm : ImageAtomicPatterns<int_amdgcn_image_atomic_and, "IMAGE_ATOMIC_AND">;		defm : ImageAtomicPatterns<int_amdgcn_image_atomic_and, "IMAGE_ATOMIC_AND">;
defm : ImageAtomicPatterns<int_amdgcn_image_atomic_or, "IMAGE_ATOMIC_OR">;		defm : ImageAtomicPatterns<int_amdgcn_image_atomic_or, "IMAGE_ATOMIC_OR">;
defm : ImageAtomicPatterns<int_amdgcn_image_atomic_xor, "IMAGE_ATOMIC_XOR">;		defm : ImageAtomicPatterns<int_amdgcn_image_atomic_xor, "IMAGE_ATOMIC_XOR">;
defm : ImageAtomicPatterns<int_amdgcn_image_atomic_inc, "IMAGE_ATOMIC_INC">;		defm : ImageAtomicPatterns<int_amdgcn_image_atomic_inc, "IMAGE_ATOMIC_INC">;
defm : ImageAtomicPatterns<int_amdgcn_image_atomic_dec, "IMAGE_ATOMIC_DEC">;		defm : ImageAtomicPatterns<int_amdgcn_image_atomic_dec, "IMAGE_ATOMIC_DEC">;

/* SIsample for simple 1D texture lookup */
def : GCNPat <
(SIsample i32:$addr, v8i32:$rsrc, v4i32:$sampler, imm),
(IMAGE_SAMPLE_V4_V1 $addr, $rsrc, $sampler, 0xf, 0, 0, 0, 0, 0, 0, 0)
>;

class SamplePattern<SDNode name, MIMG opcode, ValueType vt> : GCNPat <
(name vt:$addr, v8i32:$rsrc, v4i32:$sampler, imm),
(opcode $addr, $rsrc, $sampler, 0xf, 0, 0, 0, 0, 0, 0, 0)
>;

class SampleRectPattern<SDNode name, MIMG opcode, ValueType vt> : GCNPat <
(name vt:$addr, v8i32:$rsrc, v4i32:$sampler, TEX_RECT),
(opcode $addr, $rsrc, $sampler, 0xf, 1, 0, 0, 0, 0, 0, 0)
>;

class SampleArrayPattern<SDNode name, MIMG opcode, ValueType vt> : GCNPat <
(name vt:$addr, v8i32:$rsrc, v4i32:$sampler, TEX_ARRAY),
(opcode $addr, $rsrc, $sampler, 0xf, 0, 0, 0, 0, 0, 0, 1)
>;

class SampleShadowPattern<SDNode name, MIMG opcode,
ValueType vt> : GCNPat <
(name vt:$addr, v8i32:$rsrc, v4i32:$sampler, TEX_SHADOW),
(opcode $addr, $rsrc, $sampler, 0xf, 0, 0, 0, 0, 0, 0, 0)
>;

class SampleShadowArrayPattern<SDNode name, MIMG opcode,
ValueType vt> : GCNPat <
(name vt:$addr, v8i32:$rsrc, v4i32:$sampler, TEX_SHADOW_ARRAY),
(opcode $addr, $rsrc, $sampler, 0xf, 0, 0, 0, 0, 0, 0, 1)
>;

/* SIsample* for texture lookups consuming more address parameters */
multiclass SamplePatterns<MIMG sample, MIMG sample_c, MIMG sample_l,
MIMG sample_c_l, MIMG sample_b, MIMG sample_c_b,
MIMG sample_d, MIMG sample_c_d, ValueType addr_type> {
def : SamplePattern <SIsample, sample, addr_type>;
def : SampleRectPattern <SIsample, sample, addr_type>;
def : SampleArrayPattern <SIsample, sample, addr_type>;
def : SampleShadowPattern <SIsample, sample_c, addr_type>;
def : SampleShadowArrayPattern <SIsample, sample_c, addr_type>;

def : SamplePattern <SIsamplel, sample_l, addr_type>;
def : SampleArrayPattern <SIsamplel, sample_l, addr_type>;
def : SampleShadowPattern <SIsamplel, sample_c_l, addr_type>;
def : SampleShadowArrayPattern <SIsamplel, sample_c_l, addr_type>;

def : SamplePattern <SIsampleb, sample_b, addr_type>;
def : SampleArrayPattern <SIsampleb, sample_b, addr_type>;
def : SampleShadowPattern <SIsampleb, sample_c_b, addr_type>;
def : SampleShadowArrayPattern <SIsampleb, sample_c_b, addr_type>;

def : SamplePattern <SIsampled, sample_d, addr_type>;
def : SampleArrayPattern <SIsampled, sample_d, addr_type>;
def : SampleShadowPattern <SIsampled, sample_c_d, addr_type>;
def : SampleShadowArrayPattern <SIsampled, sample_c_d, addr_type>;
}

defm : SamplePatterns<IMAGE_SAMPLE_V4_V2, IMAGE_SAMPLE_C_V4_V2,
IMAGE_SAMPLE_L_V4_V2, IMAGE_SAMPLE_C_L_V4_V2,
IMAGE_SAMPLE_B_V4_V2, IMAGE_SAMPLE_C_B_V4_V2,
IMAGE_SAMPLE_D_V4_V2, IMAGE_SAMPLE_C_D_V4_V2,
v2i32>;
defm : SamplePatterns<IMAGE_SAMPLE_V4_V4, IMAGE_SAMPLE_C_V4_V4,
IMAGE_SAMPLE_L_V4_V4, IMAGE_SAMPLE_C_L_V4_V4,
IMAGE_SAMPLE_B_V4_V4, IMAGE_SAMPLE_C_B_V4_V4,
IMAGE_SAMPLE_D_V4_V4, IMAGE_SAMPLE_C_D_V4_V4,
v4i32>;
defm : SamplePatterns<IMAGE_SAMPLE_V4_V8, IMAGE_SAMPLE_C_V4_V8,
IMAGE_SAMPLE_L_V4_V8, IMAGE_SAMPLE_C_L_V4_V8,
IMAGE_SAMPLE_B_V4_V8, IMAGE_SAMPLE_C_B_V4_V8,
IMAGE_SAMPLE_D_V4_V8, IMAGE_SAMPLE_C_D_V4_V8,
v8i32>;
defm : SamplePatterns<IMAGE_SAMPLE_V4_V16, IMAGE_SAMPLE_C_V4_V16,
IMAGE_SAMPLE_L_V4_V16, IMAGE_SAMPLE_C_L_V4_V16,
IMAGE_SAMPLE_B_V4_V16, IMAGE_SAMPLE_C_B_V4_V16,
IMAGE_SAMPLE_D_V4_V16, IMAGE_SAMPLE_C_D_V4_V16,
v16i32>;

lib/Target/AMDGPU/SIDefines.h

Show First 20 Lines • Show All 81 Lines • ▼ Show 20 Lines	// TODO: Should this be spilt into VOP3 a and b?

// Clamps hi component of register.		// Clamps hi component of register.
// ClampLo and ClampHi set for packed clamp.		// ClampLo and ClampHi set for packed clamp.
ClampHi = UINT64_C(1) << 48,		ClampHi = UINT64_C(1) << 48,

// Is a packed VOP3P instruction.		// Is a packed VOP3P instruction.
IsPacked = UINT64_C(1) << 49,		IsPacked = UINT64_C(1) << 49,

// "d16" bit set or not.		// Is a D16 buffer instruction.
D16 = UINT64_C(1) << 50		D16Buf = UINT64_C(1) << 50
};		};

// v_cmp_class_* etc. use a 10-bit mask for what operation is checked.		// v_cmp_class_* etc. use a 10-bit mask for what operation is checked.
// The result is true if any of these tests are true.		// The result is true if any of these tests are true.
enum ClassFlags {		enum ClassFlags {
S_NAN = 1 << 0, // Signaling NaN		S_NAN = 1 << 0, // Signaling NaN
Q_NAN = 1 << 1, // Quiet NaN		Q_NAN = 1 << 1, // Quiet NaN
N_INFINITY = 1 << 2, // Negative infinity		N_INFINITY = 1 << 2, // Negative infinity
▲ Show 20 Lines • Show All 440 Lines • Show Last 20 Lines
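The renamed flag is still a single bit in the 64-bit TSFlags word. A minimal, self-contained sketch of how such a bit can be tested is shown below. The namespace `sketch` and the functions `isD16Buf` and `selfCheck` are hypothetical; only the bit positions are taken from the enum above.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {

// Bit positions copied from the SIInstrFlags excerpt above.
enum : uint64_t {
  ClampHi = UINT64_C(1) << 48,
  IsPacked = UINT64_C(1) << 49,
  D16Buf = UINT64_C(1) << 50, // Is a D16 buffer instruction.
};

// Hypothetical query: true if the D16Buf bit is set in a TSFlags word.
inline bool isD16Buf(uint64_t TSFlags) { return (TSFlags & D16Buf) != 0; }

// Small self-check: the three flags occupy distinct bits.
inline void selfCheck() {
  assert((ClampHi & IsPacked) == 0);
  assert((IsPacked & D16Buf) == 0);
  assert(isD16Buf(ClampHi | D16Buf));
}

} // namespace sketch
```

Because D16 for MIMG instructions is now an operand, a query of this kind only answers the question for buffer instructions.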

lib/Target/AMDGPU/SIISelLowering.cpp


Show First 20 Lines • Show All 7,238 Lines • ▼ Show 20 Lines	static unsigned SubIdx2Lane(unsigned Idx) {
case AMDGPU::sub2: return 2;		case AMDGPU::sub2: return 2;
case AMDGPU::sub3: return 3;		case AMDGPU::sub3: return 3;
}		}
}		}

/// Adjust the writemask of MIMG instructions		/// Adjust the writemask of MIMG instructions
SDNode *SITargetLowering::adjustWritemask(MachineSDNode *&Node,		SDNode *SITargetLowering::adjustWritemask(MachineSDNode *&Node,
SelectionDAG &DAG) const {		SelectionDAG &DAG) const {
		unsigned Opcode = Node->getMachineOpcode();

		// Subtract 1 because the vdata output is not a MachineSDNode operand.
		int D16Idx = AMDGPU::getNamedOperandIdx(Opcode, AMDGPU::OpName::d16) - 1;
		if (D16Idx >= 0 && Node->getConstantOperandVal(D16Idx))
		return Node; // not implemented for D16

SDNode *Users[4] = { nullptr };		SDNode *Users[4] = { nullptr };
unsigned Lane = 0;		unsigned Lane = 0;
unsigned DmaskIdx = (Node->getNumOperands() - Node->getNumValues() == 9) ? 2 : 3;		unsigned DmaskIdx = AMDGPU::getNamedOperandIdx(Opcode, AMDGPU::OpName::dmask) - 1;
unsigned OldDmask = Node->getConstantOperandVal(DmaskIdx);		unsigned OldDmask = Node->getConstantOperandVal(DmaskIdx);
unsigned NewDmask = 0;		unsigned NewDmask = 0;
bool HasChain = Node->getNumValues() > 1;		bool HasChain = Node->getNumValues() > 1;

if (OldDmask == 0) {		if (OldDmask == 0) {
// These are folded out, but on the chance it happens don't assert.		// These are folded out, but on the chance it happens don't assert.
return Node;		return Node;
}		}
▲ Show 20 Lines • Show All 155 Lines • ▼ Show 20 Lines
/// Fold the instructions after selecting them.		/// Fold the instructions after selecting them.
/// Returns null if users were already updated.		/// Returns null if users were already updated.
SDNode *SITargetLowering::PostISelFolding(MachineSDNode *Node,		SDNode *SITargetLowering::PostISelFolding(MachineSDNode *Node,
SelectionDAG &DAG) const {		SelectionDAG &DAG) const {
const SIInstrInfo *TII = getSubtarget()->getInstrInfo();		const SIInstrInfo *TII = getSubtarget()->getInstrInfo();
unsigned Opcode = Node->getMachineOpcode();		unsigned Opcode = Node->getMachineOpcode();

if (TII->isMIMG(Opcode) && !TII->get(Opcode).mayStore() &&		if (TII->isMIMG(Opcode) && !TII->get(Opcode).mayStore() &&
!TII->isGather4(Opcode) && !TII->isD16(Opcode)) {		!TII->isGather4(Opcode)) {
return adjustWritemask(Node, DAG);		return adjustWritemask(Node, DAG);
}		}

if (Opcode == AMDGPU::INSERT_SUBREG ||		if (Opcode == AMDGPU::INSERT_SUBREG ||
Opcode == AMDGPU::REG_SEQUENCE) {		Opcode == AMDGPU::REG_SEQUENCE) {
legalizeTargetIndependentNode(Node, DAG);		legalizeTargetIndependentNode(Node, DAG);
return Node;		return Node;
}		}
▲ Show 20 Lines • Show All 319 Lines • Show Last 20 Lines
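The "subtract 1" adjustment in adjustWritemask can be sketched in isolation. The code below is a hypothetical stand-in, not the LLVM API: `OperandTable`, `namedOperandIdx` and `nodeOperandIdx` are invented for illustration, and the lookup is assumed to return -1 for a missing operand, which is what the `D16Idx >= 0` check in the diff implies.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical named-operand table: operand name -> index in the machine
// instruction's operand list, where the vdata output sits at index 0.
using OperandTable = std::map<std::string, int>;

// Assumed behaviour: return -1 if the opcode has no operand of that name.
int namedOperandIdx(const OperandTable &Table, const std::string &Name) {
  auto It = Table.find(Name);
  return It == Table.end() ? -1 : It->second;
}

// The vdata output is not an operand of the selection DAG node, so every
// input operand index shifts down by one. A missing operand yields -2,
// which the caller rejects with a ">= 0" check.
int nodeOperandIdx(const OperandTable &Table, const std::string &Name) {
  return namedOperandIdx(Table, Name) - 1;
}
```

For an illustrative load-like operand list of vdata, vaddr, srsrc, dmask, the node index of dmask comes out as 2, and for an opcode without a d16 operand the result is negative, so the D16 early return is skipped.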

lib/Target/AMDGPU/SIInstrFormats.td

Show First 20 Lines • Show All 112 Lines • ▼ Show 20 Lines	class InstSI <dag outs, dag ins, string asm = "",

// This field indicates that the clamp applies to the high component		// This field indicates that the clamp applies to the high component
// of a packed output register.		// of a packed output register.
field bit ClampHi = 0;		field bit ClampHi = 0;

// This bit indicates that this is a packed VOP3P instruction		// This bit indicates that this is a packed VOP3P instruction
field bit IsPacked = 0;		field bit IsPacked = 0;

// This bit indicates that this is a D16 instruction.		// This bit indicates that this is a D16 buffer instruction.
field bit D16 = 0;		field bit D16Buf = 0;

// These need to be kept in sync with the enum in SIInstrFlags.		// These need to be kept in sync with the enum in SIInstrFlags.
let TSFlags{0} = SALU;		let TSFlags{0} = SALU;
let TSFlags{1} = VALU;		let TSFlags{1} = VALU;

let TSFlags{2} = SOP1;		let TSFlags{2} = SOP1;
let TSFlags{3} = SOP2;		let TSFlags{3} = SOP2;
let TSFlags{4} = SOPC;		let TSFlags{4} = SOPC;
Show All 40 Lines	class InstSI <dag outs, dag ins, string asm = "",

let TSFlags{45} = FPClamp;		let TSFlags{45} = FPClamp;
let TSFlags{46} = IntClamp;		let TSFlags{46} = IntClamp;
let TSFlags{47} = ClampLo;		let TSFlags{47} = ClampLo;
let TSFlags{48} = ClampHi;		let TSFlags{48} = ClampHi;

let TSFlags{49} = IsPacked;		let TSFlags{49} = IsPacked;

let TSFlags{50} = D16;		let TSFlags{50} = D16Buf;

let SchedRW = [Write32Bit];		let SchedRW = [Write32Bit];

field bits<1> DisableSIDecoder = 0;		field bits<1> DisableSIDecoder = 0;
field bits<1> DisableVIDecoder = 0;		field bits<1> DisableVIDecoder = 0;
field bits<1> DisableDecoder = 0;		field bits<1> DisableDecoder = 0;

let isAsmParserOnly = !if(!eq(DisableDecoder{0}, {0}), 0, 1);		let isAsmParserOnly = !if(!eq(DisableDecoder{0}, {0}), 0, 1);
▲ Show 20 Lines • Show All 62 Lines • ▼ Show 20 Lines	class MIMGe <bits<7> op> : Enc64 {
bits<4> dmask;		bits<4> dmask;
bits<1> unorm;		bits<1> unorm;
bits<1> glc;		bits<1> glc;
bits<1> da;		bits<1> da;
bits<1> r128;		bits<1> r128;
bits<1> tfe;		bits<1> tfe;
bits<1> lwe;		bits<1> lwe;
bits<1> slc;		bits<1> slc;
bits<1> d16 = 0;		bit d16;
bits<8> vaddr;		bits<8> vaddr;
bits<7> srsrc;		bits<7> srsrc;
bits<7> ssamp;		bits<7> ssamp;

let Inst{11-8} = dmask;		let Inst{11-8} = dmask;
let Inst{12} = unorm;		let Inst{12} = unorm;
let Inst{13} = glc;		let Inst{13} = glc;
let Inst{14} = da;		let Inst{14} = da;
▲ Show 20 Lines • Show All 72 Lines • ▼ Show 20 Lines	class MIMG <dag outs, dag ins, string asm, list<dag> pattern> :

let VM_CNT = 1;		let VM_CNT = 1;
let EXP_CNT = 1;		let EXP_CNT = 1;
let MIMG = 1;		let MIMG = 1;
let Uses = [EXEC];		let Uses = [EXEC];

let UseNamedOperandTable = 1;		let UseNamedOperandTable = 1;
let hasSideEffects = 0; // XXX ????		let hasSideEffects = 0; // XXX ????

		bit HasD16 = 0;
}		}
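To make the MIMGe field assignments above concrete, here is a small C++ sketch that packs the four fields whose bit positions are visible in this excerpt (dmask at bits 11-8, unorm at 12, glc at 13, da at 14). The function names are hypothetical. The position of the d16 bit is elided in the excerpt, so it is deliberately left out rather than guessed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical encoder for the MIMG fields whose positions are shown above:
//   Inst{11-8} = dmask, Inst{12} = unorm, Inst{13} = glc, Inst{14} = da
uint64_t encodeLowMIMGFields(unsigned dmask, bool unorm, bool glc, bool da) {
  assert(dmask <= 0xF && "dmask is a 4-bit field");
  uint64_t Inst = 0;
  Inst |= static_cast<uint64_t>(dmask & 0xF) << 8;
  Inst |= static_cast<uint64_t>(unorm) << 12;
  Inst |= static_cast<uint64_t>(glc) << 13;
  Inst |= static_cast<uint64_t>(da) << 14;
  return Inst;
}

// Hypothetical decoder: recover dmask from an encoded word.
unsigned decodeDmask(uint64_t Inst) {
  return static_cast<unsigned>((Inst >> 8) & 0xF);
}
```

Since d16 is now declared as `bit d16;` without a default, its value comes from the instruction's operand rather than being fixed per opcode.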

lib/Target/AMDGPU/SIInstrInfo.h

Show First 20 Lines • Show All 439 Lines • ▼ Show 20 Lines	public:
static bool isGather4(const MachineInstr &MI) {		static bool isGather4(const MachineInstr &MI) {
return MI.getDesc().TSFlags & SIInstrFlags::Gather4;		return MI.getDesc().TSFlags & SIInstrFlags::Gather4;
}		}

bool isGather4(uint16_t Opcode) const {		bool isGather4(uint16_t Opcode) const {
return get(Opcode).TSFlags & SIInstrFlags::Gather4;		return get(Opcode).TSFlags & SIInstrFlags::Gather4;
}		}

static bool isD16(const MachineInstr &MI) {
return MI.getDesc().TSFlags & SIInstrFlags::D16;
}

bool isD16(uint16_t Opcode) const {
return get(Opcode).TSFlags & SIInstrFlags::D16;
}

static bool isFLAT(const MachineInstr &MI) {		static bool isFLAT(const MachineInstr &MI) {
return MI.getDesc().TSFlags & SIInstrFlags::FLAT;		return MI.getDesc().TSFlags & SIInstrFlags::FLAT;
}		}

// Is a FLAT encoded instruction which accesses a specific segment,		// Is a FLAT encoded instruction which accesses a specific segment,
// i.e. global_* or scratch_*.		// i.e. global_* or scratch_*.
static bool isSegmentSpecificFLAT(const MachineInstr &MI) {		static bool isSegmentSpecificFLAT(const MachineInstr &MI) {
auto Flags = MI.getDesc().TSFlags;		auto Flags = MI.getDesc().TSFlags;
(490 unchanged lines omitted)
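
The removed isD16 helpers tested a TSFlags bit. Since the summary says the flag is kept (renamed to D16Buf) only for buffer instructions, a check against the renamed bit might look roughly like the sketch below. Only the bit position (TSFlags{50}) comes from the SIInstrFormats.td hunk above; the namespace `SketchFlags` and the free function `isD16Buf` are hypothetical names used for illustration and are not taken from this diff.

```cpp
#include <cstdint>

// Hypothetical stand-in for the real flag enum. Only the bit position
// (TSFlags{50} = D16Buf) is taken from the diff; the names are assumptions.
namespace SketchFlags {
constexpr uint64_t D16Buf = UINT64_C(1) << 50;
}

// Returns true when the D16Buf bit is set in a 64-bit TSFlags word.
constexpr bool isD16Buf(uint64_t TSFlags) {
  return (TSFlags & SketchFlags::D16Buf) != 0;
}
```

For MIMG instructions no such flag test would apply after this change, because D16-ness is carried by the d16 operand instead.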

lib/Target/AMDGPU/SIInstrInfo.td

(294 unchanged lines omitted)
	// Gather4 with comparison and offsets.			// Gather4 with comparison and offsets.
	def SIImage_gather4_c_o : SDTImage_sample<"AMDGPUISD::IMAGE_GATHER4_C_O">;			def SIImage_gather4_c_o : SDTImage_sample<"AMDGPUISD::IMAGE_GATHER4_C_O">;
	def SIImage_gather4_c_cl_o : SDTImage_sample<"AMDGPUISD::IMAGE_GATHER4_C_CL_O">;			def SIImage_gather4_c_cl_o : SDTImage_sample<"AMDGPUISD::IMAGE_GATHER4_C_CL_O">;
	def SIImage_gather4_c_l_o : SDTImage_sample<"AMDGPUISD::IMAGE_GATHER4_C_L_O">;			def SIImage_gather4_c_l_o : SDTImage_sample<"AMDGPUISD::IMAGE_GATHER4_C_L_O">;
	def SIImage_gather4_c_b_o : SDTImage_sample<"AMDGPUISD::IMAGE_GATHER4_C_B_O">;			def SIImage_gather4_c_b_o : SDTImage_sample<"AMDGPUISD::IMAGE_GATHER4_C_B_O">;
	def SIImage_gather4_c_b_cl_o : SDTImage_sample<"AMDGPUISD::IMAGE_GATHER4_C_B_CL_O">;			def SIImage_gather4_c_b_cl_o : SDTImage_sample<"AMDGPUISD::IMAGE_GATHER4_C_B_CL_O">;
	def SIImage_gather4_c_lz_o : SDTImage_sample<"AMDGPUISD::IMAGE_GATHER4_C_LZ_O">;			def SIImage_gather4_c_lz_o : SDTImage_sample<"AMDGPUISD::IMAGE_GATHER4_C_LZ_O">;

	class SDSample<string opcode> : SDNode <opcode,
	SDTypeProfile<1, 4, [SDTCisVT<0, v4f32>, SDTCisVT<2, v8i32>,
	SDTCisVT<3, v4i32>, SDTCisVT<4, i32>]>
	>;

	def SIsample : SDSample<"AMDGPUISD::SAMPLE">;
	def SIsampleb : SDSample<"AMDGPUISD::SAMPLEB">;
	def SIsampled : SDSample<"AMDGPUISD::SAMPLED">;
	def SIsamplel : SDSample<"AMDGPUISD::SAMPLEL">;

	def SIpc_add_rel_offset : SDNode<"AMDGPUISD::PC_ADD_REL_OFFSET",			def SIpc_add_rel_offset : SDNode<"AMDGPUISD::PC_ADD_REL_OFFSET",
	SDTypeProfile<1, 2, [SDTCisVT<0, iPTR>, SDTCisSameAs<0,1>, SDTCisSameAs<0,2>]>			SDTypeProfile<1, 2, [SDTCisVT<0, iPTR>, SDTCisSameAs<0,1>, SDTCisSameAs<0,2>]>
	>;			>;

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// ValueType helpers			// ValueType helpers
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

(1,753 unchanged lines omitted)
	def getMIMGAtomicOp2 : InstrMapping {			def getMIMGAtomicOp2 : InstrMapping {
	let FilterClass = "MIMG_Atomic_Size";			let FilterClass = "MIMG_Atomic_Size";
	let RowFields = ["Op"];			let RowFields = ["Op"];
	let ColFields = ["AtomicSize"];			let ColFields = ["AtomicSize"];
	let KeyCol = ["2"];			let KeyCol = ["2"];
	let ValueCols = [["1"]];			let ValueCols = [["1"]];
	}			}

				def getMIMGGatherOpPackedD16 : InstrMapping {
				let FilterClass = "MIMG_Gather_Size";
				let RowFields = ["Op"];
				let ColFields = ["Channels"];
				let KeyCol = ["4"];
				let ValueCols = [["2"]];
				}

	// Maps an commuted opcode to its original version			// Maps an commuted opcode to its original version
	def getCommuteOrig : InstrMapping {			def getCommuteOrig : InstrMapping {
	let FilterClass = "Commutable_REV";			let FilterClass = "Commutable_REV";
	let RowFields = ["RevOp"];			let RowFields = ["RevOp"];
	let ColFields = ["IsOrig"];			let ColFields = ["IsOrig"];
	let KeyCol = ["0"];			let KeyCol = ["0"];
	let ValueCols = [["1"]];			let ValueCols = [["1"]];
	}			}
(74 unchanged lines omitted)

lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h

(160 unchanged lines omitted)
	int getMaskedMIMGOp(const MCInstrInfo &MII,			int getMaskedMIMGOp(const MCInstrInfo &MII,
	unsigned Opc, unsigned NewChannels);			unsigned Opc, unsigned NewChannels);

	LLVM_READONLY			LLVM_READONLY
	int getMaskedMIMGAtomicOp(const MCInstrInfo &MII,			int getMaskedMIMGAtomicOp(const MCInstrInfo &MII,
	unsigned Opc, unsigned NewChannels);			unsigned Opc, unsigned NewChannels);

	LLVM_READONLY			LLVM_READONLY
				int getMIMGGatherOpPackedD16(uint16_t Opcode);

				LLVM_READONLY
	int getMCOpcode(uint16_t Opcode, unsigned Gen);			int getMCOpcode(uint16_t Opcode, unsigned Gen);

	void initDefaultAMDKernelCodeT(amd_kernel_code_t &Header,			void initDefaultAMDKernelCodeT(amd_kernel_code_t &Header,
	const FeatureBitset &Features);			const FeatureBitset &Features);

	bool isGroupSegment(const GlobalValue *GV);			bool isGroupSegment(const GlobalValue *GV);
	bool isGlobalSegment(const GlobalValue *GV);			bool isGlobalSegment(const GlobalValue *GV);
	bool isReadOnlySegment(const GlobalValue *GV);			bool isReadOnlySegment(const GlobalValue *GV);
(216 unchanged lines omitted)
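
The getMIMGGatherOpPackedD16 declaration above corresponds to the InstrMapping added in SIInstrInfo.td, which relates the 4-channel variant of a gather opcode (KeyCol "4") to its 2-channel variant (ValueCols "2"). As a rough, self-contained illustration of that kind of lookup, assuming the usual convention that a missing mapping yields -1, one could write the following. The opcode values and the function name `gatherOpPackedD16Sketch` are placeholders invented for this sketch; the real table is generated by TableGen.

```cpp
// Placeholder opcode numbers; the real values come from generated tables.
enum SketchOpcode : int {
  SKETCH_GATHER4_V4 = 100, // 4-channel gather variant (key column "4")
  SKETCH_GATHER4_V2 = 101, // 2-channel gather variant (value column "2")
  SKETCH_SAMPLE_V4 = 200   // not a gather opcode, so it has no mapping
};

struct SketchMapEntry {
  int From;
  int To;
};

// Maps a 4-channel gather opcode to its 2-channel counterpart.
// Returns -1 when the opcode has no entry in the table.
int gatherOpPackedD16Sketch(int Opcode) {
  static const SketchMapEntry Table[] = {
      {SKETCH_GATHER4_V4, SKETCH_GATHER4_V2},
  };
  for (const SketchMapEntry &E : Table)
    if (E.From == Opcode)
      return E.To;
  return -1;
}
```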

test/CodeGen/AMDGPU/coalescer-subreg-join.mir

(55 unchanged lines omitted)
bb.0:
%11.sub1 = COPY %1		%11.sub1 = COPY %1
%11.sub2 = COPY %1		%11.sub2 = COPY %1
%11.sub3 = COPY %1		%11.sub3 = COPY %1
%11.sub4 = COPY %1		%11.sub4 = COPY %1
%11.sub5 = COPY %1		%11.sub5 = COPY %1
%11.sub6 = COPY %1		%11.sub6 = COPY %1
%11.sub7 = COPY %1		%11.sub7 = COPY %1
%11.sub8 = COPY %1		%11.sub8 = COPY %1
dead %18 = IMAGE_SAMPLE_C_D_O_V1_V16 %11, %3, %4, 1, 0, 0, 0, 0, 0, 0, -1, implicit $exec		dead %18 = IMAGE_SAMPLE_C_D_O_V1_V16 %11, %3, %4, 1, 0, 0, 0, 0, 0, 0, -1, 0, implicit $exec
%20.sub1 = COPY %2		%20.sub1 = COPY %2
%20.sub2 = COPY %2		%20.sub2 = COPY %2
%20.sub3 = COPY %2		%20.sub3 = COPY %2
%20.sub4 = COPY %2		%20.sub4 = COPY %2
%20.sub5 = COPY %2		%20.sub5 = COPY %2
%20.sub6 = COPY %2		%20.sub6 = COPY %2
%20.sub7 = COPY %2		%20.sub7 = COPY %2
%20.sub8 = COPY %2		%20.sub8 = COPY %2
dead %27 = IMAGE_SAMPLE_C_D_O_V1_V16 %20, %5, %6, 1, 0, 0, 0, 0, 0, 0, -1, implicit $exec		dead %27 = IMAGE_SAMPLE_C_D_O_V1_V16 %20, %5, %6, 1, 0, 0, 0, 0, 0, 0, -1, 0, implicit $exec

...		...

test/MC/AMDGPU/mimg.s

(350 unchanged lines omitted)

	image_gather4 v[5:8], v[1:4], s[8:15], s[12:15] dmask:0x4			image_gather4 v[5:8], v[1:4], s[8:15], s[12:15] dmask:0x4
	// GCN: image_gather4 v[5:8], v[1:4], s[8:15], s[12:15] dmask:0x4 ; encoding: [0x00,0x04,0x00,0xf1,0x01,0x05,0x62,0x00]			// GCN: image_gather4 v[5:8], v[1:4], s[8:15], s[12:15] dmask:0x4 ; encoding: [0x00,0x04,0x00,0xf1,0x01,0x05,0x62,0x00]

	image_gather4 v[5:8], v[1:4], s[8:15], s[12:15] dmask:0x8			image_gather4 v[5:8], v[1:4], s[8:15], s[12:15] dmask:0x8
	// GCN: image_gather4 v[5:8], v[1:4], s[8:15], s[12:15] dmask:0x8 ; encoding: [0x00,0x08,0x00,0xf1,0x01,0x05,0x62,0x00]			// GCN: image_gather4 v[5:8], v[1:4], s[8:15], s[12:15] dmask:0x8 ; encoding: [0x00,0x08,0x00,0xf1,0x01,0x05,0x62,0x00]

	image_gather4 v[5:8], v1, s[8:15], s[12:15] dmask:0x1 d16			image_gather4 v[5:8], v1, s[8:15], s[12:15] dmask:0x1 d16
	// NOSICI: error: instruction not supported on this GPU			// NOSICI: error: d16 modifier is not supported on this GPU
	// GFX8_0: image_gather4 v[5:8], v1, s[8:15], s[12:15] dmask:0x1 d16 ; encoding: [0x00,0x01,0x00,0xf1,0x01,0x05,0x62,0x80]			// GFX8_0: image_gather4 v[5:8], v1, s[8:15], s[12:15] dmask:0x1 d16 ; encoding: [0x00,0x01,0x00,0xf1,0x01,0x05,0x62,0x80]
	// NOGFX8_1: error: instruction not supported on this GPU			// NOGFX8_1: error: image data size does not match dmask and tfe
	// NOGFX9: error: instruction not supported on this GPU			// NOGFX9: error: image data size does not match dmask and tfe

	image_gather4 v[5:6], v1, s[8:15], s[12:15] dmask:0x1 d16			image_gather4 v[5:6], v1, s[8:15], s[12:15] dmask:0x1 d16
	// NOSICI: error: d16 modifier is not supported on this GPU			// NOSICI: error: d16 modifier is not supported on this GPU
	// NOGFX8_0: error: instruction not supported on this GPU			// NOGFX8_0: error: image data size does not match dmask and tfe
	// GFX8_1: image_gather4 v[5:6], v1, s[8:15], s[12:15] dmask:0x1 d16 ; encoding: [0x00,0x01,0x00,0xf1,0x01,0x05,0x62,0x80]			// GFX8_1: image_gather4 v[5:6], v1, s[8:15], s[12:15] dmask:0x1 d16 ; encoding: [0x00,0x01,0x00,0xf1,0x01,0x05,0x62,0x80]
	// GFX9: image_gather4 v[5:6], v1, s[8:15], s[12:15] dmask:0x1 d16 ; encoding: [0x00,0x01,0x00,0xf1,0x01,0x05,0x62,0x80]			// GFX9: image_gather4 v[5:6], v1, s[8:15], s[12:15] dmask:0x1 d16 ; encoding: [0x00,0x01,0x00,0xf1,0x01,0x05,0x62,0x80]

	// FIXME: d16 is handled as an optional modifier, should it be corrected?
	image_gather4 v[5:6], v1, s[8:15], s[12:15] dmask:0x1			image_gather4 v[5:6], v1, s[8:15], s[12:15] dmask:0x1
	// NOSICI: error: d16 modifier is not supported on this GPU			// NOSICI: error: image data size does not match dmask and tfe
	// NOGFX8_0: error: instruction not supported on this GPU			// NOGFX8_0: error: image data size does not match dmask and tfe
	// GFX8_1: image_gather4 v[5:6], v1, s[8:15], s[12:15] dmask:0x1 d16 ; encoding: [0x00,0x01,0x00,0xf1,0x01,0x05,0x62,0x80]			// NOGFX8_1: error: image data size does not match dmask and tfe
	// GFX9: image_gather4 v[5:6], v1, s[8:15], s[12:15] dmask:0x1 d16 ; encoding: [0x00,0x01,0x00,0xf1,0x01,0x05,0x62,0x80]			// NOGFX9: error: image data size does not match dmask and tfe
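
The new "image data size does not match dmask and tfe" diagnostics in the tests above are consistent with a size check along the following lines. This is a minimal sketch inferred from the test expectations and the summary (gather4 always returns four components, packed D16 halves the register count, TFE adds one register); it is not the assembler's actual code, and the function name and parameters are assumptions.

```cpp
// Counts the set bits in the low four bits of a dmask value.
static unsigned countDMaskBits(unsigned DMask) {
  unsigned Count = 0;
  for (unsigned Bit = 0; Bit < 4; ++Bit)
    Count += (DMask >> Bit) & 1u;
  return Count;
}

// Returns the number of 32-bit vdata registers an image instruction is
// expected to use under the assumptions stated in the lead-in.
unsigned expectedVDataDwords(unsigned DMask, bool TFE, bool D16,
                             bool HasPackedD16, bool IsGather4) {
  DMask &= 0xf;
  if (DMask == 0)
    DMask = 1; // assumption: an empty dmask is treated as one component
  unsigned Components = IsGather4 ? 4 : countDMaskBits(DMask);
  // Packed D16 stores two 16-bit components per 32-bit register.
  unsigned Dwords = (D16 && HasPackedD16) ? (Components + 1) / 2 : Components;
  return Dwords + (TFE ? 1u : 0u);
}
```

Under this sketch, image_gather4 with d16 expects four registers on a target with unpacked D16 (matching the GFX8_0 lines) and two registers on a target with packed D16 (matching GFX8_1 and GFX9). Without d16 it expects four registers everywhere, which is why the v[5:6] form without d16 is rejected on every target in the last test.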


