This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Define SGPR_NULL64 register. NFCI.
ClosedPublic

Authored by rampitec on Jun 10 2022, 12:11 PM.


Details

Reviewers

foad
Joe_Nash

Commits

rGcb9ae9371246: [AMDGPU] Define SGPR_NULL64 register. NFCI.

Summary

On gfx10+ the null register can be used as both a 32-bit and a 64-bit operand.
Define a 64-bit version of the register for use during codegen.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

rampitec created this revision. Jun 10 2022, 12:11 PM

Herald added a project: Restricted Project. Jun 10 2022, 12:11 PM

Herald added subscribers: kosarev, jsilvanus, hsmhsm and 10 others.

rampitec requested review of this revision. Jun 10 2022, 12:11 PM

Herald added a subscriber: wdng.

rampitec added a parent revision: D127524: [AMDGPU] Make temp vgpr selection stable in indirectCopyToAGPR. Jun 10 2022, 12:11 PM

Harbormaster completed remote builds in B169141: Diff 436006. Jun 10 2022, 1:59 PM

rampitec added a child revision: D127542: [AMDGPU] Use null for dead sdst operand. Jun 10 2022, 2:50 PM

rampitec added a reviewer: Joe_Nash. Jun 13 2022, 12:19 PM

I guess this is OK. I'm a bit surprised that null is defined like a real physical register, but I guess it has always worked this way. And MIPS seems to do the same for their r0 register which works the same way.

This revision is now accepted and ready to land. Jun 13 2022, 12:58 PM

In D127527#3579423, @foad wrote:

I guess this is OK. I'm a bit surprised that null is defined like a real physical register, but I guess it has always worked this way. And MIPS seems to do the same for their r0 register which works the same way.

It is in fact a real HW register, although quite a special one. In any case, to fit into an operand it needs to be part of an actual register class (RC), and its size must match.

This revision was landed with ongoing or failed builds. Jun 13 2022, 1:23 PM

Closed by commit rGcb9ae9371246: [AMDGPU] Define SGPR_NULL64 register. NFCI. (authored by rampitec).

This revision was automatically updated to reflect the committed changes.

rampitec added a commit: rGcb9ae9371246: [AMDGPU] Define SGPR_NULL64 register. NFCI..

LGTM, but could there be more opportunities to use this?
Are there cases where we want to have value 0 in a source operand (a normal VSrc, not VOPDstS64orS32 like is used in https://reviews.llvm.org/D127542), it can't be folded, and we could use sgpr null 64? It might be interesting to have a test case for that with folding excluded.

In D127527#3579552, @Joe_Nash wrote:

LGTM, but could there be more opportunities to use this?
Are there cases where we want to have value 0 in a source operand (a normal VSrc, not VOPDstS64orS32 like is used in https://reviews.llvm.org/D127542), it can't be folded, and we could use sgpr null 64? It might be interesting to have a test case for that with folding excluded.

0 is an inline literal, so usually we can use inline 0 instead.

In D127527#3579759, @rampitec wrote:

0 is an inline literal, so usually we can use inline 0 instead.

I can see one marginal case where a 64 bit add/sub is expanded and we are unable to shrink the first instruction so produce V_ADD_CO_CI_U32/V_SUB_CO_CI_U32, e.g. add or sub with an SGPR operand. Here null can be used as a carry-in.

In D127527#3579878, @rampitec wrote:

I can see one marginal case where a 64 bit add/sub is expanded and we are unable to shrink the first instruction so produce V_ADD_CO_CI_U32/V_SUB_CO_CI_U32, e.g. add or sub with an SGPR operand. Here null can be used as a carry-in.

And even this case is impractical. On gfx10 we can use 2 constants, so addc with a vgpr and sgpr will fall down to be shrunk and use vcc as a carry-in, and an operation with 2 sgprs will be SALU.

In D127527#3579963, @rampitec wrote:

And even this case is impractical. On gfx10 we can use 2 constants, so addc with a vgpr and sgpr will fall down to be shrunk and use vcc as a carry-in, and an operation with 2 sgprs will be SALU.

Ok, thanks! It seems we have many ways to optimize instructions with zero operands, so the use of the null sgpr is quite specific.

In D127527#3581503, @Joe_Nash wrote:

Ok, thanks! It seems we have many ways to optimize instructions with zero operands, so the use of the null sgpr is quite specific.

Right. The problem with using null in place of vcc as a carry-in, in particular, is that it prevents shrinking. And for a normal vsrc we can do better with inline literals.

Revision Contents

Path

Size

llvm/

lib/

Target/

AMDGPU/

AMDGPUResourceUsageAnalysis.cpp

1 line

SIInstrInfo.cpp

2 lines

SIRegisterInfo.cpp

3 lines

SIRegisterInfo.td

18 lines

Utils/

AMDGPUBaseInfo.cpp

7 lines

Diff 436542

llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp

Show First 20 Lines • Show All 243 Lines • ▼ Show 20 Lines	for (const MachineInstr &MI : MBB) {
case AMDGPU::M0:		case AMDGPU::M0:
case AMDGPU::M0_LO16:		case AMDGPU::M0_LO16:
case AMDGPU::M0_HI16:		case AMDGPU::M0_HI16:
case AMDGPU::SRC_SHARED_BASE:		case AMDGPU::SRC_SHARED_BASE:
case AMDGPU::SRC_SHARED_LIMIT:		case AMDGPU::SRC_SHARED_LIMIT:
case AMDGPU::SRC_PRIVATE_BASE:		case AMDGPU::SRC_PRIVATE_BASE:
case AMDGPU::SRC_PRIVATE_LIMIT:		case AMDGPU::SRC_PRIVATE_LIMIT:
case AMDGPU::SGPR_NULL:		case AMDGPU::SGPR_NULL:
		case AMDGPU::SGPR_NULL64:
case AMDGPU::MODE:		case AMDGPU::MODE:
continue;		continue;

case AMDGPU::SRC_POPS_EXITING_WAVE_ID:		case AMDGPU::SRC_POPS_EXITING_WAVE_ID:
llvm_unreachable("src_pops_exiting_wave_id should not be used");		llvm_unreachable("src_pops_exiting_wave_id should not be used");

case AMDGPU::NoRegister:		case AMDGPU::NoRegister:
assert(MI.isDebugInstr() &&		assert(MI.isDebugInstr() &&
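The switch in this hunk skips special registers, now including the 64-bit null, so that they do not count towards a function's SGPR usage. A minimal sketch of that idea follows. The `Reg` enum and the `sgprIndex` and `maxSGPRIndex` helpers are hypothetical stand-ins, not the real LLVM types or the actual analysis.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-ins for a few AMDGPU registers; not the real LLVM enum.
enum class Reg { SGPR0, SGPR1, SGPR5, M0, SGPR_NULL, SGPR_NULL64, MODE };

// Returns the index of an ordinary SGPR, or -1 for special registers that
// should not contribute to the SGPR count.
int sgprIndex(Reg R) {
  switch (R) {
  case Reg::M0:
  case Reg::SGPR_NULL:
  case Reg::SGPR_NULL64: // the 64-bit null is skipped like the 32-bit one
  case Reg::MODE:
    return -1;
  case Reg::SGPR0:
    return 0;
  case Reg::SGPR1:
    return 1;
  case Reg::SGPR5:
    return 5;
  }
  return -1;
}

// Highest SGPR index among the used registers, or -1 if there is none.
int maxSGPRIndex(const std::vector<Reg> &Used) {
  int Max = -1;
  for (Reg R : Used)
    Max = std::max(Max, sgprIndex(R));
  return Max;
}
```

With this model, a use list of `SGPR1`, `SGPR_NULL64` and `M0` yields a maximum index of 1, because the two special registers are ignored.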

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp

Show First 20 Lines • Show All 3,891 Lines • ▼ Show 20 Lines	bool SIInstrInfo::usesConstantBus(const MachineRegisterInfo &MRI,

if (!MO.isUse())		if (!MO.isUse())
return false;		return false;

if (MO.getReg().isVirtual())		if (MO.getReg().isVirtual())
return RI.isSGPRClass(MRI.getRegClass(MO.getReg()));		return RI.isSGPRClass(MRI.getRegClass(MO.getReg()));

// Null is free		// Null is free
if (MO.getReg() == AMDGPU::SGPR_NULL)		if (MO.getReg() == AMDGPU::SGPR_NULL || MO.getReg() == AMDGPU::SGPR_NULL64)
return false;		return false;

// SGPRs use the constant bus		// SGPRs use the constant bus
if (MO.isImplicit()) {		if (MO.isImplicit()) {
return MO.getReg() == AMDGPU::M0 ||		return MO.getReg() == AMDGPU::M0 ||
MO.getReg() == AMDGPU::VCC ||		MO.getReg() == AMDGPU::VCC ||
MO.getReg() == AMDGPU::VCC_LO;		MO.getReg() == AMDGPU::VCC_LO;
} else {		} else {
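The rule in this hunk, that neither null register occupies a constant bus slot, can be sketched outside LLVM. The `Kind` enum and the `fitsConstantBus` helper below are hypothetical names, not LLVM APIs. The sketch also assumes that every SGPR operand is a distinct register and that VGPRs and inline constants are free while SGPRs and literals are not; the real check is more involved.

```cpp
#include <vector>

// Hypothetical operand model; the real code inspects MachineOperand.
enum class Kind { VGPR, SGPR, Null32, Null64, InlineConst, Literal };

// True if reading this operand occupies a constant bus slot. Both null
// registers are free, as in the hunk above. Treating VGPRs and inline
// constants as free and SGPRs and literals as bus users is an assumption
// of this sketch.
bool usesConstantBus(Kind K) {
  switch (K) {
  case Kind::Null32:
  case Kind::Null64:
  case Kind::VGPR:
  case Kind::InlineConst:
    return false;
  case Kind::SGPR:
  case Kind::Literal:
    return true;
  }
  return false;
}

// Checks a list of source operands against a constant bus limit. The limit
// is a parameter; the thread mentions that gfx10 allows two constants.
bool fitsConstantBus(const std::vector<Kind> &Srcs, unsigned Limit) {
  unsigned Uses = 0;
  for (Kind K : Srcs)
    if (usesConstantBus(K))
      ++Uses;
  return Uses <= Limit;
}
```

In this model an instruction reading one SGPR, the 64-bit null and a VGPR fits within a limit of one, since only the SGPR is counted.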

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp

Show First 20 Lines • Show All 585 Lines • ▼ Show 20 Lines	BitVector SIRegisterInfo::getReservedRegs(const MachineFunction &MF) const {
reserveRegisterTuples(Reserved, AMDGPU::TTMP4_TTMP5);		reserveRegisterTuples(Reserved, AMDGPU::TTMP4_TTMP5);
reserveRegisterTuples(Reserved, AMDGPU::TTMP6_TTMP7);		reserveRegisterTuples(Reserved, AMDGPU::TTMP6_TTMP7);
reserveRegisterTuples(Reserved, AMDGPU::TTMP8_TTMP9);		reserveRegisterTuples(Reserved, AMDGPU::TTMP8_TTMP9);
reserveRegisterTuples(Reserved, AMDGPU::TTMP10_TTMP11);		reserveRegisterTuples(Reserved, AMDGPU::TTMP10_TTMP11);
reserveRegisterTuples(Reserved, AMDGPU::TTMP12_TTMP13);		reserveRegisterTuples(Reserved, AMDGPU::TTMP12_TTMP13);
reserveRegisterTuples(Reserved, AMDGPU::TTMP14_TTMP15);		reserveRegisterTuples(Reserved, AMDGPU::TTMP14_TTMP15);

// Reserve null register - it shall never be allocated		// Reserve null register - it shall never be allocated
reserveRegisterTuples(Reserved, AMDGPU::SGPR_NULL);		reserveRegisterTuples(Reserved, AMDGPU::SGPR_NULL64);

// Disallow vcc_hi allocation in wave32. It may be allocated but most likely		// Disallow vcc_hi allocation in wave32. It may be allocated but most likely
// will result in bugs.		// will result in bugs.
if (isWave32) {		if (isWave32) {
Reserved.set(AMDGPU::VCC);		Reserved.set(AMDGPU::VCC);
Reserved.set(AMDGPU::VCC_HI);		Reserved.set(AMDGPU::VCC_HI);
}		}

▲ Show 20 Lines • Show All 2,455 Lines • ▼ Show 20 Lines	if (isVectorSuperClass(RC))
return getAlignedVectorSuperClassForBitWidth(Size);		return getAlignedVectorSuperClassForBitWidth(Size);

return RC;		return RC;
}		}

bool SIRegisterInfo::isConstantPhysReg(MCRegister PhysReg) const {		bool SIRegisterInfo::isConstantPhysReg(MCRegister PhysReg) const {
switch (PhysReg) {		switch (PhysReg) {
case AMDGPU::SGPR_NULL:		case AMDGPU::SGPR_NULL:
		case AMDGPU::SGPR_NULL64:
case AMDGPU::SRC_SHARED_BASE:		case AMDGPU::SRC_SHARED_BASE:
case AMDGPU::SRC_PRIVATE_BASE:		case AMDGPU::SRC_PRIVATE_BASE:
case AMDGPU::SRC_SHARED_LIMIT:		case AMDGPU::SRC_SHARED_LIMIT:
case AMDGPU::SRC_PRIVATE_LIMIT:		case AMDGPU::SRC_PRIVATE_LIMIT:
return true;		return true;
default:		default:
return false;		return false;
}		}
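The reservation hunk switches from SGPR_NULL to SGPR_NULL64. A simplified sketch of why that can still cover the 32-bit register: it assumes that reserving a tuple also reserves everything beneath it. The alias table and the `reserveWithSubRegs` helper are hypothetical and only walk sub-registers; they are not LLVM's register info or the actual `reserveRegisterTuples` implementation.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical alias table: each register lists its sub-registers.
// The names follow the review, but this is not LLVM's register info.
using AliasTable = std::map<std::string, std::vector<std::string>>;

// Marks Reg and, recursively, all of its sub-registers as reserved.
void reserveWithSubRegs(const AliasTable &Subs, const std::string &Reg,
                        std::set<std::string> &Reserved) {
  if (!Reserved.insert(Reg).second)
    return; // already reserved
  auto It = Subs.find(Reg);
  if (It == Subs.end())
    return; // no sub-registers
  for (const std::string &Sub : It->second)
    reserveWithSubRegs(Subs, Sub, Reserved);
}

// Builds the small table used in this sketch.
AliasTable makeNullTable() {
  AliasTable T;
  T["SGPR_NULL64"] = {"SGPR_NULL", "SGPR_NULL_HI"};
  T["SGPR_NULL"] = {"SGPR_NULL_LO16"};
  T["SGPR_NULL_HI"] = {"SGPR_NULL_HI_LO16"};
  return T;
}
```

Reserving `SGPR_NULL64` in this table marks five names as reserved, including `SGPR_NULL` and both 16-bit halves listed here.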

llvm/lib/Target/AMDGPU/SIRegisterInfo.td

// See also Utils/AMDGPUBaseInfo.cpp MAP_REG2REG.		// See also Utils/AMDGPUBaseInfo.cpp MAP_REG2REG.
defm M0_gfxpre11 : SIRegLoHi16 <"m0", 124>;		defm M0_gfxpre11 : SIRegLoHi16 <"m0", 124>;
defm M0_gfx11plus : SIRegLoHi16 <"m0", 125>;		defm M0_gfx11plus : SIRegLoHi16 <"m0", 125>;
defm M0 : SIRegLoHi16 <"m0", 0>;		defm M0 : SIRegLoHi16 <"m0", 0>;

defm SGPR_NULL_gfxpre11 : SIRegLoHi16 <"null", 125>;		defm SGPR_NULL_gfxpre11 : SIRegLoHi16 <"null", 125>;
defm SGPR_NULL_gfx11plus : SIRegLoHi16 <"null", 124>;		defm SGPR_NULL_gfx11plus : SIRegLoHi16 <"null", 124>;
defm SGPR_NULL : SIRegLoHi16 <"null", 0>;		defm SGPR_NULL : SIRegLoHi16 <"null", 0>;
		defm SGPR_NULL_HI : SIRegLoHi16 <"", 0>;

		def SGPR_NULL64 :
		RegisterWithSubRegs<"null", [SGPR_NULL, SGPR_NULL_HI]> {
		let Namespace = "AMDGPU";
		let SubRegIndices = [sub0, sub1];
		let HWEncoding = SGPR_NULL.HWEncoding;
		}

defm SRC_SHARED_BASE : SIRegLoHi16<"src_shared_base", 235>;		defm SRC_SHARED_BASE : SIRegLoHi16<"src_shared_base", 235>;
defm SRC_SHARED_LIMIT : SIRegLoHi16<"src_shared_limit", 236>;		defm SRC_SHARED_LIMIT : SIRegLoHi16<"src_shared_limit", 236>;
defm SRC_PRIVATE_BASE : SIRegLoHi16<"src_private_base", 237>;		defm SRC_PRIVATE_BASE : SIRegLoHi16<"src_private_base", 237>;
defm SRC_PRIVATE_LIMIT : SIRegLoHi16<"src_private_limit", 238>;		defm SRC_PRIVATE_LIMIT : SIRegLoHi16<"src_private_limit", 238>;
defm SRC_POPS_EXITING_WAVE_ID : SIRegLoHi16<"src_pops_exiting_wave_id", 239>;		defm SRC_POPS_EXITING_WAVE_ID : SIRegLoHi16<"src_pops_exiting_wave_id", 239>;

// Not addressable		// Not addressable
▲ Show 20 Lines • Show All 406 Lines • ▼ Show 20 Lines	def LDS_DIRECT_CLASS : RegisterClass<"AMDGPU", [i32], 32,
let CopyCost = -1;		let CopyCost = -1;
}		}

let GeneratePressureSet = 0, HasSGPR = 1 in {		let GeneratePressureSet = 0, HasSGPR = 1 in {
// Subset of SReg_32 without M0 for SMRD instructions and alike.		// Subset of SReg_32 without M0 for SMRD instructions and alike.
// See comments in SIInstructions.td for more info.		// See comments in SIInstructions.td for more info.
def SReg_32_XM0_XEXEC : SIRegisterClass<"AMDGPU", [i32, f32, i16, f16, v2i16, v2f16, i1], 32,		def SReg_32_XM0_XEXEC : SIRegisterClass<"AMDGPU", [i32, f32, i16, f16, v2i16, v2f16, i1], 32,
(add SGPR_32, VCC_LO, VCC_HI, FLAT_SCR_LO, FLAT_SCR_HI, XNACK_MASK_LO, XNACK_MASK_HI,		(add SGPR_32, VCC_LO, VCC_HI, FLAT_SCR_LO, FLAT_SCR_HI, XNACK_MASK_LO, XNACK_MASK_HI,
SGPR_NULL, TTMP_32, TMA_LO, TMA_HI, TBA_LO, TBA_HI, SRC_SHARED_BASE, SRC_SHARED_LIMIT,		SGPR_NULL, SGPR_NULL_HI, TTMP_32, TMA_LO, TMA_HI, TBA_LO, TBA_HI, SRC_SHARED_BASE,
SRC_PRIVATE_BASE, SRC_PRIVATE_LIMIT, SRC_POPS_EXITING_WAVE_ID,		SRC_SHARED_LIMIT, SRC_PRIVATE_BASE, SRC_PRIVATE_LIMIT, SRC_POPS_EXITING_WAVE_ID,
SRC_VCCZ, SRC_EXECZ, SRC_SCC)> {		SRC_VCCZ, SRC_EXECZ, SRC_SCC)> {
let AllocationPriority = 10;		let AllocationPriority = 10;
}		}

def SReg_LO16_XM0_XEXEC : SIRegisterClass<"AMDGPU", [i16, f16], 16,		def SReg_LO16_XM0_XEXEC : SIRegisterClass<"AMDGPU", [i16, f16], 16,
(add SGPR_LO16, VCC_LO_LO16, VCC_HI_LO16, FLAT_SCR_LO_LO16, FLAT_SCR_HI_LO16,		(add SGPR_LO16, VCC_LO_LO16, VCC_HI_LO16, FLAT_SCR_LO_LO16, FLAT_SCR_HI_LO16,
XNACK_MASK_LO_LO16, XNACK_MASK_HI_LO16, SGPR_NULL_LO16, TTMP_LO16, TMA_LO_LO16,		XNACK_MASK_LO_LO16, XNACK_MASK_HI_LO16, SGPR_NULL_LO16, SGPR_NULL_HI_LO16, TTMP_LO16,
TMA_HI_LO16, TBA_LO_LO16, TBA_HI_LO16, SRC_SHARED_BASE_LO16,		TMA_LO_LO16, TMA_HI_LO16, TBA_LO_LO16, TBA_HI_LO16, SRC_SHARED_BASE_LO16,
SRC_SHARED_LIMIT_LO16, SRC_PRIVATE_BASE_LO16, SRC_PRIVATE_LIMIT_LO16,		SRC_SHARED_LIMIT_LO16, SRC_PRIVATE_BASE_LO16, SRC_PRIVATE_LIMIT_LO16,
SRC_POPS_EXITING_WAVE_ID_LO16, SRC_VCCZ_LO16, SRC_EXECZ_LO16, SRC_SCC_LO16)> {		SRC_POPS_EXITING_WAVE_ID_LO16, SRC_VCCZ_LO16, SRC_EXECZ_LO16, SRC_SCC_LO16)> {
let Size = 16;		let Size = 16;
let AllocationPriority = 10;		let AllocationPriority = 10;
}		}

def SReg_32_XEXEC_HI : SIRegisterClass<"AMDGPU", [i32, f32, i16, f16, v2i16, v2f16, i1], 32,		def SReg_32_XEXEC_HI : SIRegisterClass<"AMDGPU", [i32, f32, i16, f16, v2i16, v2f16, i1], 32,
(add SReg_32_XM0_XEXEC, EXEC_LO, M0_CLASS)> {		(add SReg_32_XM0_XEXEC, EXEC_LO, M0_CLASS)> {
▲ Show 20 Lines • Show All 47 Lines • ▼ Show 20 Lines

def TTMP_64 : SIRegisterClass<"AMDGPU", [v2i32, i64, f64, v4i16, v4f16], 32,		def TTMP_64 : SIRegisterClass<"AMDGPU", [v2i32, i64, f64, v4i16, v4f16], 32,
(add TTMP_64Regs)> {		(add TTMP_64Regs)> {
let isAllocatable = 0;		let isAllocatable = 0;
let HasSGPR = 1;		let HasSGPR = 1;
}		}

def SReg_64_XEXEC : SIRegisterClass<"AMDGPU", [v2i32, i64, v2f32, f64, i1, v4i16, v4f16], 32,		def SReg_64_XEXEC : SIRegisterClass<"AMDGPU", [v2i32, i64, v2f32, f64, i1, v4i16, v4f16], 32,
(add SGPR_64, VCC, FLAT_SCR, XNACK_MASK, TTMP_64, TBA, TMA)> {		(add SGPR_64, VCC, FLAT_SCR, XNACK_MASK, SGPR_NULL64, TTMP_64, TBA, TMA)> {
let CopyCost = 1;		let CopyCost = 1;
let AllocationPriority = 13;		let AllocationPriority = 13;
let HasSGPR = 1;		let HasSGPR = 1;
}		}

def SReg_64 : SIRegisterClass<"AMDGPU", [v2i32, i64, v2f32, f64, i1, v4i16, v4f16], 32,		def SReg_64 : SIRegisterClass<"AMDGPU", [v2i32, i64, v2f32, f64, i1, v4i16, v4f16], 32,
(add SReg_64_XEXEC, EXEC)> {		(add SReg_64_XEXEC, EXEC)> {
let CopyCost = 1;		let CopyCost = 1;

llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp

Show First 20 Lines • Show All 1,806 Lines • ▼ Show 20 Lines	#define MAP_REG2REG \
CASE_VI_GFX9PLUS(TTMP8_TTMP9_TTMP10_TTMP11) \		CASE_VI_GFX9PLUS(TTMP8_TTMP9_TTMP10_TTMP11) \
CASE_VI_GFX9PLUS(TTMP12_TTMP13_TTMP14_TTMP15) \		CASE_VI_GFX9PLUS(TTMP12_TTMP13_TTMP14_TTMP15) \
CASE_VI_GFX9PLUS(TTMP0_TTMP1_TTMP2_TTMP3_TTMP4_TTMP5_TTMP6_TTMP7) \		CASE_VI_GFX9PLUS(TTMP0_TTMP1_TTMP2_TTMP3_TTMP4_TTMP5_TTMP6_TTMP7) \
CASE_VI_GFX9PLUS(TTMP4_TTMP5_TTMP6_TTMP7_TTMP8_TTMP9_TTMP10_TTMP11) \		CASE_VI_GFX9PLUS(TTMP4_TTMP5_TTMP6_TTMP7_TTMP8_TTMP9_TTMP10_TTMP11) \
CASE_VI_GFX9PLUS(TTMP8_TTMP9_TTMP10_TTMP11_TTMP12_TTMP13_TTMP14_TTMP15) \		CASE_VI_GFX9PLUS(TTMP8_TTMP9_TTMP10_TTMP11_TTMP12_TTMP13_TTMP14_TTMP15) \
CASE_VI_GFX9PLUS(TTMP0_TTMP1_TTMP2_TTMP3_TTMP4_TTMP5_TTMP6_TTMP7_TTMP8_TTMP9_TTMP10_TTMP11_TTMP12_TTMP13_TTMP14_TTMP15) \		CASE_VI_GFX9PLUS(TTMP0_TTMP1_TTMP2_TTMP3_TTMP4_TTMP5_TTMP6_TTMP7_TTMP8_TTMP9_TTMP10_TTMP11_TTMP12_TTMP13_TTMP14_TTMP15) \
CASE_GFXPRE11_GFX11PLUS(M0) \		CASE_GFXPRE11_GFX11PLUS(M0) \
CASE_GFXPRE11_GFX11PLUS(SGPR_NULL) \		CASE_GFXPRE11_GFX11PLUS(SGPR_NULL) \
		CASE_GFXPRE11_GFX11PLUS_TO(SGPR_NULL64, SGPR_NULL) \
}		}

#define CASE_CI_VI(node) \		#define CASE_CI_VI(node) \
assert(!isSI(STI)); \		assert(!isSI(STI)); \
case node: return isCI(STI) ? node##_ci : node##_vi;		case node: return isCI(STI) ? node##_ci : node##_vi;

#define CASE_VI_GFX9PLUS(node) \		#define CASE_VI_GFX9PLUS(node) \
case node: return isGFX9Plus(STI) ? node##_gfx9plus : node##_vi;		case node: return isGFX9Plus(STI) ? node##_gfx9plus : node##_vi;

#define CASE_GFXPRE11_GFX11PLUS(node) \		#define CASE_GFXPRE11_GFX11PLUS(node) \
case node: return isGFX11Plus(STI) ? node##_gfx11plus : node##_gfxpre11;		case node: return isGFX11Plus(STI) ? node##_gfx11plus : node##_gfxpre11;

		#define CASE_GFXPRE11_GFX11PLUS_TO(node, result) \
		case node: return isGFX11Plus(STI) ? result##_gfx11plus : result##_gfxpre11;

unsigned getMCReg(unsigned Reg, const MCSubtargetInfo &STI) {		unsigned getMCReg(unsigned Reg, const MCSubtargetInfo &STI) {
if (STI.getTargetTriple().getArch() == Triple::r600)		if (STI.getTargetTriple().getArch() == Triple::r600)
return Reg;		return Reg;
MAP_REG2REG		MAP_REG2REG
}		}

#undef CASE_CI_VI		#undef CASE_CI_VI
#undef CASE_VI_GFX9PLUS		#undef CASE_VI_GFX9PLUS
#undef CASE_GFXPRE11_GFX11PLUS		#undef CASE_GFXPRE11_GFX11PLUS
		#undef CASE_GFXPRE11_GFX11PLUS_TO

#define CASE_CI_VI(node) case node##_ci: case node##_vi: return node;		#define CASE_CI_VI(node) case node##_ci: case node##_vi: return node;
#define CASE_VI_GFX9PLUS(node) case node##_vi: case node##_gfx9plus: return node;		#define CASE_VI_GFX9PLUS(node) case node##_vi: case node##_gfx9plus: return node;
#define CASE_GFXPRE11_GFX11PLUS(node) case node##_gfx11plus: case node##_gfxpre11: return node;		#define CASE_GFXPRE11_GFX11PLUS(node) case node##_gfx11plus: case node##_gfxpre11: return node;
		#define CASE_GFXPRE11_GFX11PLUS_TO(node, result)

unsigned mc2PseudoReg(unsigned Reg) {		unsigned mc2PseudoReg(unsigned Reg) {
MAP_REG2REG		MAP_REG2REG
}		}

#undef CASE_CI_VI		#undef CASE_CI_VI
#undef CASE_VI_GFX9PLUS		#undef CASE_VI_GFX9PLUS
#undef CASE_GFXPRE11_GFX11PLUS		#undef CASE_GFXPRE11_GFX11PLUS
		#undef CASE_GFXPRE11_GFX11PLUS_TO
#undef MAP_REG2REG		#undef MAP_REG2REG

bool isSISrcOperand(const MCInstrDesc &Desc, unsigned OpNo) {		bool isSISrcOperand(const MCInstrDesc &Desc, unsigned OpNo) {
assert(OpNo < Desc.NumOperands);		assert(OpNo < Desc.NumOperands);
unsigned OpType = Desc.OpInfo[OpNo].OperandType;		unsigned OpType = Desc.OpInfo[OpNo].OperandType;
return OpType >= AMDGPU::OPERAND_SRC_FIRST &&		return OpType >= AMDGPU::OPERAND_SRC_FIRST &&
OpType <= AMDGPU::OPERAND_SRC_LAST;		OpType <= AMDGPU::OPERAND_SRC_LAST;
}		}
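The MAP_REG2REG change relies on expanding one case list twice with different macro definitions, where the new `_TO` variant is empty in the reverse direction. A reduced sketch of that technique follows. The register names, the `toMCReg` and `toPseudoReg` functions, and the `IsGfx11` flag are hypothetical simplifications of `getMCReg`, `mc2PseudoReg` and the subtarget query.

```cpp
// Hypothetical register numbers; only the shape of the mapping matters.
enum Reg : unsigned {
  Null32,       // 32-bit pseudo register used during codegen
  Null64,       // 64-bit pseudo register used during codegen
  Null32_pre11, // encoding-specific MC register
  Null32_gfx11  // encoding-specific MC register
};

// One case list, expanded twice with different macro definitions.
#define MAP_REGS \
  switch (R) { \
  default: \
    return R; \
    CASE(Null32) \
    CASE_TO(Null64, Null32) \
  }

// Forward direction: pseudo register to subtarget-specific MC register.
// The 64-bit null maps to the same MC register as the 32-bit one.
#define CASE(node) \
  case node: return IsGfx11 ? node##_gfx11 : node##_pre11;
#define CASE_TO(node, result) \
  case node: return IsGfx11 ? result##_gfx11 : result##_pre11;
unsigned toMCReg(unsigned R, bool IsGfx11) { MAP_REGS }
#undef CASE
#undef CASE_TO

// Reverse direction: CASE_TO expands to nothing, so each MC register maps
// back to the 32-bit pseudo register only and no case label is duplicated.
#define CASE(node) \
  case node##_gfx11: case node##_pre11: return node;
#define CASE_TO(node, result)
unsigned toPseudoReg(unsigned R) { MAP_REGS }
#undef CASE
#undef CASE_TO
#undef MAP_REGS
```

Note that the round trip is lossy in this sketch: `toPseudoReg(toMCReg(Null64, true))` returns `Null32`, because both pseudo registers share one MC register.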