This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU][GFX10] Disabled v_movrel*[sdwa|dpp] opcodes in codegen
ClosedPublic

Authored by dp on Nov 18 2019, 8:07 AM.

Details

Reviewers

arsenm
rampitec
vpykhtin

Commits

rG6778a62eb0d2: [AMDGPU][GFX10] Disabled v_movrel*[sdwa|dpp] opcodes in codegen

Summary

These opcodes use indirect register addressing so they need special handling by codegen (currently missing).

Diff Detail

Event Timeline

dp created this revision. Nov 18 2019, 8:07 AM

Herald added a project: Restricted Project. Nov 18 2019, 8:07 AM

Herald added subscribers: llvm-commits, kbarton, hiraditya and 9 others.

Looks mostly good, but can you split this change into one that relates to DPP and another that disables the asm-only instructions?

Herald added a subscriber: wuzish. Nov 18 2019, 8:13 AM

dp added a parent revision: D70402: [AMDGPU][DPP] Corrected DPP combiner. Nov 18 2019, 8:35 AM

Separated dpp combiner changes to D70402

vpykhtin added inline comments. Nov 18 2019, 10:01 AM

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
6332	Is there any way to mark these instructions in the td files?

dp marked 2 inline comments as done. Nov 18 2019, 10:56 AM

dp added inline comments.

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
6332	I thought about it. Yes, it is possible, but that will not make code more readable overall. Labelling these opcodes in td will make code cleaner in this file, but require more changes elsewhere. Overall I think that this case is very special and requires a special solution. If we face similar issues in the future (that need more cases in the switch below), we may create a flag for this purpose. I'm not sure it is necessary for MOVREL*.

rampitec added inline comments. Nov 18 2019, 12:42 PM

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
6332	Is it the same as isAsmParserOnly in td? If so, shouldn't it be easy to mark it there? On the other hand, if we need an extra TSFlags bit, that is not worth it, as these bits are a limited resource.

Actually, what makes them risky is the implicit use of M0, so the instruction can be folded across an M0 definition. Isn't it cleaner to check for the implicit use in the SDWA and DPP combiners and disable the combining on these grounds, rather than excluding these opcodes from codegen completely?
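The alternative raised in this comment can be sketched with a toy model. Everything below is hypothetical and self-contained: `ToyReg`, `ToyInstr`, `toyHasImplicitUseOf` and `toyCanCombine` are illustrative names, not LLVM APIs, and the sketch assumes that the MOVREL instructions carry M0 as an implicit use.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical register identifiers; not the real AMDGPU register enum.
enum ToyReg { TOY_M0, TOY_EXEC, TOY_VCC };

// Hypothetical instruction model that only records its implicit uses.
struct ToyInstr {
  std::vector<ToyReg> ImplicitUses;
};

// Returns true if MI implicitly reads Reg.
bool toyHasImplicitUseOf(const ToyInstr &MI, ToyReg Reg) {
  return std::find(MI.ImplicitUses.begin(), MI.ImplicitUses.end(), Reg) !=
         MI.ImplicitUses.end();
}

// A combiner following the suggestion would skip any instruction
// that implicitly reads M0 instead of banning specific opcodes.
bool toyCanCombine(const ToyInstr &MI) {
  return !toyHasImplicitUseOf(MI, TOY_M0);
}
```

With this model, an instruction whose implicit uses are `{TOY_M0, TOY_EXEC}` is rejected by `toyCanCombine`, while one that only reads `TOY_EXEC` is accepted. The replies that follow explain why such a check alone would not make these opcodes usable in codegen.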

In D70400#1750450, @rampitec wrote:

Actually, what makes them risky is the implicit use of M0, so the instruction can be folded across an M0 definition. Isn't it cleaner to check for the implicit use in the SDWA and DPP combiners and disable the combining on these grounds, rather than excluding these opcodes from codegen completely?

Maybe. But I do not understand how codegen can handle these instructions without knowing actual dst and src registers. To support _dpp and _sdwa variants codegen needs the same (or similar) hacks as those implemented for v_movreld_b32.

In D70400#1750597, @dp wrote:

Maybe. But I do not understand how codegen can handle these instructions without knowing actual dst and src registers. To support _dpp and _sdwa variants codegen needs the same (or similar) hacks as those implemented for v_movreld_b32.

Hmm. I think you are right:

v1 = v_and_b32 v2, 0xf
v3 = v_movrels_b32 v1

Means: v3 = v1[m0], same as v3 = (v1 & 0xf)[m0]
After sdwa conversion it would be: v3 = v2[m0] & 0xf

Not exactly the same thing.
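The difference can be checked with a small simulation. This is only a sketch under stated assumptions: the register file is a toy 8-entry array, `toyMovrels` models the relative read as `VGPR[Src + M0]`, and the 0xf mask is taken literally from the example above rather than from any real SDWA operand selector. All names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Toy VGPR file; a real register file is much larger.
using ToyRegFile = std::array<uint32_t, 8>;

// Models "v_movrels_b32 vDst, vSrc": reads VGPR[Src + M0].
uint32_t toyMovrels(const ToyRegFile &V, unsigned Src, unsigned M0) {
  assert(Src + M0 < V.size() && "relative index out of range");
  return V[Src + M0];
}

// Original sequence: v1 = v2 & 0xf; v3 = v_movrels_b32 v1.
uint32_t toyOriginalSequence(ToyRegFile V, unsigned M0) {
  V[1] = V[2] & 0xf;
  return toyMovrels(V, 1, M0);
}

// Folded form from the example: the mask is applied to the
// relatively addressed read starting at v2.
uint32_t toyFoldedSequence(ToyRegFile V, unsigned M0) {
  return toyMovrels(V, 2, M0) & 0xf;
}
```

With registers `{0x10, 0x11, 0x1234, 0xABCD, ...}` the two functions agree when M0 is 0 (both give 0x4), but with M0 equal to 1 the original sequence reads v2 unmasked while the folded one reads v3 and masks it, so the results differ.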

LGTM

This revision is now accepted and ready to land. Nov 18 2019, 2:30 PM

LGTM.

Closed by commit rG6778a62eb0d2: [AMDGPU][GFX10] Disabled v_movrel*[sdwa|dpp] opcodes in codegen (authored by dp). Nov 20 2019, 7:08 AM

This revision was automatically updated to reflect the committed changes.
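The shape of the committed change can be sketched in isolation as follows. This is a simplified stand-alone model, not the LLVM code: the opcode numbers and the `toy*` names are hypothetical, and the real pseudoToMCOpcode performs an encoding-family lookup that is omitted here.

```cpp
#include <cassert>

// Hypothetical opcode numbers, for illustration only.
enum ToyOpcode : int {
  TOY_V_MOV_B32_dpp = 100,
  TOY_V_MOVRELS_B32_dpp = 200,
  TOY_V_MOVRELS_B32_sdwa = 201,
};

// Mirrors the idea of isAsmOnlyOpcode: a fixed list of opcodes
// that the assembler accepts but codegen must not select.
bool toyIsAsmOnlyOpcode(int MCOp) {
  switch (MCOp) {
  case TOY_V_MOVRELS_B32_dpp:
  case TOY_V_MOVRELS_B32_sdwa:
    return true;
  default:
    return false;
  }
}

// Stand-in for pseudoToMCOpcode: -1 means "no usable encoding".
int toyPseudoToMCOpcode(int MCOp) {
  return toyIsAsmOnlyOpcode(MCOp) ? -1 : MCOp;
}

// Stand-in for the combiner-side check: a DPP opcode is only
// returned if the opcode mapping does not reject it.
int toyGetDPPOp(int DPPOp) {
  return (DPPOp == -1 || toyPseudoToMCOpcode(DPPOp) == -1) ? -1 : DPPOp;
}
```

In this model `toyGetDPPOp(TOY_V_MOV_B32_dpp)` returns the opcode unchanged, while `toyGetDPPOp(TOY_V_MOVRELS_B32_dpp)` returns -1, so a caller that treats -1 as "cannot combine" never produces the asm-only variants.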

Revision Contents

Path                                        Size
llvm/lib/Target/AMDGPU/GCNDPPCombine.cpp    15 lines
llvm/lib/Target/AMDGPU/SIInstrInfo.h        4 lines
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp      23 lines

Diff 229848

llvm/lib/Target/AMDGPU/GCNDPPCombine.cpp

Show First 20 Lines • Show All 98 Lines • ▼ Show 20 Lines	public:
bool runOnMachineFunction(MachineFunction &MF) override;		bool runOnMachineFunction(MachineFunction &MF) override;

StringRef getPassName() const override { return "GCN DPP Combine"; }		StringRef getPassName() const override { return "GCN DPP Combine"; }

void getAnalysisUsage(AnalysisUsage &AU) const override {		void getAnalysisUsage(AnalysisUsage &AU) const override {
AU.setPreservesCFG();		AU.setPreservesCFG();
MachineFunctionPass::getAnalysisUsage(AU);		MachineFunctionPass::getAnalysisUsage(AU);
}		}

		private:
		int getDPPOp(unsigned Op) const;
};		};

} // end anonymous namespace		} // end anonymous namespace

INITIALIZE_PASS(GCNDPPCombine, DEBUG_TYPE, "GCN DPP Combine", false, false)		INITIALIZE_PASS(GCNDPPCombine, DEBUG_TYPE, "GCN DPP Combine", false, false)

char GCNDPPCombine::ID = 0;		char GCNDPPCombine::ID = 0;

char &llvm::GCNDPPCombineID = GCNDPPCombine::ID;		char &llvm::GCNDPPCombineID = GCNDPPCombine::ID;

FunctionPass *llvm::createGCNDPPCombinePass() {		FunctionPass *llvm::createGCNDPPCombinePass() {
return new GCNDPPCombine();		return new GCNDPPCombine();
}		}

static int getDPPOp(unsigned Op) {		int GCNDPPCombine::getDPPOp(unsigned Op) const {
auto DPP32 = AMDGPU::getDPPOp32(Op);		auto DPP32 = AMDGPU::getDPPOp32(Op);
if (DPP32 != -1)		if (DPP32 == -1) {
return DPP32;

auto E32 = AMDGPU::getVOPe32(Op);		auto E32 = AMDGPU::getVOPe32(Op);
return E32 != -1 ? AMDGPU::getDPPOp32(E32) : -1;		DPP32 = (E32 == -1)? -1 : AMDGPU::getDPPOp32(E32);
		}
		return (DPP32 == -1 || TII->pseudoToMCOpcode(DPP32) == -1) ? -1 : DPP32;
}		}

// tracks the register operand definition and returns:		// tracks the register operand definition and returns:
// 1. immediate operand used to initialize the register if found		// 1. immediate operand used to initialize the register if found
// 2. nullptr if the register operand is undef		// 2. nullptr if the register operand is undef
// 3. the operand itself otherwise		// 3. the operand itself otherwise
MachineOperand *GCNDPPCombine::getOldOpndValue(MachineOperand &OldOpnd) const {		MachineOperand *GCNDPPCombine::getOldOpndValue(MachineOperand &OldOpnd) const {
auto Def = getVRegSubRegDef(getRegSubRegPair(OldOpnd), MRI);		auto Def = getVRegSubRegDef(getRegSubRegPair(OldOpnd), MRI);

llvm/lib/Target/AMDGPU/SIInstrInfo.h

Show First 20 Lines • Show All 1,011 Lines • ▼ Show 20 Lines	public:
bool isLegalFLATOffset(int64_t Offset, unsigned AddrSpace,		bool isLegalFLATOffset(int64_t Offset, unsigned AddrSpace,
bool Signed) const;		bool Signed) const;

/// \brief Return a target-specific opcode if Opcode is a pseudo instruction.		/// \brief Return a target-specific opcode if Opcode is a pseudo instruction.
/// Return -1 if the target-specific opcode for the pseudo instruction does		/// Return -1 if the target-specific opcode for the pseudo instruction does
/// not exist. If Opcode is not a pseudo instruction, this is identity.		/// not exist. If Opcode is not a pseudo instruction, this is identity.
int pseudoToMCOpcode(int Opcode) const;		int pseudoToMCOpcode(int Opcode) const;

		/// \brief Check if this instruction should only be used by assembler.
		/// Return true if this opcode should not be used by codegen.
		bool isAsmOnlyOpcode(int MCOp) const;

const TargetRegisterClass *getRegClass(const MCInstrDesc &TID, unsigned OpNum,		const TargetRegisterClass *getRegClass(const MCInstrDesc &TID, unsigned OpNum,
const TargetRegisterInfo *TRI,		const TargetRegisterInfo *TRI,
const MachineFunction &MF)		const MachineFunction &MF)
const override {		const override {
if (OpNum >= TID.getNumOperands())		if (OpNum >= TID.getNumOperands())
return nullptr;		return nullptr;
return RI.getRegClass(TID.OpInfo[OpNum].RegClass);		return RI.getRegClass(TID.OpInfo[OpNum].RegClass);
}		}

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp

Show First 20 Lines • Show All 6,323 Lines • ▼ Show 20 Lines	static SIEncodingFamily subtargetEncodingFamily(const GCNSubtarget &ST) {
case AMDGPUSubtarget::GFX9:		case AMDGPUSubtarget::GFX9:
return SIEncodingFamily::VI;		return SIEncodingFamily::VI;
case AMDGPUSubtarget::GFX10:		case AMDGPUSubtarget::GFX10:
return SIEncodingFamily::GFX10;		return SIEncodingFamily::GFX10;
}		}
llvm_unreachable("Unknown subtarget generation!");		llvm_unreachable("Unknown subtarget generation!");
}		}

		bool SIInstrInfo::isAsmOnlyOpcode(int MCOp) const {
		switch(MCOp) {
		// These opcodes use indirect register addressing so
		// they need special handling by codegen (currently missing).
		// Therefore it is too risky to allow these opcodes
		// to be selected by dpp combiner or sdwa peepholer.
		case AMDGPU::V_MOVRELS_B32_dpp_gfx10:
		case AMDGPU::V_MOVRELS_B32_sdwa_gfx10:
		case AMDGPU::V_MOVRELD_B32_dpp_gfx10:
		case AMDGPU::V_MOVRELD_B32_sdwa_gfx10:
		case AMDGPU::V_MOVRELSD_B32_dpp_gfx10:
		case AMDGPU::V_MOVRELSD_B32_sdwa_gfx10:
		case AMDGPU::V_MOVRELSD_2_B32_dpp_gfx10:
		case AMDGPU::V_MOVRELSD_2_B32_sdwa_gfx10:
		return true;
		default:
		return false;
		}
		}

int SIInstrInfo::pseudoToMCOpcode(int Opcode) const {		int SIInstrInfo::pseudoToMCOpcode(int Opcode) const {
SIEncodingFamily Gen = subtargetEncodingFamily(ST);		SIEncodingFamily Gen = subtargetEncodingFamily(ST);

if ((get(Opcode).TSFlags & SIInstrFlags::renamedInGFX9) != 0 &&		if ((get(Opcode).TSFlags & SIInstrFlags::renamedInGFX9) != 0 &&
ST.getGeneration() == AMDGPUSubtarget::GFX9)		ST.getGeneration() == AMDGPUSubtarget::GFX9)
Gen = SIEncodingFamily::GFX9;		Gen = SIEncodingFamily::GFX9;

// Adjust the encoding family to GFX80 for D16 buffer instructions when the		// Adjust the encoding family to GFX80 for D16 buffer instructions when the
Show All 22 Lines	int SIInstrInfo::pseudoToMCOpcode(int Opcode) const {
if (MCOp == -1)		if (MCOp == -1)
return Opcode;		return Opcode;

// (uint16_t)-1 means that Opcode is a pseudo instruction that has		// (uint16_t)-1 means that Opcode is a pseudo instruction that has
// no encoding in the given subtarget generation.		// no encoding in the given subtarget generation.
if (MCOp == (uint16_t)-1)		if (MCOp == (uint16_t)-1)
return -1;		return -1;

		if (isAsmOnlyOpcode(MCOp))
		return -1;

return MCOp;		return MCOp;
}		}

static		static
TargetInstrInfo::RegSubRegPair getRegOrUndef(const MachineOperand &RegOpnd) {		TargetInstrInfo::RegSubRegPair getRegOrUndef(const MachineOperand &RegOpnd) {
assert(RegOpnd.isReg());		assert(RegOpnd.isReg());
return RegOpnd.isUndef() ? TargetInstrInfo::RegSubRegPair() :		return RegOpnd.isUndef() ? TargetInstrInfo::RegSubRegPair() :
getRegSubRegPair(RegOpnd);		getRegSubRegPair(RegOpnd);