Details

Reviewers

foad
arsenm
Joe_Nash
dstuttard

Commits

rGe58b11684331: [AMDGPU] Add subtarget feature for MAD_U64/I64 bug on GFX11

Diff Detail

Event Timeline

mbrkusanin created this revision. Aug 31 2022, 4:46 AM

Herald added a project: Restricted Project. Aug 31 2022, 4:46 AM

Herald added subscribers: kosarev, foad, kerbowa and 9 others.

mbrkusanin requested review of this revision. Aug 31 2022, 4:46 AM

Herald added a subscriber: wdng. Aug 31 2022, 4:46 AM

Following an offline discussion, I've looked into adding a subtarget feature for the bug affecting V_MAD_U64_U32 and V_MAD_I64_I32. Since VOPReal is tied to VOPPseudo, I don't see a nice way of having both options for gfx11 without also having two Real versions.
If we don't care about having an option to disable the workaround for this bug on gfx11, then we can just eliminate the other Real.
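
As a rough illustration of what the feature buys, here is a minimal C++ sketch contrasting a generation check with a feature-bit check when picking the MAD opcode. Everything in it is a hypothetical stand-in (MockSubtarget, MadOpcode, makeSubtarget and both select functions are not LLVM names), and it assumes the feature is on by default for gfx11 and that passing "-mad-intra-fwd-bug" turns it off.

```cpp
#include <string>

// Hypothetical stand-ins for illustration only; these are not LLVM classes.
enum class MadOpcode { U64U32, U64U32Gfx11, I64I32, I64I32Gfx11 };

struct MockSubtarget {
  bool IsGFX11 = false;
  bool HasMADIntraFwdBug = false;
};

// Assumption: gfx11 enables the bug feature by default, and the attribute
// string "-mad-intra-fwd-bug" switches it off again.
inline MockSubtarget makeSubtarget(bool IsGFX11, const std::string &Attr) {
  MockSubtarget ST;
  ST.IsGFX11 = IsGFX11;
  ST.HasMADIntraFwdBug = IsGFX11 && Attr != "-mad-intra-fwd-bug";
  return ST;
}

// Generation-based check: the workaround variant is always used on gfx11.
inline MadOpcode selectByGeneration(const MockSubtarget &ST, bool Signed) {
  if (ST.IsGFX11)
    return Signed ? MadOpcode::I64I32Gfx11 : MadOpcode::U64U32Gfx11;
  return Signed ? MadOpcode::I64I32 : MadOpcode::U64U32;
}

// Feature-based check: the workaround variant follows the feature bit.
inline MadOpcode selectByFeature(const MockSubtarget &ST, bool Signed) {
  if (ST.HasMADIntraFwdBug)
    return Signed ? MadOpcode::I64I32Gfx11 : MadOpcode::U64U32Gfx11;
  return Signed ? MadOpcode::I64I32 : MadOpcode::U64U32;
}
```

With the feature-based check, a gfx11 subtarget built with the feature disabled selects the plain opcode, while the generation-based check would still select the workaround variant.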

mbrkusanin added reviewers: foad, arsenm, Joe_Nash, dstuttard. Aug 31 2022, 4:48 AM

Harbormaster completed remote builds in B184340: Diff 456927. Aug 31 2022, 5:23 AM

Looks reasonable to me.

llvm/lib/Target/AMDGPU/VOP3Instructions.td
923	You shouldn't need to set OtherPredicates here. Real instructions usually copy these predicates from the Pseudo instruction.

Is this still relevant? I thought there was a discussion that this wasn't actually broken?

By overriding opName (or PseudoInstr of tablegen record) for _strict pseudo we can have both pseudos select into same real instruction, so now we don't need two.
Updated use of features/predicate so now it works properly with -mattr=-mad-intra-fwd-bug which was the point of this change.
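
A minimal C++ sketch of the idea, under stated assumptions: the pseudo-to-real lookup is modeled as a map keyed by a string standing in for PseudoInstr plus a subtarget name. PseudoInfo, RealTable, lookupReal and the real's name used in the usage note are hypothetical, not the generated LLVM tables.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical model: a pseudo carries its record name plus the key used for
// the pseudo -> real lookup (standing in for PseudoInstr, derived from opName).
struct PseudoInfo {
  std::string RecordName;
  std::string PseudoInstrKey;
};

// (lookup key, subtarget) -> name of the real instruction.
typedef std::map<std::pair<std::string, std::string>, std::string> RealTable;

// Returns the real's name, or an empty string when no real is registered
// for this key on this subtarget.
inline std::string lookupReal(const RealTable &Table, const PseudoInfo &P,
                              const std::string &Subtarget) {
  RealTable::const_iterator It =
      Table.find(std::make_pair(P.PseudoInstrKey, Subtarget));
  return It == Table.end() ? std::string() : It->second;
}
```

If a table holds a single gfx11 entry under the key "v_mad_u64_u32", then two pseudos that both carry that key resolve to the same real, whereas a pseudo whose key is "v_mad_u64_u32_strict" finds nothing and would need a second real.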

In D133012#3932090, @arsenm wrote:

Is this still relevant? I thought there was a discussion that this wasn't actually broken?

Nothing was really broken. It was just if we want a feature for this bug or not.

In D133012#3933756, @mbrkusanin wrote:

By overriding opName (or PseudoInstr of tablegen record) for _strict pseudo we can have both pseudos select into same real instruction, so now we don't need two.

This is a creative use of the opName, but I think it probably hinders readability. No other instructions are doing this, so when I look at definition of the Real instruction it seems to refer to the non-strict pseudo.

Harbormaster completed remote builds in B198195: Diff 476105. Nov 17 2022, 7:05 AM

mbrkusanin updated this revision to Diff 476122. Nov 17 2022, 7:24 AM

In D133012#3933895, @Joe_Nash wrote:

In D133012#3933756, @mbrkusanin wrote:

By overriding opName (or PseudoInstr of tablegen record) for _strict pseudo we can have both pseudos select into same real instruction, so now we don't need two.

This is a creative use of the opName, but I think it probably hinders readability. No other instructions are doing this, so when I look at definition of the Real instruction it seems to refer to the non-strict pseudo.

Well, it refers to both in a way. I changed the one it actually uses for initialization to be the _strict pseudo, which in turn uses the same name as the non-strict one.

Thanks, LGTM.

llvm/lib/Target/AMDGPU/VOP3Instructions.td
300	For other types of instructions (VOPC, VOP2) opName is used as a key into mapping tables, so having duplicate entries with the same name can cause collisions. As long as there are no functional issues, I'm fine with it as is. It will be a tablegen failure if someone tries to add a mapping table and use that as the key in the future.

This revision is now accepted and ready to land. Nov 17 2022, 7:30 AM

mbrkusanin added inline comments. Nov 17 2022, 7:47 AM

llvm/lib/Target/AMDGPU/VOP3Instructions.td
300	Is the current solution preferable to having two versions of the Real instruction? The problem with having two Reals is that we would need to disable one for decoding because of the conflict and then manually adjust it, which does not look nice. Encoding is easily resolved with just some extra predicates.
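
The asymmetry between decoding and encoding can be sketched in C++ as follows. This is a simplified model, not the LLVM disassembler: RealInfo, countDecodeConflicts and pickForEncoding are hypothetical names, and the model assumes a decoder sees only the encoding while an encoder also sees the subtarget feature.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical description of a real instruction.
struct RealInfo {
  std::string Name;
  std::uint32_t Encoding;
  bool RequiresBugFeature; // predicate used when encoding
  bool DecodeEnabled;      // whether the decoder table includes this real
};

// Decoding sees only the bits, so two decodable reals with the same encoding
// are ambiguous. Returns the number of encodings claimed more than once.
inline int countDecodeConflicts(const std::vector<RealInfo> &Reals) {
  std::map<std::uint32_t, int> Claims;
  for (const RealInfo &R : Reals)
    if (R.DecodeEnabled)
      ++Claims[R.Encoding];
  int Conflicts = 0;
  for (const auto &Entry : Claims)
    if (Entry.second > 1)
      ++Conflicts;
  return Conflicts;
}

// Encoding also knows the subtarget, so a predicate is enough to choose.
inline const RealInfo *pickForEncoding(const std::vector<RealInfo> &Reals,
                                       bool HasBugFeature) {
  for (const RealInfo &R : Reals)
    if (R.RequiresBugFeature == HasBugFeature)
      return &R;
  return nullptr;
}
```

Two reals sharing one encoding (for example 0x2fe) produce one decode conflict until one of them is excluded from decoding, while the encoding side resolves with the predicate alone.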

I have often wanted a standard way of mapping two different pseudos to the same real instruction.

In D133012#3934097, @foad wrote:

I have often wanted a standard way of mapping two different pseudos to the same real instruction.

I guess that is useful, but the behavior here is not always available. We are basically having multiple maps overloaded onto the same keys.
Here we are using only one map, from pseudo to real using opName (actually PseudoInstr, which is derived from opName) as the key. Other maps such as getDPPOp32 also use opName as the key. So if we always wanted to conveniently map two pseudos to one real, we would likely want to introduce several new keys that don't interfere with each other.
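
One way to picture "several new keys that don't interfere with each other" is the C++ sketch below. It is only an illustration under assumptions: PseudoRecord, its RealKey and DppKey fields, and duplicatedKeys are hypothetical, and the DPP-style table is just an example of a second table that wants unique keys.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical record: one key per lookup table instead of a single opName
// that every table shares.
struct PseudoRecord {
  std::string Name;
  std::string RealKey; // key for the pseudo -> real table (may be shared)
  std::string DppKey;  // key for a DPP-style table (should stay unique)
};

// Returns the key values that more than one record uses for the given field.
inline std::vector<std::string>
duplicatedKeys(const std::vector<PseudoRecord> &Records,
               std::string PseudoRecord::*Field) {
  std::map<std::string, int> Count;
  for (const PseudoRecord &R : Records)
    ++Count[R.*Field];
  std::vector<std::string> Dups;
  for (const auto &Entry : Count)
    if (Entry.second > 1)
      Dups.push_back(Entry.first);
  return Dups;
}
```

With separate fields, two pseudos can deliberately share RealKey while keeping distinct DppKey values, so a table that requires unique keys is unaffected by the sharing.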

llvm/lib/Target/AMDGPU/VOP3Instructions.td
300	I don't quite understand what you are asking. Currently we have 2 pseudos and one real for each subtarget. We should not need multiple reals on a single subtarget. The way to resolve my comment fully is to go to the following, which is NFC currently:

defm V_MAD_U64_U32_strict : VOP3Inst <"v_mad_u64_u32_strict", VOP3b_I64_I1_I32_I32_I64>;
defm V_MAD_I64_I32_strict : VOP3Inst <"v_mad_i64_i32_strict", VOP3b_I64_I1_I32_I32_I64>;

Harbormaster completed remote builds in B198208: Diff 476122. Nov 17 2022, 8:08 AM

mbrkusanin added inline comments. Nov 18 2022, 7:00 AM

llvm/lib/Target/AMDGPU/VOP3Instructions.td
300	Actually it would not be NFC. Mapping non-strict pseudo to real would fail for: llc -mcpu=gfx1100 -mattr=-mad-intra-fwd-bug. The point of the patch is to have an option to disable the workaround for the bug. Maybe I misunderstood what the first comment meant. Patch as-is matches both Pseudos into same Real. Is the potential issue for a new mapping table in the future if that table wants to use opName as key? Or are you talking about mapping Reals back into Pseudos? Would "let PseudoInstr = " for pseudo make a difference (opName would be different then)?

LGTM but I don't understand the name change

llvm/lib/Target/AMDGPU/VOP3Instructions.td
299–300	I don't understand the name change from _gfx11 to _strict

mbrkusanin added inline comments. Nov 18 2022, 8:59 AM

llvm/lib/Target/AMDGPU/VOP3Instructions.td
299–300	It's strange to have _gfx11 on a pseudo and then a real _gfx11_e64_gfx11. Should I restore it?

arsenm added inline comments. Nov 18 2022, 9:01 AM

llvm/lib/Target/AMDGPU/VOP3Instructions.td
299–300	That's consistent with other _gfx* behavior-changing instruction variants. I do think we have a sustainability problem with all the semantic changes of the same opcodes. Over time I've started to think it would be better to codegen to concrete opcodes and swap out the instruction tables per subtarget or something.

reverted name change
rebase

This revision was landed with ongoing or failed builds. Nov 18 2022, 9:21 AM

Closed by commit rGe58b11684331: [AMDGPU] Add subtarget feature for MAD_U64/I64 bug on GFX11 (authored by mbrkusanin).

This revision was automatically updated to reflect the committed changes.

mbrkusanin added a commit: rGe58b11684331: [AMDGPU] Add subtarget feature for MAD_U64/I64 bug on GFX11.

Joe_Nash added inline comments. Nov 18 2022, 9:48 AM

llvm/lib/Target/AMDGPU/VOP3Instructions.td
300	> Actually it would not be NFC. Mapping non-strict pseudo to real would fail for: llc -mcpu=gfx1100 -mattr=-mad-intra-fwd-bug. The point of the patch is to have an option to disable the workaround for the bug.

Ah, I missed this part where you wanted to turn off the feature on gfx11. Perhaps there should be a test for that.

> Maybe I misunderstood what the first comment meant. Patch as-is matches both Pseudos into same Real. Is the potential issue for a new mapping table in the future if that table wants to use opName as key? Or are you talking about mapping Reals back into Pseudos? Would "let PseudoInstr = " for pseudo make a difference (opName would be different then)?

"a new mapping table in the future if that table wants to use opName as key" is the part I was worried about. It's OK to leave it for now and address it later if it becomes a problem.

Last I checked you can't actually turn off a subtarget feature that's attached to the cpu definition. It has some insane behavior where it concludes you didn't really mean that processor and turns off all the features

Harbormaster completed remote builds in B198493: Diff 476510. Nov 18 2022, 10:22 AM

Diff 476510

llvm/lib/Target/AMDGPU/AMDGPU.td

@@ ... @@
 >;

 def FeatureImageGather4D16Bug : SubtargetFeature<"image-gather4-d16-bug",
   "HasImageGather4D16Bug",
   "true",
   "Image Gather4 D16 hardware bug"
 >;

+def FeatureMADIntraFwdBug : SubtargetFeature<"mad-intra-fwd-bug",
+  "HasMADIntraFwdBug",
+  "true",
+  "MAD_U64/I64 intra instruction forwarding bug"
+>;
+
 class SubtargetFeatureLDSBankCount <int Value> : SubtargetFeature <
   "ldsbankcount"#Value,
   "LDSBankCount",
   !cast<string>(Value),
   "The number of LDS banks per compute unit."
 >;

 def FeatureLDSBankCount16 : SubtargetFeatureLDSBankCount<16>;
@@ ... @@ def FeatureISAVersion11_Common : FeatureSet<
    FeatureShaderCyclesRegister,
    FeatureArchitectedFlatScratch,
    FeatureAtomicFaddRtnInsts,
    FeatureAtomicFaddNoRtnInsts,
    FeatureFlatAtomicFaddF32Inst,
    FeatureImageInsts,
    FeaturePackedTID,
    FeatureVcmpxPermlaneHazard,
-   FeatureBackOffBarrier]>;
+   FeatureBackOffBarrier,
+   FeatureMADIntraFwdBug]>;

 def FeatureISAVersion11_0_0 : FeatureSet<
   !listconcat(FeatureISAVersion11_Common.Features,
     [FeatureGFX11FullVGPRs,
      FeatureUserSGPRInit16Bug])>;

 def FeatureISAVersion11_0_1 : FeatureSet<
   !listconcat(FeatureISAVersion11_Common.Features,
@@ ... @@

 def EnableFlatScratch : Predicate<"Subtarget->enableFlatScratch()">;

 def DisableFlatScratch : Predicate<"!Subtarget->enableFlatScratch()">;

 def HasUnalignedAccessMode : Predicate<"Subtarget->hasUnalignedAccessMode()">,
   AssemblerPredicate<(all_of FeatureUnalignedAccessMode)>;

+def HasMADIntraFwdBug : Predicate<"Subtarget->hasMADIntraFwdBug()">;
+
+def HasNotMADIntraFwdBug : Predicate<"!Subtarget->hasMADIntraFwdBug()">;
+
 // Include AMDGPU TD files
 include "SISchedule.td"
 include "GCNProcessors.td"
 include "AMDGPUInstrInfo.td"
 include "SIRegisterInfo.td"
 include "AMDGPURegisterBanks.td"
 include "AMDGPUInstructions.td"
 include "SIInstrInfo.td"
 include "AMDGPUCallingConv.td"
 include "AMDGPUSearchableTables.td"

llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp

@@ ... @@
 }

 // We need to handle this here because tablegen doesn't support matching
 // instructions with multiple outputs.
 void AMDGPUDAGToDAGISel::SelectMAD_64_32(SDNode *N) {
   SDLoc SL(N);
   bool Signed = N->getOpcode() == AMDGPUISD::MAD_I64_I32;
   unsigned Opc;
-  if (Subtarget->getGeneration() == AMDGPUSubtarget::GFX11)
+  if (Subtarget->hasMADIntraFwdBug())
     Opc = Signed ? AMDGPU::V_MAD_I64_I32_gfx11_e64
                  : AMDGPU::V_MAD_U64_U32_gfx11_e64;
   else
     Opc = Signed ? AMDGPU::V_MAD_I64_I32_e64 : AMDGPU::V_MAD_U64_U32_e64;

   SDValue Clamp = CurDAG->getTargetConstant(0, SL, MVT::i1);
   SDValue Ops[] = { N->getOperand(0), N->getOperand(1), N->getOperand(2),
                     Clamp };
   CurDAG->SelectNodeTo(N, Opc, N->getVTList(), Ops);
 }

 // We need to handle this here because tablegen doesn't support matching
 // instructions with multiple outputs.
 void AMDGPUDAGToDAGISel::SelectMUL_LOHI(SDNode *N) {
   SDLoc SL(N);
   bool Signed = N->getOpcode() == ISD::SMUL_LOHI;
   unsigned Opc;
-  if (Subtarget->getGeneration() == AMDGPUSubtarget::GFX11)
+  if (Subtarget->hasMADIntraFwdBug())
     Opc = Signed ? AMDGPU::V_MAD_I64_I32_gfx11_e64
                  : AMDGPU::V_MAD_U64_U32_gfx11_e64;
   else
     Opc = Signed ? AMDGPU::V_MAD_I64_I32_e64 : AMDGPU::V_MAD_U64_U32_e64;

   SDValue Zero = CurDAG->getTargetConstant(0, SL, MVT::i64);
   SDValue Clamp = CurDAG->getTargetConstant(0, SL, MVT::i1);
   SDValue Ops[] = {N->getOperand(0), N->getOperand(1), Zero, Clamp};

llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp

@@ ... @@

 bool AMDGPUInstructionSelector::selectG_AMDGPU_MAD_64_32(
     MachineInstr &I) const {
   MachineBasicBlock *BB = I.getParent();
   MachineFunction *MF = BB->getParent();
   const bool IsUnsigned = I.getOpcode() == AMDGPU::G_AMDGPU_MAD_U64_U32;

   unsigned Opc;
-  if (Subtarget->getGeneration() == AMDGPUSubtarget::GFX11)
+  if (Subtarget->hasMADIntraFwdBug())
     Opc = IsUnsigned ? AMDGPU::V_MAD_U64_U32_gfx11_e64
                      : AMDGPU::V_MAD_I64_I32_gfx11_e64;
   else
     Opc = IsUnsigned ? AMDGPU::V_MAD_U64_U32_e64 : AMDGPU::V_MAD_I64_I32_e64;
   I.setDesc(TII.get(Opc));
   I.addOperand(*MF, MachineOperand::CreateImm(0));
   I.addImplicitDefUseOperands(*MF);
   return constrainSelectedInstRegOperands(I, TII, TRI, RBI);

llvm/lib/Target/AMDGPU/GCNSubtarget.h

@@ ... @@ protected:
   bool HasLdsBranchVmemWARHazard = false;
   bool HasNSAtoVMEMBug = false;
   bool HasNSAClauseBug = false;
   bool HasOffset3fBug = false;
   bool HasFlatSegmentOffsetBug = false;
   bool HasImageStoreD16Bug = false;
   bool HasImageGather4D16Bug = false;
   bool HasGFX11FullVGPRs = false;
+  bool HasMADIntraFwdBug = false;
   bool HasVOPDInsts = false;

   // Dummy feature to use for assembler in tablegen.
   bool FeatureDisable = false;

   SelectionDAGTargetInfo TSInfo;
 private:
   SIInstrInfo InstrInfo;
@@ ... @@ public:
   bool hasOffset3fBug() const {
     return HasOffset3fBug;
   }

   bool hasImageStoreD16Bug() const { return HasImageStoreD16Bug; }

   bool hasImageGather4D16Bug() const { return HasImageGather4D16Bug; }

+  bool hasMADIntraFwdBug() const { return HasMADIntraFwdBug; }
+
   bool hasNSAEncoding() const { return HasNSAEncoding; }

   unsigned getNSAMaxSize() const { return NSAMaxSize; }

   bool hasGFX10_AEncoding() const {
     return GFX10_AEncoding;
   }

llvm/lib/Target/AMDGPU/VOP3Instructions.td

@@ ... @@

 let SubtargetPredicate = isGFX7Plus in {
 let Constraints = "@earlyclobber $vdst", SchedRW = [WriteQuarterRate32] in {
 defm V_QSAD_PK_U16_U8 : VOP3Inst <"v_qsad_pk_u16_u8", VOP3_Profile<VOP_I64_I64_I32_I64, VOP3_CLAMP>>;
 defm V_MQSAD_U32_U8 : VOP3Inst <"v_mqsad_u32_u8", VOPProfileMQSAD>;
 } // End Constraints = "@earlyclobber $vdst", SchedRW = [WriteQuarterRate32]
 } // End SubtargetPredicate = isGFX7Plus

-let isCommutable = 1 in {
-let SchedRW = [WriteIntMul, WriteSALU] in {
-  let SubtargetPredicate = isGFX7GFX8GFX9GFX10 in {
+let isCommutable = 1, SchedRW = [WriteIntMul, WriteSALU] in {
+  let SubtargetPredicate = isGFX7Plus, OtherPredicates = [HasNotMADIntraFwdBug] in {
     defm V_MAD_U64_U32 : VOP3Inst <"v_mad_u64_u32", VOP3b_I64_I1_I32_I32_I64>;
     defm V_MAD_I64_I32 : VOP3Inst <"v_mad_i64_i32", VOP3b_I64_I1_I32_I32_I64>;
   }
-  let SubtargetPredicate = isGFX11Only, Constraints = "@earlyclobber $vdst" in {
-    defm V_MAD_U64_U32_gfx11 : VOP3Inst <"v_mad_u64_u32_gfx11", VOP3b_I64_I1_I32_I32_I64>;
-    defm V_MAD_I64_I32_gfx11 : VOP3Inst <"v_mad_i64_i32_gfx11", VOP3b_I64_I1_I32_I32_I64>;
-  } // End SubtargetPredicate = isGFX11Only, Constraints = "@earlyclobber $vdst"
-} // End SchedRW = [WriteIntMul, WriteSALU]
-} // End isCommutable = 1
+  let SubtargetPredicate = isGFX11Only, OtherPredicates = [HasMADIntraFwdBug],
+      Constraints = "@earlyclobber $vdst" in {
+    defm V_MAD_U64_U32_gfx11 : VOP3Inst <"v_mad_u64_u32", VOP3b_I64_I1_I32_I32_I64>;
+    defm V_MAD_I64_I32_gfx11 : VOP3Inst <"v_mad_i64_i32", VOP3b_I64_I1_I32_I32_I64>;
+  }
+} // End isCommutable = 1, SchedRW = [WriteIntMul, WriteSALU]


 let FPDPRounding = 1 in {
   let Predicates = [Has16BitInsts, isGFX8Only] in {
     defm V_DIV_FIXUP_F16 : VOP3Inst <"v_div_fixup_f16", VOP3_Profile<VOP_F16_F16_F16_F16>, AMDGPUdiv_fixup>;
     defm V_FMA_F16 : VOP3Inst <"v_fma_f16", VOP3_Profile<VOP_F16_F16_F16_F16>, any_fma>;
   } // End Predicates = [Has16BitInsts, isGFX8Only]

@@ ... @@ multiclass IMAD32_Pats <VOP3_Pseudo inst> {
   // FIXME: GlobalISel pattern exporter fails to export a pattern like this and asserts,
   // make it SDAG only.
   def : GCNPat <
         (ThreeOpFragSDAG<mul, add> i32:$src0, i32:$src1, (i32 imm:$src2)),
         (EXTRACT_SUBREG (inst $src0, $src1, (i64 (as_i64imm $src2)), 0 /* clamp */), sub0)
         >;
 }

-let SubtargetPredicate = isGFX9GFX10 in // exclude pre-GFX9 where it was slow
+// exclude pre-GFX9 where it was slow
+let OtherPredicates = [HasNotMADIntraFwdBug], SubtargetPredicate = isGFX9Plus in
 defm : IMAD32_Pats<V_MAD_U64_U32_e64>;
-let SubtargetPredicate = isGFX11Only in
+let OtherPredicates = [HasMADIntraFwdBug], SubtargetPredicate = isGFX11Only in
 defm : IMAD32_Pats<V_MAD_U64_U32_gfx11_e64>;

 def VOP3_PERMLANE_Profile : VOP3_Profile<VOPProfile <[i32, i32, i32, i32]>, VOP3_OPSEL> {
   let InsVOP3OpSel = (ins IntOpSelMods:$src0_modifiers, VRegSrc_32:$src0,
                           IntOpSelMods:$src1_modifiers, SSrc_b32:$src1,
                           IntOpSelMods:$src2_modifiers, SSrc_b32:$src2,
                           VGPR_32:$vdst_in, op_sel0:$op_sel);
   let HasClamp = 0;
   let HasExtVOP3DPP = 0;
@@ ... @@
 defm V_MINMAX_U32 : VOP3_Realtriple_gfx11<0x263>;
 defm V_MAXMIN_I32 : VOP3_Realtriple_gfx11<0x264>;
 defm V_MINMAX_I32 : VOP3_Realtriple_gfx11<0x265>;
 defm V_DOT2_F16_F16 : VOP3Dot_Realtriple_gfx11<0x266>;
 defm V_DOT2_BF16_BF16 : VOP3Dot_Realtriple_gfx11<0x267>;
 defm V_DIV_SCALE_F32 : VOP3be_Real_gfx11<0x2fc, "V_DIV_SCALE_F32", "v_div_scale_f32">;
 defm V_DIV_SCALE_F64 : VOP3be_Real_gfx11<0x2fd, "V_DIV_SCALE_F64", "v_div_scale_f64">;
 defm V_MAD_U64_U32_gfx11 : VOP3be_Real_gfx11<0x2fe, "V_MAD_U64_U32_gfx11", "v_mad_u64_u32">;
 defm V_MAD_I64_I32_gfx11 : VOP3be_Real_gfx11<0x2ff, "V_MAD_I64_I32_gfx11", "v_mad_i64_i32">;
 defm V_ADD_NC_U16 : VOP3Only_Realtriple_gfx11<0x303>;
 defm V_SUB_NC_U16 : VOP3Only_Realtriple_gfx11<0x304>;
 defm V_MUL_LO_U16_t16 : VOP3Only_Realtriple_t16_gfx11<0x305, "v_mul_lo_u16">;
 defm V_CVT_PK_I16_F32 : VOP3_Realtriple_gfx11<0x306>;
 defm V_CVT_PK_U16_F32 : VOP3_Realtriple_gfx11<0x307>;
 defm V_MAX_U16_t16 : VOP3Only_Realtriple_t16_gfx11<0x309, "v_max_u16">;
 defm V_MAX_I16_t16 : VOP3Only_Realtriple_t16_gfx11<0x30a, "v_max_i16">;
 defm V_MIN_U16_t16 : VOP3Only_Realtriple_t16_gfx11<0x30b, "v_min_u16">;

This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Add subtarget feature for MAD_U64/I64 bug on GFX11
Closed · Public
