When selectVOP3PMadMixModsImpl fails, it can still create a new COPY instruction
via selectVOP3ModsImpl. When selectG_FMA_FMAD then gives up, that COPY
remains dead but is not automatically removed:
InstructionSelect does not check whether instructions created during selection
are dead.
Such a dead COPY has no register class on its destination operand and causes a crash.
The fix is to build the COPY only when operands are being added to the selected instruction.
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
- Lines 3443–3447: This copy should just have been restricted in the first place. I also would have expected this to not have produced instructions unless the match happened; can you first try constraining the operands of this copy, and then splitting the predicate part from the instruction construction?
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
- Line 4852: With this it looks like we will end up calling getVOP3ModsImpl twice. I guess this isn't called often enough (and the "get" function seems simple enough) to cause performance issues, but it still feels suboptimal.
Now that I have checked tests with mad_mix that have an SGPR input: the copy is not created, but it should have been.
The patch is not correct in this state.
Is there a way to move

```cpp
if (!MatchedSrc0 && !MatchedSrc1 && !MatchedSrc2)
  return false;
```

in selectG_FMA_FMAD before the lookup for modifiers begins via selectVOP3PMadMixModsImpl?
What looks equivalent to me is to check whether at least one of the operands is an fpext (possibly hidden behind other instructions that could be folded into mods). Unfortunately this still has to call getVOP3ModsImpl on each operand in the worst case before the actual mods selection begins.
This should be equivalent to trunk LLVM, but without creating the copy instruction when fma_mix does not get selected.
I was checking whether a test like this gets selected in the same way:

```llvm
define amdgpu_vs float @test_f16_f32_add_fma_ext_mul(float inreg %x, float %y, float %z, half %u, half %v) {
.entry:
  %a = fmul half %u, %v
  %b = fpext half %a to float
  %abs_x = call contract float @llvm.fabs.f32(float %x)
  %c = call float @llvm.fmuladd.f32(float %abs_x, float %y, float %b)
  %d = fadd float %c, %z
  ret float %d
}

declare float @llvm.fmuladd.f32(float, float, float)
declare float @llvm.fabs.f32(float)
```
selects into
```
%0:sreg_32 = COPY $sgpr0
%1:vgpr_32 = COPY $vgpr0
%2:vgpr_32 = COPY $vgpr1
%5:vgpr_32 = COPY $vgpr2
%6:vgpr_32 = COPY $vgpr3
%8:vgpr_32 = nofpexcept V_MUL_F16_e64 0, %5, 0, %6, 0, 0, implicit $mode, implicit $exec
%14:vgpr_32 = COPY %0
%11:vgpr_32 = V_FMA_MIX_F32 2, %14, 0, %1, 8, %8, 0, 0, 0, implicit $mode, implicit $exec
%12:vgpr_32 = nofpexcept V_ADD_F32_e64 0, %11, 0, %2, 0, 0, implicit $mode, implicit $exec
$vgpr0 = COPY %12
SI_RETURN_TO_EPILOG implicit $vgpr0
```
The previous version of this patch would not create the copy and would use the SGPR directly. This is visible only in MIR; SIFoldOperands folds this copy later anyway:

```
%11:vgpr_32 = V_FMA_MIX_F32 2, %0, 0, %1, 8, %8, 0, 0, 0, implicit $mode, implicit $exec
```
I do think we should lean more on post-selection operand folding for dealing with SGPR copies, and we should write a new and better version. The current version is "backwards": traditionally, after selection we would have excess SGPRs and try to remove them to legalize instructions. With RegBankSelect making every operand a VGPR, we need a version written from the perspective that everything starts as a VGPR and needs an optimal choice of SGPR operands.
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
- Line 4852: I agree, you should know everything after the first round of selectVOP3PMadMixModsImpl and should just pass through what to do.
I did not understand the last comment. What do I need to do exactly?
Now selectVOP3PMadMixModsImpl (unchanged) is called once we know that fma/mad_mix will be selected.
As far as I know, the select-modifiers functions don't fail and are called once the pattern has been matched.
The tricky thing here is that selectVOP3PMadMixModsImpl was used to check for the actual pattern (it needs to find an fpext of some operand).
Now we check for the fpext before selecting modifiers for the operands.
Remove the special-case getVOP3ModsImpl and FPEXT checks. Delete the copy build inside of selectVOP3ModsImpl, and move that code down to where the new instruction is constructed. You just need a new build-copy-to-VGPR helper.
llvm/test/CodeGen/AMDGPU/GlobalISel/combine-fma-add-mul.ll
- Lines 173–174 (on Diff #475815): Should pre-contract this.
Removed the copy build from selectVOP3ModsImpl.
The copy is now built when adding operands to the selected instruction.
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
- Lines 3435–3437: The name isn't accurate since this isn't just blindly inserting a copy. How about copyToVGPRIfSrcFolded?
- Line 3438: Unrelated, but I don't understand why this modifier check is here. It doesn't factor into the constant bus restriction.
- Line 3680: Could you put this down in the callback with the addReg (same with the rest)?
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
- Line 3438: I wanted to remove both checks and also build the copy into a VGPR class, but there were way too many changes in the regression tests (mostly MIR). In the default use case (to my understanding at least) all checks could be removed.
Renamed the helper function and moved the call inside the lambda body. Copies are now created when the src operand is being added to the new instruction (MIB), and they are inserted before it.
llvm/test/CodeGen/AMDGPU/GlobalISel/combine-fma-add-mul.ll
- Lines 173–174 (on Diff #475815): Should still use the fmuladd or fma intrinsic here.