This is an archive of the discontinued LLVM Phabricator instance.


Differential D140135

AMDGPU: Try to unfold fneg source when matching legacy fmin/fmax
ClosedPublic

Authored by arsenm on Dec 15 2022, 11:23 AM.

Details

Reviewers

rampitec
bogner
Pierre-vh
foad

Group Reviewers

Restricted Project

Summary

This is NFC as it stands, since other combines will effectively
prevent this from being reachable. This will avoid regressions in a
future change which tries to make better use of select source
modifiers.

Didn't bother with the GlobalISel part for now, since the baseline
combine doesn't seem to work on the existing test.

This is the target-specific partner to D140128.
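
The rewrite described above can be sanity-checked with a small scalar sketch. It assumes `fmin_legacy(a, b)` behaves like `a < b ? a : b`, which is the shape implied by the direct `select (fcmp olt lhs, rhs), lhs, rhs` match. The names `fminLegacyModel`, `selectForm`, `foldedForm`, `sameBits`, and `checkFold` are hypothetical helpers for illustration and are not part of LLVM.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Assumed scalar model of fmin_legacy: an ordered less-than select.
// If either input is NaN the compare fails and the second operand is returned.
inline float fminLegacyModel(float A, float B) { return A < B ? A : B; }

// select (fcmp olt lhs, K), (fneg lhs), -K
inline float selectForm(float LHS, float K) { return LHS < K ? -LHS : -K; }

// fneg (fmin_legacy lhs, K)
inline float foldedForm(float LHS, float K) { return -fminLegacyModel(LHS, K); }

// Bitwise comparison, so the sign of zero is checked as well as the value.
inline bool sameBits(float X, float Y) {
  std::uint32_t XB, YB;
  std::memcpy(&XB, &X, sizeof(XB));
  std::memcpy(&YB, &Y, sizeof(YB));
  return XB == YB;
}

// Asserts that both forms agree for one (lhs, K) pair.
inline void checkFold(float LHS, float K) {
  assert(sameBits(selectForm(LHS, K), foldedForm(LHS, K)));
}
```

Calling `checkFold` with ordinary values, with equal operands, and with a NaN `lhs` exercises both arms of the select under this model.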

Diff Detail

Event Timeline

arsenm created this revision. Dec 15 2022, 11:23 AM

Herald added a project: Restricted Project. Dec 15 2022, 11:23 AM

Herald added subscribers: kosarev, kerbowa, hiraditya and 5 others.

arsenm requested review of this revision. Dec 15 2022, 11:23 AM

Herald added a project: Restricted Project. Dec 15 2022, 11:23 AM

Herald added a subscriber: wdng.

Harbormaster completed remote builds in B203407: Diff 483252. Dec 15 2022, 2:25 PM

Rebase on some expanded test coverage

Harbormaster completed remote builds in B204216: Diff 484347. Dec 20 2022, 12:03 PM

foad added inline comments. Dec 20 2022, 11:30 PM

llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
1416	This function doesn't use False. Might be clearer to take a Swapped flag instead of True and False.
1502	It's very unclear what you're trying to match here. I think it's something like this, depending on whether NegTrue actually stripped a neg or not:
// LHS op RHS ? LHS : -RHS -> -min/max(LHS, RHS)
// LHS op RHS ? -LHS : -RHS -> -min/max(LHS, RHS)
The first one doesn't make any sense to me. For the second one don't you need to flip the condition code?
llvm/test/CodeGen/AMDGPU/fneg-combines.new.ll
295–297	I think these three lines implement roughly max(s0, 0) so how can converting it to min(0, s0) be correct?

arsenm added inline comments. Dec 21 2022, 4:17 AM

llvm/test/CodeGen/AMDGPU/fneg-combines.new.ll
295–297	It's ngt, not gt. It's more like !(v_cmp_le_f32)

foad added inline comments. Dec 21 2022, 4:31 AM

llvm/test/CodeGen/AMDGPU/fneg-combines.new.ll
295–297	Right, the code is doing `s0<=0?v0:v1` which is `s0<=0?-0:s0` which is max.

foad added inline comments. Dec 21 2022, 4:39 AM

llvm/test/CodeGen/AMDGPU/fneg-combines.new.ll
295–297	Oh sorry I think I've misremembered how cndmask works.

arsenm added inline comments. Jan 20 2023, 6:06 AM

llvm/test/CodeGen/AMDGPU/fneg-combines.new.ll
295–297	Right, cndmask is backward from how you would expect
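
The operand order being discussed can be illustrated with a scalar sketch. It relies on `v_cndmask_b32` selecting `src1` when the condition bit is set and `src0` otherwise, and on `v_cmp_ngt_f32` computing `!(a > b)`. The function names are hypothetical, and the model ignores lanes, encodings, and everything else about the real instructions.

```cpp
#include <cassert>
#include <cmath>

// v_cndmask_b32 dst, src0, src1, vcc  ==>  dst = vcc ? src1 : src0
// src1 is chosen when the condition bit is set, which is the reverse of a
// C-style "cond ? first : second" reading of the operand list.
inline float cndmaskModel(float Src0, float Src1, bool Vcc) {
  return Vcc ? Src1 : Src0;
}

// v_cmp_ngt_f32 vcc, a, b  ==>  vcc = !(a > b), which is true for NaN inputs.
inline bool cmpNgtModel(float A, float B) { return !(A > B); }

// Scalar replay of the old check lines in fneg_fadd_0_nsz_f32:
//   v_bfrev_b32_e32   v0, 1            ; v0 = -0.0 (bit pattern 0x80000000)
//   v_mov_b32_e32     v1, s0
//   v_cmp_ngt_f32_e64 vcc, s0, 0
//   v_cndmask_b32_e32 v0, v0, v1, vcc
inline float oldSequenceModel(float S0) {
  const float V0 = -0.0f;
  const float V1 = S0;
  const bool Vcc = cmpNgtModel(S0, 0.0f);
  return cndmaskModel(V0, V1, Vcc);
}
```

Under this model the sequence returns `s0` whenever `s0` is not greater than zero (including NaN) and `-0.0` otherwise, so it has a min-like shape rather than a max-like one.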

Herald added a subscriber: StephenFan. Jan 20 2023, 6:06 AM

ping

foad added inline comments. Jan 23 2023, 3:47 AM

llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
1502	I still don't understand this

arsenm added inline comments. Jan 23 2023, 4:17 AM

llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
1502	fmin/fmax legacy are not commutative so there's no swapping here. Your sample here missed the part where the constant was negated, this is just pulling a negate out of the two select operands
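
The non-commutativity point can be seen in a scalar sketch. It assumes `fmin_legacy(a, b)` behaves like `a < b ? a : b`, so a failed compare (including any NaN input) yields the second operand. Both function names are hypothetical and only model the behavior discussed here.

```cpp
#include <cassert>
#include <cmath>

// Assumed scalar model: ordered "less than" select; a failed compare
// (including any NaN input) yields the second operand.
inline float fminLegacyModel(float A, float B) { return A < B ? A : B; }

// Returns true when swapping the operands changes the result for this pair.
inline bool swapChangesResult(float A, float B) {
  const float AB = fminLegacyModel(A, B);
  const float BA = fminLegacyModel(B, A);
  if (std::isnan(AB) || std::isnan(BA))
    return std::isnan(AB) != std::isnan(BA);
  return AB != BA;
}
```

With one NaN operand the two orders disagree: `fminLegacyModel(NAN, 1.0f)` returns `1.0f`, while `fminLegacyModel(1.0f, NAN)` returns NaN. This is why the operands cannot be freely swapped when forming the node.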

Add comment

Harbormaster completed remote builds in B209333: Diff 491310. Jan 23 2023, 6:05 AM

arsenm added a child revision: D142418: AMDGPU: Teach fneg combines that select has source modifiers. Jan 23 2023, 4:28 PM

lgtm

This revision is now accepted and ready to land. Jan 31 2023, 10:54 AM

36cfe26a5288d99e66c75d82989d154874999b98

llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
1416	I think it's harder to read if you're considering the caller context

Revision Contents

Path                                                      Size
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.h               4 lines
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp             56 lines
llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp    2 lines
llvm/test/CodeGen/AMDGPU/fneg-combines.f16.ll             11 lines
llvm/test/CodeGen/AMDGPU/fneg-combines.new.ll             8 lines

Diff 491310

llvm/lib/Target/AMDGPU/AMDGPUISelLowering.h

@@ SDValue LowerDYNAMIC_STACKALLOC(SDValue Op,
                                   SelectionDAG &DAG) const;

   SDValue LowerOperation(SDValue Op, SelectionDAG &DAG) const override;
   SDValue PerformDAGCombine(SDNode *N, DAGCombinerInfo &DCI) const override;
   void ReplaceNodeResults(SDNode * N,
                           SmallVectorImpl<SDValue> &Results,
                           SelectionDAG &DAG) const override;

+  SDValue combineFMinMaxLegacyImpl(const SDLoc &DL, EVT VT, SDValue LHS,
+                                   SDValue RHS, SDValue True, SDValue False,
+                                   SDValue CC, DAGCombinerInfo &DCI) const;
+
   SDValue combineFMinMaxLegacy(const SDLoc &DL, EVT VT, SDValue LHS,
                                SDValue RHS, SDValue True, SDValue False,
                                SDValue CC, DAGCombinerInfo &DCI) const;

   const char* getTargetNodeName(unsigned Opcode) const override;

   // FIXME: Turn off MergeConsecutiveStores() before Instruction Selection for
   // AMDGPU. Commit r319036,

llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp

@@ if (((SrcVT == MVT::v16f16 && VT == MVT::v8f16) ||
     return Op;

   DAG.ExtractVectorElements(Op.getOperand(0), Args, Start,
                             VT.getVectorNumElements());

   return DAG.getBuildVector(Op.getValueType(), SDLoc(Op), Args);
 }

-/// Generate Min/Max node
-SDValue AMDGPUTargetLowering::combineFMinMaxLegacy(const SDLoc &DL, EVT VT,
-                                                   SDValue LHS, SDValue RHS,
-                                                   SDValue True, SDValue False,
-                                                   SDValue CC,
-                                                   DAGCombinerInfo &DCI) const {
-  if (!(LHS == True && RHS == False) && !(LHS == False && RHS == True))
-    return SDValue();
+// TODO: Handle fabs too
+static SDValue peekFNeg(SDValue Val) {
+  if (Val.getOpcode() == ISD::FNEG)
+    return Val.getOperand(0);
+
+  return Val;
+}
+SDValue AMDGPUTargetLowering::combineFMinMaxLegacyImpl(
+    const SDLoc &DL, EVT VT, SDValue LHS, SDValue RHS, SDValue True,
+    SDValue False, SDValue CC, DAGCombinerInfo &DCI) const {
   SelectionDAG &DAG = DCI.DAG;
   ISD::CondCode CCOpcode = cast<CondCodeSDNode>(CC)->get();
   switch (CCOpcode) {
   case ISD::SETOEQ:
   case ISD::SETONE:
   case ISD::SETUNE:
   case ISD::SETNE:
   case ISD::SETUEQ:
@@ case ISD::SETOGT: {
     return DAG.getNode(AMDGPUISD::FMIN_LEGACY, DL, VT, RHS, LHS);
   }
   case ISD::SETCC_INVALID:
     llvm_unreachable("Invalid setcc condcode!");
   }
   return SDValue();
 }

+/// Generate Min/Max node
+SDValue AMDGPUTargetLowering::combineFMinMaxLegacy(const SDLoc &DL, EVT VT,
+                                                   SDValue LHS, SDValue RHS,
+                                                   SDValue True, SDValue False,
+                                                   SDValue CC,
+                                                   DAGCombinerInfo &DCI) const {
+  if ((LHS == True && RHS == False) || (LHS == False && RHS == True))
+    return combineFMinMaxLegacyImpl(DL, VT, LHS, RHS, True, False, CC, DCI);
+
+  SelectionDAG &DAG = DCI.DAG;
+
+  // If we can't directly match this, try to see if we can fold an fneg to
+  // match.
+
+  ConstantFPSDNode *CRHS = dyn_cast<ConstantFPSDNode>(RHS);
+  ConstantFPSDNode *CFalse = dyn_cast<ConstantFPSDNode>(False);
+  SDValue NegTrue = peekFNeg(True);
+
+  // Undo the combine foldFreeOpFromSelect does if it helps us match the
+  // fmin/fmax.
+  //
+  // select (fcmp olt (lhs, K)), (fneg lhs), -K
+  // -> fneg (fmin_legacy lhs, K)
+  //
+  // TODO: Use getNegatedExpression
+  if (LHS == NegTrue && CFalse && CRHS) {
+    APFloat NegRHS = neg(CRHS->getValueAPF());
+    if (NegRHS == CFalse->getValueAPF()) {
+      SDValue Combined =
+          combineFMinMaxLegacyImpl(DL, VT, LHS, RHS, NegTrue, False, CC, DCI);
+      if (Combined)
+        return DAG.getNode(ISD::FNEG, DL, VT, Combined);
+      return SDValue();
+    }
+  }
+
+  return SDValue();
+}
+
 std::pair<SDValue, SDValue>
 AMDGPUTargetLowering::split64BitValue(SDValue Op, SelectionDAG &DAG) const {
   SDLoc SL(Op);

   SDValue Vec = DAG.getNode(ISD::BITCAST, SL, MVT::v2i32, Op);

   const SDValue Zero = DAG.getConstant(0, SL, MVT::i32);
   const SDValue One = DAG.getConstant(1, SL, MVT::i32);
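
The guard on the new path in `combineFMinMaxLegacy` can be sketched on a toy expression model. `Node`, `peekFNegModel`, and `matchesNegatedSelect` are hypothetical stand-ins, not LLVM types or APIs, and the constant comparison uses plain `double` arithmetic rather than `APFloat`. The sketch only shows the shape of the match and does not build any node.

```cpp
#include <cassert>

// Hypothetical stand-in for an SDValue: just enough structure for the match.
struct Node {
  enum Kind { Value, ConstantFP, FNeg } Tag;
  double Imm = 0.0;              // Only meaningful for ConstantFP.
  const Node *Operand = nullptr; // Only meaningful for FNeg.
};

// Mirrors peekFNeg: look through one fneg, otherwise return the node itself.
inline const Node *peekFNegModel(const Node *N) {
  return N->Tag == Node::FNeg ? N->Operand : N;
}

// Mirrors the guard of the new path: LHS must equal the true operand with its
// fneg stripped, RHS and the false operand must both be constants, and the
// false operand must be the negation of RHS.
inline bool matchesNegatedSelect(const Node *LHS, const Node *RHS,
                                 const Node *TrueV, const Node *FalseV) {
  if (LHS != peekFNegModel(TrueV))
    return false;
  if (RHS->Tag != Node::ConstantFP || FalseV->Tag != Node::ConstantFP)
    return false;
  return -RHS->Imm == FalseV->Imm;
}
```

For `select (fcmp olt x, 2.0), (fneg x), -2.0` the guard holds, so the patch would try `combineFMinMaxLegacyImpl` on `(x, 2.0)` and wrap a successful result in an `fneg`. If the false operand were `2.0` instead, the guard would fail and no combine would be attempted.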

llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp

@@ bool AMDGPUPostLegalizerCombinerHelper::matchFMinFMaxLegacy(
   if (!MRI.hasOneNonDBGUse(Cond) ||
       !mi_match(Cond, MRI,
                 m_GFCmp(m_Pred(Info.Pred), m_Reg(Info.LHS), m_Reg(Info.RHS))))
     return false;

   Info.True = MI.getOperand(2).getReg();
   Info.False = MI.getOperand(3).getReg();

+  // TODO: Handle case where the selected value is an fneg and the compared
+  // constant is the negation of the selected value.
   if (!(Info.LHS == Info.True && Info.RHS == Info.False) &&
       !(Info.LHS == Info.False && Info.RHS == Info.True))
     return false;

   switch (Info.Pred) {
   case CmpInst::FCMP_FALSE:
   case CmpInst::FCMP_OEQ:
   case CmpInst::FCMP_ONE:

llvm/test/CodeGen/AMDGPU/fneg-combines.f16.ll


 ; This is a workaround because -enable-no-signed-zeros-fp-math does not set up
 ; function attribute unsafe-fp-math automatically. Combine with the previous test
 ; when that is done.
 define amdgpu_ps half @fneg_fadd_0_nsz_f16(half inreg %tmp2, half inreg %tmp6, <4 x i32> %arg) #2 {
 ; SI-SAFE-LABEL: fneg_fadd_0_nsz_f16:
 ; SI-SAFE: ; %bb.0: ; %.entry
 ; SI-SAFE-NEXT: v_cvt_f16_f32_e32 v0, s0
-; SI-SAFE-NEXT: v_bfrev_b32_e32 v1, 1
-; SI-SAFE-NEXT: v_mov_b32_e32 v2, 0x7fc00000
+; SI-SAFE-NEXT: s_brev_b32 s0, 1
+; SI-SAFE-NEXT: v_mov_b32_e32 v1, 0x7fc00000
 ; SI-SAFE-NEXT: v_cvt_f32_f16_e32 v0, v0
-; SI-SAFE-NEXT: v_cmp_nlt_f32_e32 vcc, 0, v0
-; SI-SAFE-NEXT: v_cndmask_b32_e32 v0, v1, v0, vcc
-; SI-SAFE-NEXT: v_cmp_nlt_f32_e32 vcc, 0, v0
-; SI-SAFE-NEXT: v_cndmask_b32_e64 v0, v2, 0, vcc
+; SI-SAFE-NEXT: v_min_legacy_f32_e32 v0, 0, v0
+; SI-SAFE-NEXT: v_cmp_ngt_f32_e32 vcc, s0, v0
+; SI-SAFE-NEXT: v_cndmask_b32_e64 v0, v1, 0, vcc
 ; SI-SAFE-NEXT: ; return to shader part epilog
 ;
 ; SI-NSZ-LABEL: fneg_fadd_0_nsz_f16:
 ; SI-NSZ: ; %bb.0: ; %.entry
 ; SI-NSZ-NEXT: v_cvt_f16_f32_e32 v0, s1
 ; SI-NSZ-NEXT: v_cvt_f16_f32_e32 v1, s0
 ; SI-NSZ-NEXT: v_mov_b32_e32 v2, 0x7fc00000
 ; SI-NSZ-NEXT: v_cvt_f32_f16_e32 v0, v0

llvm/test/CodeGen/AMDGPU/fneg-combines.new.ll

 }

 ; This is a workaround because -enable-no-signed-zeros-fp-math does not set up
 ; function attribute unsafe-fp-math automatically. Combine with the previous test
 ; when that is done.
 define amdgpu_ps float @fneg_fadd_0_nsz_f32(float inreg %tmp2, float inreg %tmp6, <4 x i32> %arg) #2 {
 ; SI-SAFE-LABEL: fneg_fadd_0_nsz_f32:
 ; SI-SAFE: ; %bb.0: ; %.entry
-; SI-SAFE-NEXT: v_bfrev_b32_e32 v0, 1
-; SI-SAFE-NEXT: v_mov_b32_e32 v1, s0
-; SI-SAFE-NEXT: v_cmp_ngt_f32_e64 vcc, s0, 0
-; SI-SAFE-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc
+; SI-SAFE-NEXT: v_min_legacy_f32_e64 v0, 0, s0
+; SI-SAFE-NEXT: s_brev_b32 s0, 1
 ; SI-SAFE-NEXT: v_mov_b32_e32 v1, 0x7fc00000
-; SI-SAFE-NEXT: v_cmp_nlt_f32_e32 vcc, 0, v0
+; SI-SAFE-NEXT: v_cmp_ngt_f32_e32 vcc, s0, v0
 ; SI-SAFE-NEXT: v_cndmask_b32_e64 v0, v1, 0, vcc
 ; SI-SAFE-NEXT: ; return to shader part epilog
 ;
 ; GCN-NSZ-LABEL: fneg_fadd_0_nsz_f32:
 ; GCN-NSZ: ; %bb.0: ; %.entry
 ; GCN-NSZ-NEXT: v_rcp_f32_e32 v0, s1
 ; GCN-NSZ-NEXT: v_mov_b32_e32 v1, s0
 ; GCN-NSZ-NEXT: v_mul_f32_e32 v0, 0x80000000, v0