This is an archive of the discontinued LLVM Phabricator instance.

[SelectionDAG] Expand nnan FMINNUM/FMAXNUM to select sequence
ClosedPublic

Authored by uweigand on Dec 3 2019, 8:11 AM.

Details

Reviewers

spatel
cameron.mcinally
arsenm

Commits

rGc3d05c1b5209: [SelectionDAG] Expand nnan FMINNUM/FMAXNUM to select sequence

Summary

As discussed in D70852, LLVM may currently introduce calls to fmin/fmax into programs that originally had no dependency on libm, potentially causing failures at link time.

This patch attempts to solve this issue by adding code to TargetLowering::expandFMINNUM_FMAXNUM that expands FMINNUM/FMAXNUM to a compare+select sequence instead of the libcall. This is done only if the node is marked as "nnan", in which case the expansion to compare+select is always correct. This also covers all cases where the FMINNUM/FMAXNUM was synthesized by LLVM, since that synthesis is likewise only done in the nnan case.
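
To illustrate why the expansion is restricted to nnan nodes, here is a minimal standalone C++ sketch that compares the C library's fmin with a plain compare+select. The helper names `selectMin` and `compareWithFmin` are hypothetical and exist only for this illustration; this is not code from the patch.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical stand-in for the compare+select form of FMINNUM:
// pick x when x < y, otherwise y. Not LLVM code.
static double selectMin(double x, double y) { return x < y ? x : y; }

// Exercises both forms; every assertion is expected to hold.
static void compareWithFmin() {
  // Without NaN inputs the two forms agree (signed zeros aside).
  assert(selectMin(1.0, 2.0) == std::fmin(1.0, 2.0));
  assert(selectMin(2.0, 1.0) == std::fmin(2.0, 1.0));
  assert(selectMin(-3.5, -3.5) == std::fmin(-3.5, -3.5));

  // With a NaN operand they differ: fmin returns the non-NaN operand,
  // while the select falls through to y because the compare is false.
  const double NaN = std::numeric_limits<double>::quiet_NaN();
  assert(std::fmin(1.0, NaN) == 1.0);
  assert(std::isnan(selectMin(1.0, NaN)));
}
```

The NaN case is the one the "nnan" flag rules out, which is why the select form can stand in for the libcall only under that flag.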

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

uweigand created this revision. Dec 3 2019, 8:11 AM

Herald added a project: Restricted Project. Dec 3 2019, 8:11 AM

Herald added subscribers: llvm-commits, hiraditya, wdng.

uweigand mentioned this in D70852: [InstCombine] Guard maxnum/minnum conversions with a TTI query. Dec 3 2019, 8:13 AM

spatel added inline comments. Dec 3 2019, 9:08 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
6234	I think this is an acceptable check of the FMF, but the reasoning is subtle. If we synthesized the intrinsic in IR, it required 'nsz' for the fcmp/select because the original code using fcmp can produce a different result for -0.0. And that is because the intrinsics have this clause:

"If the operands compare equal, returns a value that compares equal to both operands. This means that fmin(+/-0.0, +/-0.0) could return either -0.0 or 0.0"

So once we have the intrinsic, the 'nsz' behavior becomes implicit. And since we're not checking for 'nsz' explicitly in this expansion, that means that we may be transforming code that originally had the libcall in source to inline code (the intrinsic was not created by InstCombine without 'nsz'). But that's probably a good optimization for a target that is expanding these nodes anyway?

To not lose an optimization opportunity, I think we need to add 'nsz' to the setFlags() call under here. Or avoid this complexity and check for 'nsz' in the first place.

6237–6239	Variables named 'Tmp' make me nervous. :) I'd prefer to not reassign local names:

    SDValue Op0 = Node->getOperand(0);
    SDValue Op1 = Node->getOperand(1);
    SDValue SelCC = DAG.getSelectCC(dl, Op0, Op1, Op0, Op1, Pred);

uweigand updated this revision to Diff 231960. Dec 3 2019, 12:24 PM

uweigand marked 4 inline comments as done. Dec 3 2019, 12:27 PM

uweigand added inline comments.

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
6234	I deliberately did not check nsz because I understood this to be implied anyway by the semantics of the FMINNUM/FMAXNUM nodes:

    /// The return value of (FMINNUM 0.0, -0.0) could be either 0.0 or -0.0.

I've now added code to explicitly set nsz on the select node.
6237–6239	Sorry about that, too much copy-and-paste from LegalizeDAG :-)
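
The signed-zero point in the exchange above can be made concrete with a small standalone C++ sketch. `selectMin` and `signedZeroDemo` are hypothetical names that model the select produced for FMINNUM (a less-than compare); they are an illustration under that assumption, not LLVM code.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical model of the select the expansion emits for FMINNUM
// (SETLT): returns a when a < b, otherwise b.
static double selectMin(double a, double b) { return a < b ? a : b; }

static void signedZeroDemo() {
  // +0.0 and -0.0 compare equal, so "a < b" is false either way and the
  // select always yields its second operand.
  assert(std::signbit(selectMin(+0.0, -0.0)));   // result is -0.0
  assert(!std::signbit(selectMin(-0.0, +0.0)));  // result is +0.0

  // Both results compare equal to both operands, which is all the
  // FMINNUM definition quoted above asks for.
  assert(selectMin(+0.0, -0.0) == 0.0);
  assert(selectMin(-0.0, +0.0) == 0.0);
}
```

Because the sign of a zero result depends only on operand order, the select is a valid FMINNUM result but carries no signed-zero guarantee, which is consistent with marking the select as nsz.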

LGTM

This revision is now accepted and ready to land. Dec 3 2019, 1:22 PM

Closed by commit rGc3d05c1b5209: [SelectionDAG] Expand nnan FMINNUM/FMAXNUM to select sequence (authored by uweigand). Dec 4 2019, 1:39 AM

This revision was automatically updated to reflect the committed changes.

uweigand marked 2 inline comments as done.

spatel mentioned this in D122610: [SDAG] avoid libcalls to fmin/fmax for soft-float targets. Mar 28 2022, 12:37 PM

spatel mentioned this in rG436b875e49ec: [SDAG] avoid libcalls to fmin/fmax for soft-float targets. Mar 30 2022, 8:22 AM

Revision Contents

Path                                               Size
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp   20 lines
llvm/test/CodeGen/SystemZ/fp-libcall.ll            62 lines

Diff 232053

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp

[...]

  if (Node->getFlags().hasNoNaNs()) {
    unsigned IEEE2018Op =
        Node->getOpcode() == ISD::FMINNUM ? ISD::FMINIMUM : ISD::FMAXIMUM;
    if (isOperationLegalOrCustom(IEEE2018Op, VT)) {
      return DAG.getNode(IEEE2018Op, dl, VT, Node->getOperand(0),
                         Node->getOperand(1), Node->getFlags());
    }
  }

  // If none of the above worked, but there are no NaNs, then expand to
  // a compare/select sequence. This is required for correctness since
  // InstCombine might have canonicalized a fcmp+select sequence to a
  // FMINNUM/FMAXNUM node. If we were to fall through to the default
  // expansion to libcall, we might introduce a link-time dependency
  // on libm into a file that originally did not have one.
  if (Node->getFlags().hasNoNaNs()) {
    ISD::CondCode Pred =
        Node->getOpcode() == ISD::FMINNUM ? ISD::SETLT : ISD::SETGT;
    SDValue Op1 = Node->getOperand(0);
    SDValue Op2 = Node->getOperand(1);
    SDValue SelCC = DAG.getSelectCC(dl, Op1, Op2, Op1, Op2, Pred);
    // Copy FMF flags, but always set the no-signed-zeros flag
    // as this is implied by the FMINNUM/FMAXNUM semantics.
    SDNodeFlags Flags = Node->getFlags();
    Flags.setNoSignedZeros(true);
    SelCC->setFlags(Flags);
    return SelCC;
  }

  return SDValue();
}

bool TargetLowering::expandCTPOP(SDNode *Node, SDValue &Result,
                                 SelectionDAG &DAG) const {
  SDLoc dl(Node);
  EVT VT = Node->getValueType(0);
  EVT ShVT = getShiftAmountTy(VT, DAG.getDataLayout());

[...]
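
The decision logic of the added block can be modelled outside LLVM. The following is a simplified standalone sketch, not the patch itself: `Opcode`, `CondCode`, `Flags`, `Expansion`, `expandMinMax`, and `evalSelect` are all hypothetical names invented for illustration and do not correspond to LLVM's SelectionDAG API.

```cpp
#include <cassert>

// All names below are hypothetical and only mirror the shape of the
// patch; none of them are LLVM types.
enum class Opcode { FMinNum, FMaxNum };
enum class CondCode { SetLT, SetGT };

struct Flags {
  bool NoNaNs = false;
  bool NoSignedZeros = false;
};

struct Expansion {
  bool Expanded = false;  // false models the "return SDValue()" fallthrough
  CondCode Pred = CondCode::SetLT;
  Flags F;
};

// Mirrors the new block: expand only under nnan, choose the predicate
// from the opcode, copy the flags, and force no-signed-zeros.
static Expansion expandMinMax(Opcode Op, Flags NodeFlags) {
  Expansion E;
  if (!NodeFlags.NoNaNs)
    return E;
  E.Expanded = true;
  E.Pred = (Op == Opcode::FMinNum) ? CondCode::SetLT : CondCode::SetGT;
  E.F = NodeFlags;
  E.F.NoSignedZeros = true;
  return E;
}

// Evaluates the select_cc(a, b, a, b, Pred) that the expansion describes.
static double evalSelect(CondCode Pred, double A, double B) {
  bool Cond = (Pred == CondCode::SetLT) ? (A < B) : (A > B);
  return Cond ? A : B;
}
```

In this model, a node without the no-NaNs flag is left unexpanded, which corresponds to falling through to the default libcall handling described in the summary.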

llvm/test/CodeGen/SystemZ/fp-libcall.ll

[...]

define fp128 @f33(fp128 %x, fp128 %y) {
; CHECK-LABEL: f33:
; CHECK: brasl %r14, fmaxl@PLT
  %tmp = call fp128 @llvm.maxnum.f128(fp128 %x, fp128 %y)
  ret fp128 %tmp
}

; Verify that "nnan" minnum/maxnum calls are transformed to
; compare+select sequences instead of libcalls.
define float @f34(float %x, float %y) {
; CHECK-LABEL: f34:
; CHECK: cebr %f0, %f2
; CHECK: blr %r14
; CHECK: ler %f0, %f2
; CHECK: br %r14
  %tmp = call nnan float @llvm.minnum.f32(float %x, float %y)
  ret float %tmp
}

define double @f35(double %x, double %y) {
; CHECK-LABEL: f35:
; CHECK: cdbr %f0, %f2
; CHECK: blr %r14
; CHECK: ldr %f0, %f2
; CHECK: br %r14
  %tmp = call nnan double @llvm.minnum.f64(double %x, double %y)
  ret double %tmp
}

define fp128 @f36(fp128 %x, fp128 %y) {
; CHECK-LABEL: f36:
; CHECK: cxbr
; CHECK: jl
; CHECK: lxr
; CHECK: br %r14
  %tmp = call nnan fp128 @llvm.minnum.f128(fp128 %x, fp128 %y)
  ret fp128 %tmp
}

define float @f37(float %x, float %y) {
; CHECK-LABEL: f37:
; CHECK: cebr %f0, %f2
; CHECK: bhr %r14
; CHECK: ler %f0, %f2
; CHECK: br %r14
  %tmp = call nnan float @llvm.maxnum.f32(float %x, float %y)
  ret float %tmp
}

define double @f38(double %x, double %y) {
; CHECK-LABEL: f38:
; CHECK: cdbr %f0, %f2
; CHECK: bhr %r14
; CHECK: ldr %f0, %f2
; CHECK: br %r14
  %tmp = call nnan double @llvm.maxnum.f64(double %x, double %y)
  ret double %tmp
}

define fp128 @f39(fp128 %x, fp128 %y) {
; CHECK-LABEL: f39:
; CHECK: cxbr
; CHECK: jh
; CHECK: lxr
; CHECK: br %r14
  %tmp = call nnan fp128 @llvm.maxnum.f128(fp128 %x, fp128 %y)
  ret fp128 %tmp
}

declare float @llvm.powi.f32(float, i32)
declare double @llvm.powi.f64(double, i32)
declare fp128 @llvm.powi.f128(fp128, i32)
declare float @llvm.pow.f32(float, float)
declare double @llvm.pow.f64(double, double)
declare fp128 @llvm.pow.f128(fp128, fp128)

declare float @llvm.sin.f32(float)

[...]