This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] canonicalize minnum/maxnum with 'nnan' to fcmp+select
Abandoned · Public

Authored by spatel on May 20 2019, 1:07 PM.

Details

Reviewers

arsenm
efriedma
lebedev.ri

Summary

This is an atypical proposal because we are converting 1 instruction (intrinsic) into 2 instructions, but this allows better analysis via ValueTracking's matchSelectPattern().

Also, see this discussion for a larger potential optimization that would be made easier to match if we canonicalize to fcmp+select:
https://bugs.llvm.org/show_bug.cgi?id=37403#c6

There's a related codegen problem here:
https://bugs.llvm.org/show_bug.cgi?id=34149
We've already solved that for x86 at least, but this IR transform should help more generally.

Finally, I don't know of any current downside to converting this to fcmp+select, but it's another case where we should have FMF on the select:
D61917
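
To make the equivalence behind this proposal concrete, here is a minimal C++ sketch that is not taken from the patch. It assumes std::fmax as a stand-in for llvm.maxnum and a plain compare-and-pick as a stand-in for fcmp ogt + select. The helper names select_max and same_result are hypothetical, and the comparison treats +0.0 and -0.0 as equal, mirroring 'nsz'.

```cpp
#include <cmath>

// Hypothetical helper: models `fcmp ogt` followed by `select` on doubles.
double select_max(double a, double b) { return a > b ? a : b; }

// Hypothetical helper: true when std::fmax (standing in for llvm.maxnum)
// and the compare+select form produce the same value. The == comparison
// treats +0.0 and -0.0 as equal, which mirrors the 'nsz' assumption.
bool same_result(double a, double b) {
  double lhs = std::fmax(a, b);
  double rhs = select_max(a, b);
  if (std::isnan(lhs) || std::isnan(rhs))
    return std::isnan(lhs) && std::isnan(rhs);
  return lhs == rhs;
}
```

With non-NaN inputs the two forms agree. With a NaN as the second operand, std::fmax returns the number while the compare+select form returns the NaN, which is why the transform is gated on 'nnan'.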

Diff Detail

Event Timeline

spatel created this revision. May 20 2019, 1:07 PM

Herald added a project: Restricted Project. May 20 2019, 1:07 PM

Herald added subscribers: hiraditya, nhaehnle, wdng and 2 others.

Typo 'nan' in the description

spatel retitled this revision from [InstCombine] canonicalize minnum/maxnum with 'nan' to fcmp+select to [InstCombine] canonicalize minnum/maxnum with 'nnan' to fcmp+select. May 20 2019, 1:37 PM

I'm not sure I see the ValueTracking advantage here. Can you add a test where this matters? I wouldn't expect matchSelectPattern to give any advantage here.

Also, last I knew, the use of matchSelectPattern in SelectionDAGBuilder was somewhat buggy. It leaves behind a dead use, so in the initial combine, it breaks hasOneUse checks and prevents necessary early matching. There was also a second problem that I don't remember the details of.

In D62158#1509103, @arsenm wrote:

I'm not sure I see the ValueTracking advantage here. Can you add a test where this matters? I wouldn't expect matchSelectPattern to give any advantage here.

The follow-on transform that I'm thinking about is mentioned in PR37403 - if we have an offset or some other math op on both operands of a min/max, we should be able to optimize that away:

define double @foo(double, double, double) local_unnamed_addr #0 {
  %4 = fadd fast double %2, %0
  %5 = fadd fast double %2, %1
  %6 = tail call fast double @llvm.maxnum.f64(double %4, double %5)
  ret double %6
}
-->
define double @foo1(double, double, double) local_unnamed_addr #0 {
  %3 = tail call fast double @llvm.maxnum.f64(double %0, double %1)
  %4 = fadd fast double %2, %3
  ret double %4
}

We also miss this with integer types:

define i32 @via_icmp(i32 %z, i32 %x, i32 %y) {
  %xz = add nsw i32 %x, %z   ; the adds must be 'nsw'
  %yz = add nsw i32 %y, %z
  %c = icmp slt i32 %xz, %yz
  %r = select i1 %c, i32 %xz, i32 %yz
  ret i32 %r
}
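
The 'nsw' requirement noted in the comment above can be checked numerically. The following C++ sketch is illustrative only and is not part of the patch; the names wrapping_add, min_of_sums, and sum_of_min are hypothetical. It models an add without 'nsw' as two's-complement wraparound (the unsigned-to-signed conversion is modular in C++20 and on mainstream compilers before that).

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper: 32-bit add with wraparound, modeling `add` without nsw.
int32_t wrapping_add(int32_t a, int32_t b) {
  uint32_t sum = static_cast<uint32_t>(a) + static_cast<uint32_t>(b);
  return static_cast<int32_t>(sum);
}

// The pattern as written in @via_icmp: min(x + z, y + z) via icmp slt + select.
int32_t min_of_sums(int32_t x, int32_t y, int32_t z) {
  int32_t xz = wrapping_add(x, z);
  int32_t yz = wrapping_add(y, z);
  return xz < yz ? xz : yz;
}

// The form we would like to reach: min(x, y) + z.
int32_t sum_of_min(int32_t x, int32_t y, int32_t z) {
  return wrapping_add(std::min(x, y), z);
}
```

For small values such as (3, 7, 10) both forms give 13. For x = INT32_MAX, y = 0, z = 1 the first add wraps, so min_of_sums returns INT32_MIN while sum_of_min returns 1; the rewrite is only valid when the adds cannot overflow.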

So there's potentially some overlap in how we match those patterns if we use matchSelectPattern(). Now, when I commented on that problem, we had codegen problems too, but I think we've overcome them. So I'd be fine with canonicalizing in the other direction:
fcmp+select --> minnum/maxnum
...but I think that's gated on D61917 because we need 'nsz'.

Also last I knew, the use of matchSelectPattern in SelectionDAGBuilder was somewhat buggy. It leaves behind a dead use, so in the initial combine, it breaks hasOneUse checks and prevents necessary early matching. There was also a second problem that I don't remember the details of

Ok, I didn't know there was a problem there. Do we have a bug report with an example?

In D62158#1509118, @spatel wrote:

Also last I knew, the use of matchSelectPattern in SelectionDAGBuilder was somewhat buggy. It leaves behind a dead use, so in the initial combine, it breaks hasOneUse checks and prevents necessary early matching. There was also a second problem that I don't remember the details of

Ok, I didn't know there was a problem there. Do we have a bug report with an example?

No. IIRC I ran into this during r344914. See the FIXME for the regression on reduction_fast_min_pattern_v4f16 in test/CodeGen/AMDGPU/reduction.ll.

spatel mentioned this in D62414: [InstCombine] canonicalize fcmp+select to minnum/maxnum intrinsics. May 24 2019, 11:34 AM

Not great with floating-point, don't think i can provide any feedback here, sorry.

cameron.mcinally added a subscriber: cameron.mcinally. Jun 21 2019, 11:09 AM

cameron.mcinally added inline comments.

llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp

2310

FYI that a draft of IEEE-754 2018 defines the (+/-)0.0 case:

minimumNumber(x, y) is x if x < y, y if y < x, and the number if one operand is a number and the other is a NaN. For this operation, −0 compares less than +0. If x = y and signs are the same it is either x or y. If both operands are NaNs, a quiet NaN is returned, according to 6.2. If either operand is a signaling NaN, an invalid operation exception is signaled, but unless both operands are NaNs, the signaling NaN is otherwise ignored and not converted to a quiet NaN as stated in 6.2 for other operations.
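
As a rough illustration of the quoted rules, here is a small C++ sketch. It is an assumption-laden model rather than a reference implementation: the function name minimum_number is hypothetical, and the signaling-NaN exception behavior described in the draft is not modeled at all.

```cpp
#include <cmath>

// Hypothetical model of the quoted minimumNumber rules for doubles.
// Signaling NaNs and the invalid-operation exception are NOT modeled.
double minimum_number(double x, double y) {
  if (std::isnan(x)) return y;  // returns the number, or NaN if both are NaN
  if (std::isnan(y)) return x;
  if (x < y) return x;
  if (y < x) return y;
  // x == y here: prefer the operand with the sign bit set, so that
  // -0 is treated as less than +0; with equal signs either is fine.
  return std::signbit(x) ? x : y;
}
```

Under this model, minimum_number(0.0, -0.0) and minimum_number(-0.0, 0.0) both return -0.0, which is the signed-zero ordering the draft specifies and which a plain fcmp olt + select does not guarantee.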

lebedev.ri marked an inline comment as done. Jun 21 2019, 11:27 AM

lebedev.ri added a subscriber: lebedev.ri.

lebedev.ri added inline comments.

llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
2310	LLVM internals mostly follow LLVM LangRef, which currently does not state that; i guess you'd first want to clarify LangRef. https://llvm.org/docs/LangRef.html#llvm-maxnum-intrinsic

Sorry - I should have abandoned this. There's some patch spaghetti here.

I think we will go with the more conventional approach of 'less-instructions-is-more-canonical':
D62414 - so the opposite direction of this.

That patch could get committed now because it doesn't have a direct dependency on anything, but I was hoping for a re-ruling on D63214 which was itself dependent on the now abandoned D63294.

spatel mentioned this in rL364721: [InstCombine] canonicalize fcmp+select to minnum/maxnum intrinsics. Jun 30 2019, 6:42 AM

spatel mentioned this in rG706b48251f6a: [InstCombine] canonicalize fcmp+select to minnum/maxnum intrinsics.

Revision Contents

Path                                                            Size
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp            15 lines
llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll    4 lines
llvm/test/Transforms/InstCombine/maxnum.ll                      15 lines
llvm/test/Transforms/InstCombine/minnum.ll                      15 lines

Diff 200346

llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp

[2,297 lines not shown; enclosing scope: if (auto *M = dyn_cast<IntrinsicInst>(Arg0)) {]
         }
         Instruction *NewCall = Builder.CreateBinaryIntrinsic(
             IID, X, ConstantFP::get(Arg0->getType(), Res));
         NewCall->copyIRFlags(II);
         return replaceInstUsesWith(*II, NewCall);
       }
     }

+    // If there are no NaNs, convert minnum/maxnum to fcmp/select for better
+    // analysis.
+    if (II->hasNoNaNs() &&
+        (IID == Intrinsic::minnum || IID == Intrinsic::maxnum)) {
+      // Minnum/maxnum have unspecified behavior comparing (+/-)0.0, so set nsz.
+      BuilderTy::FastMathFlagGuard Guard(Builder);
+      FastMathFlags FMF = II->getFastMathFlags();
+      FMF.setNoSignedZeros();
+      Builder.setFastMathFlags(FMF);
+      FCmpInst::Predicate Pred = IID == Intrinsic::maxnum ? FCmpInst::FCMP_OGT
+                                                          : FCmpInst::FCMP_OLT;
+      Value *Cmp = Builder.CreateFCmp(Pred, Arg0, Arg1);
+      return SelectInst::Create(Cmp, Arg0, Arg1);
+    }
+
     break;
   }
   case Intrinsic::fmuladd: {
     // Canonicalize fast fmuladd to the separate fmul + fadd.
     if (II->isFast()) {
       BuilderTy::FastMathFlagGuard Guard(Builder);
       Builder.setFastMathFlags(II->getFastMathFlags());
       Value *Mul = Builder.CreateFMul(II->getArgOperand(0),
[2,495 lines not shown]

llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll

[1,407 lines not shown]
 ; CHECK-NEXT: ret float [[MED3]]
 ;
   %med3 = call float @llvm.amdgcn.fmed3.f32(float undef, float %x, float %y)
   ret float %med3
 }

 define float @fmed3_fmf_undef_x_y_f32(float %x, float %y) {
 ; CHECK-LABEL: @fmed3_fmf_undef_x_y_f32(
-; CHECK-NEXT: [[MED3:%.*]] = call nnan float @llvm.minnum.f32(float [[X:%.*]], float [[Y:%.*]])
+; CHECK-NEXT: [[TMP1:%.*]] = fcmp nnan nsz olt float [[X:%.*]], [[Y:%.*]]
+; CHECK-NEXT: [[MED3:%.*]] = select i1 [[TMP1]], float [[X]], float [[Y]]
 ; CHECK-NEXT: ret float [[MED3]]
 ;
   %med3 = call nnan float @llvm.amdgcn.fmed3.f32(float undef, float %x, float %y)
   ret float %med3
 }

 define float @fmed3_x_undef_y_f32(float %x, float %y) {
 ; CHECK-LABEL: @fmed3_x_undef_y_f32(
[1,022 lines not shown]
 ; CHECK-NEXT: ret void
 ;
   %tmp0 = call i32 @llvm.amdgcn.update.dpp.i32(i32 undef, i32 %in1, i32 4, i32 15, i32 15, i1 1)
   store i32 %tmp0, i32 addrspace(1)* %out
   ret void
 }

 ; CHECK: attributes #5 = { convergent }

llvm/test/Transforms/InstCombine/maxnum.ll

[151 lines not shown]
 ;
   %y = call float @llvm.maxnum.f32(float %x, float 0.0)
   %z = call float @llvm.maxnum.f32(float %y, float 1.0)
   ret float %z
 }

 define float @maxnum_f32_1_maxnum_p0_val_fast(float %x) {
 ; CHECK-LABEL: @maxnum_f32_1_maxnum_p0_val_fast(
-; CHECK-NEXT: [[TMP1:%.*]] = call fast float @llvm.maxnum.f32(float [[X:%.*]], float 1.000000e+00)
-; CHECK-NEXT: ret float [[TMP1]]
+; CHECK-NEXT: [[TMP1:%.*]] = fcmp fast ogt float [[X:%.*]], 1.000000e+00
+; CHECK-NEXT: [[TMP2:%.*]] = select i1 [[TMP1]], float [[X]], float 1.000000e+00
+; CHECK-NEXT: ret float [[TMP2]]
 ;
   %y = call float @llvm.maxnum.f32(float 0.0, float %x)
   %z = call fast float @llvm.maxnum.f32(float %y, float 1.0)
   ret float %z
 }

 define float @maxnum_f32_1_maxnum_p0_val_nnan_ninf(float %x) {
 ; CHECK-LABEL: @maxnum_f32_1_maxnum_p0_val_nnan_ninf(
-; CHECK-NEXT: [[TMP1:%.*]] = call nnan ninf float @llvm.maxnum.f32(float [[X:%.*]], float 1.000000e+00)
-; CHECK-NEXT: ret float [[TMP1]]
+; CHECK-NEXT: [[TMP1:%.*]] = fcmp nnan ninf nsz ogt float [[X:%.*]], 1.000000e+00
+; CHECK-NEXT: [[TMP2:%.*]] = select i1 [[TMP1]], float [[X]], float 1.000000e+00
+; CHECK-NEXT: ret float [[TMP2]]
 ;
   %y = call float @llvm.maxnum.f32(float 0.0, float %x)
   %z = call nnan ninf float @llvm.maxnum.f32(float %y, float 1.0)
   ret float %z
 }

 define float @maxnum_f32_p0_maxnum_val_n0(float %x) {
 ; CHECK-LABEL: @maxnum_f32_p0_maxnum_val_n0(
[51 lines not shown]
   %r = call <2 x float> @llvm.maxnum.v2f32(<2 x float> %negx, <2 x float> %negy)
   ret <2 x float> %r
 }

 ; FMF is not required, but it should be propagated from the intrinsic (not the fnegs).

 define float @neg_neg_vec_fmf(float %x, float %y) {
 ; CHECK-LABEL: @neg_neg_vec_fmf(
-; CHECK-NEXT: [[TMP1:%.*]] = call fast float @llvm.minnum.f32(float [[X:%.*]], float [[Y:%.*]])
-; CHECK-NEXT: [[R:%.*]] = fsub fast float -0.000000e+00, [[TMP1]]
+; CHECK-NEXT: [[TMP1:%.*]] = fcmp fast olt float [[X:%.*]], [[Y:%.*]]
+; CHECK-NEXT: [[TMP2:%.*]] = select i1 [[TMP1]], float [[X]], float [[Y]]
+; CHECK-NEXT: [[R:%.*]] = fsub fast float -0.000000e+00, [[TMP2]]
 ; CHECK-NEXT: ret float [[R]]
 ;
   %negx = fsub arcp float -0.0, %x
   %negy = fsub afn float -0.0, %y
   %r = call fast float @llvm.maxnum.f32(float %negx, float %negy)
   ret float %r
 }

[51 lines not shown]

llvm/test/Transforms/InstCombine/minnum.ll

[153 lines not shown]
 ;
   %y = call float @llvm.minnum.f32(float %x, float 0.0)
   %z = call float @llvm.minnum.f32(float %y, float 1.0)
   ret float %z
 }

 define float @minnum_f32_1_minnum_p0_val_fast(float %x) {
 ; CHECK-LABEL: @minnum_f32_1_minnum_p0_val_fast(
-; CHECK-NEXT: [[TMP1:%.*]] = call fast float @llvm.minnum.f32(float [[X:%.*]], float 0.000000e+00)
-; CHECK-NEXT: ret float [[TMP1]]
+; CHECK-NEXT: [[TMP1:%.*]] = fcmp fast olt float [[X:%.*]], 0.000000e+00
+; CHECK-NEXT: [[TMP2:%.*]] = select i1 [[TMP1]], float [[X]], float 0.000000e+00
+; CHECK-NEXT: ret float [[TMP2]]
 ;
   %y = call float @llvm.minnum.f32(float 0.0, float %x)
   %z = call fast float @llvm.minnum.f32(float %y, float 1.0)
   ret float %z
 }

 define float @minnum_f32_1_minnum_p0_val_nnan_ninf(float %x) {
 ; CHECK-LABEL: @minnum_f32_1_minnum_p0_val_nnan_ninf(
-; CHECK-NEXT: [[TMP1:%.*]] = call nnan ninf float @llvm.minnum.f32(float [[X:%.*]], float 0.000000e+00)
-; CHECK-NEXT: ret float [[TMP1]]
+; CHECK-NEXT: [[TMP1:%.*]] = fcmp nnan ninf nsz olt float [[X:%.*]], 0.000000e+00
+; CHECK-NEXT: [[TMP2:%.*]] = select i1 [[TMP1]], float [[X]], float 0.000000e+00
+; CHECK-NEXT: ret float [[TMP2]]
 ;
   %y = call float @llvm.minnum.f32(float 0.0, float %x)
   %z = call nnan ninf float @llvm.minnum.f32(float %y, float 1.0)
   ret float %z
 }

 define float @minnum_f32_p0_minnum_val_n0(float %x) {
 ; CHECK-LABEL: @minnum_f32_p0_minnum_val_n0(
[74 lines not shown]
   ret double %r
 }

 ; FMF is not required, but it should be propagated from the intrinsic (not the fnegs).
 ; Also, make sure this works with vectors.

 define <2 x double> @neg_neg_vec_fmf(<2 x double> %x, <2 x double> %y) {
 ; CHECK-LABEL: @neg_neg_vec_fmf(
-; CHECK-NEXT: [[TMP1:%.*]] = call nnan ninf <2 x double> @llvm.maxnum.v2f64(<2 x double> [[X:%.*]], <2 x double> [[Y:%.*]])
-; CHECK-NEXT: [[R:%.*]] = fsub nnan ninf <2 x double> <double -0.000000e+00, double -0.000000e+00>, [[TMP1]]
+; CHECK-NEXT: [[TMP1:%.*]] = fcmp nnan ninf nsz ogt <2 x double> [[X:%.*]], [[Y:%.*]]
+; CHECK-NEXT: [[TMP2:%.*]] = select <2 x i1> [[TMP1]], <2 x double> [[X]], <2 x double> [[Y]]
+; CHECK-NEXT: [[R:%.*]] = fsub nnan ninf <2 x double> <double -0.000000e+00, double -0.000000e+00>, [[TMP2]]
 ; CHECK-NEXT: ret <2 x double> [[R]]
 ;
   %negx = fsub reassoc <2 x double> <double -0.0, double -0.0>, %x
   %negy = fsub fast <2 x double> <double -0.0, double -0.0>, %y
   %r = call nnan ninf <2 x double> @llvm.minnum.v2f64(<2 x double> %negx, <2 x double> %negy)
   ret <2 x double> %r
 }

[51 lines not shown]
