This is an archive of the discontinued LLVM Phabricator instance.


Differential D42879

InstCombine: 1./x >= 0. -> x >= 0.
Abandoned · Public

Authored by MatzeB on Feb 2 2018, 9:15 PM.


Details

Reviewers

majnemer
spatel
scanon
efriedma
hfinkel
arsenm
craig.topper
nlopes

Summary

This adds the following two rules when the "no infs" fast-math
flag is set:

fcmp ninf pred (fdiv ninf 1., x), 0   ->   fcmp pred x, 0
fcmp ninf pred (fdiv ninf -1., x), 0  ->   fcmp swap(pred) x, 0

To justify the first rule:

All of the following cases show that, with or without the fdiv, the sign is the same and 0 does not occur:
fdiv 1., small number <-> + large number or +inf on overflow
fdiv 1., -small number <-> - large number or -inf on overflow
fdiv 1., big number <-> + small normal or denormal number (Example: 1./FLT_MAX = 1./0x1.fffffep+127 = 0x1p-128)
fdiv 1., -big number <-> - small normal or denormal number

NaN is preserved:

fdiv 1., nan <-> nan
fdiv 1., -nan <-> -nan
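
The case analysis above can be spot-checked outside of LLVM. The following is a minimal C++ sketch and not part of the patch; `classify`, `reciprocal_keeps_class`, and `check_sign_examples` are hypothetical helper names. It assumes IEEE 754 single precision with denormals enabled (no flush-to-zero).

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>

// Classify a value as negative (-1), zero (0), positive (+1), or NaN (2).
// +0 and -0 both map to 0, matching how a comparison against 0 treats them.
int classify(float v) {
  if (std::isnan(v)) return 2;
  if (v < 0.0f) return -1;
  if (v > 0.0f) return 1;
  return 0;
}

// True if 1/x lands in the same class as x, which is the property the
// first rule relies on.
bool reciprocal_keeps_class(float x) {
  volatile float vx = x;  // keep the division at run time
  float r = 1.0f / vx;
  return classify(r) == classify(x);
}

// Walks through the cases listed in the summary.
void check_sign_examples() {
  assert(reciprocal_keeps_class(1e-40f));    // small: 1/x overflows to +inf
  assert(reciprocal_keeps_class(-1e-40f));   // -small: 1/x overflows to -inf
  assert(reciprocal_keeps_class(FLT_MAX));   // big: 1/x is a positive denormal
  assert(reciprocal_keeps_class(-FLT_MAX));  // -big: 1/x is a negative denormal
  assert(reciprocal_keeps_class(NAN));       // NaN stays NaN
}
```

Calling `check_sign_examples()` exercises one representative of each listed case. It does not prove the rule; it only shows that the listed examples behave as described under the stated assumptions.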

The following cases do not work correctly:

fdiv 1., 0 <-> inf
fdiv 1., -0 <-> -inf
fdiv 1., inf </-> 0
fdiv 1., -inf </-> -0

However, having the "no infs" (ninf) fast-math flag on both the fcmp and the
fdiv allows us to ignore these cases.
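
To make the four problem cases concrete, here is a small C++ sketch (not part of the patch) that compares the result before and after the fold for the `ogt` and `olt` predicates. The helper names `ogt_fold_agrees`, `olt_fold_agrees`, and `check_edge_cases` are hypothetical, and IEEE 754 double precision is assumed.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Does "1/x > 0" give the same answer as "x > 0"?
bool ogt_fold_agrees(double x) {
  volatile double vx = x;  // keep the division at run time
  return ((1.0 / vx) > 0.0) == (x > 0.0);
}

// Does "1/x < 0" give the same answer as "x < 0"?
bool olt_fold_agrees(double x) {
  volatile double vx = x;
  return ((1.0 / vx) < 0.0) == (x < 0.0);
}

// The four inputs called out in the summary all break the fold.
void check_edge_cases() {
  const double inf = std::numeric_limits<double>::infinity();
  assert(!ogt_fold_agrees(0.0));   // 1/+0 = +inf > 0, but +0 > 0 is false
  assert(!olt_fold_agrees(-0.0));  // 1/-0 = -inf < 0, but -0 < 0 is false
  assert(!ogt_fold_agrees(inf));   // 1/+inf = +0, not > 0, but +inf > 0
  assert(!olt_fold_agrees(-inf));  // 1/-inf = -0, not < 0, but -inf < 0
}
```

Each failing input either is an infinity or produces one, so every mismatch involves an infinite operand or result of the fdiv.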

The 2nd rule can be shown to be a variant of the first:

fcmp pred (fdiv -1., x), 0  -> fcmp pred fneg (fdiv 1., x), 0
     -> fcmp swap(pred) (fdiv 1., x), 0 -> fcmp swap(pred) x, 0
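
The chain of rewrites for the second rule can be checked numerically as well. This is a hedged C++ sketch, not LLVM code: `Pred`, `swapped`, `compare_with_zero`, `neg_recip_fold_agrees`, and `check_swap_examples` are hypothetical stand-ins for the ordered predicates and for the swapped-predicate step, and only four predicates are modeled.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for a few ordered fcmp predicates.
enum class Pred { OLT, OLE, OGT, OGE };

// Mirror of the "swap(pred)" step: exchange less-than and greater-than.
Pred swapped(Pred p) {
  switch (p) {
    case Pred::OLT: return Pred::OGT;
    case Pred::OLE: return Pred::OGE;
    case Pred::OGT: return Pred::OLT;
    case Pred::OGE: return Pred::OLE;
  }
  return p;  // not reached for valid enumerators
}

// Evaluate "lhs pred 0.0"; every ordered comparison is false for NaN.
bool compare_with_zero(Pred p, double lhs) {
  switch (p) {
    case Pred::OLT: return lhs < 0.0;
    case Pred::OLE: return lhs <= 0.0;
    case Pred::OGT: return lhs > 0.0;
    case Pred::OGE: return lhs >= 0.0;
  }
  return false;
}

// Does "(-1/x) pred 0" match "x swap(pred) 0" for this x?
bool neg_recip_fold_agrees(Pred p, double x) {
  volatile double vx = x;  // keep the division at run time
  double neg_recip = -1.0 / vx;
  return compare_with_zero(p, neg_recip) == compare_with_zero(swapped(p), x);
}

void check_swap_examples() {
  const Pred preds[] = {Pred::OLT, Pred::OLE, Pred::OGT, Pred::OGE};
  const double inputs[] = {2.0, -2.0, 1e300, -1e-300};
  for (Pred p : preds)
    for (double x : inputs)
      assert(neg_recip_fold_agrees(p, x));
}
```

As with the first rule, x = 0 is excluded: `neg_recip_fold_agrees(Pred::OLT, 0.0)` is false because -1/0 is -inf.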

Question to reviewers: Is it correct to assume that with fcmp ninf (fdiv 1.0, x) the input to fcmp cannot be +/- infinity and hence x cannot be +/- 0?

Diff Detail

Repository: rL LLVM

Event Timeline

MatzeB created this revision. Feb 2 2018, 9:15 PM

Herald added subscribers: wdng, mcrosier. Feb 2 2018, 9:15 PM

MatzeB added a reviewer: scanon. Feb 2 2018, 9:15 PM

arsenm added inline comments. Feb 6 2018, 8:02 AM

test/Transforms/InstCombine/fcmp_reciproc.ll, line 4: Some tests with vectors would be nice

Do we need to restrict this to 1.0 / X ? If we only care about the sign of the fdiv result and we're ruling out INF, then any constant 'C / X' should be ok? Can also handle 'X / C'?

In D42879#999160, @spatel wrote:

Do we need to restrict this to 1.0 / X ? If we only care about the sign of the fdiv result and we're ruling out INF, then any constant 'C / X' should be ok? Can also handle 'X / C'?

It gets slightly harder with C/X because for large C we can underflow to zero if X is big enough. I wasn't sure how to compute the limit so went with 1.0 for now.

In D42879#999308, @MatzeB wrote:

In D42879#999160, @spatel wrote:

Do we need to restrict this to 1.0 / X ? If we only care about the sign of the fdiv result and we're ruling out INF, then any constant 'C / X' should be ok? Can also handle 'X / C'?

It gets slightly harder with C/X because for large C we can underflow to zero if X is big enough. I wasn't sure how to compute the limit so went with 1.0 for now.

Of course it's small C where we could underflow, not large C.

Underflow or overflow doesn't change sign, so 0 < C < inf && X >= 0 --> C/X >= 0.

In D42879#999811, @scanon wrote:

Underflow or overflow doesn't change sign, so 0 < C < inf && X >= 0 --> C/X >= 0.

It doesn't change the sign. However, we have to differentiate between three cases here: negative, zero (or negative zero), and positive.

Underflow can change a value from positive or negative to zero.
My understanding is that for a large positive X, C/X can underflow to zero, so the expression C/X <= 0 may be true while X <= 0 is not.
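
A concrete instance of this underflow concern, as a hedged C++ sketch rather than anything from the patch: `ole_fold_agrees` and `check_underflow_examples` are hypothetical names, and IEEE 754 double precision with denormals enabled is assumed.

```cpp
#include <cassert>
#include <cfloat>

// Does folding "C / x <= 0" to "x <= 0" give the same answer for this
// particular constant C and input x?
bool ole_fold_agrees(double c, double x) {
  volatile double vc = c, vx = x;  // keep the division at run time
  double quotient = vc / vx;
  return (quotient <= 0.0) == (x <= 0.0);
}

void check_underflow_examples() {
  // C = 1.0: 1/DBL_MAX is a positive denormal, so the fold still holds.
  assert(ole_fold_agrees(1.0, DBL_MAX));
  // Small C: the quotient underflows to +0, so "C/x <= 0" becomes true
  // while "x <= 0" is false.
  assert(!ole_fold_agrees(DBL_MIN, DBL_MAX));
  assert(!ole_fold_agrees(1e-300, 1e300));
}
```

The sketch only illustrates that a small positive C breaks the fold for large X; it does not attempt to compute the limit on C that the thread mentions.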

In D42879#999833, @MatzeB wrote:

In D42879#999811, @scanon wrote:

Underflow or overflow doesn't change sign, so 0 < C < inf && X >= 0 --> C/X >= 0.

It doesn't change the sign. However, we have to differentiate between three cases here: negative, zero (or negative zero), and positive.

Underflow can change a value from positive or negative to zero.
My understanding is that for a large positive X, C/X can underflow to zero, so the expression C/X <= 0 may be true while X <= 0 is not.

Ah, I see what you're trying to do.

In that case, you still have trouble because even 1/x can produce zero if someone is running with flush-to-zero enabled.
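
The flush-to-zero concern can be illustrated with a software model. This C++ sketch does not switch the hardware into FTZ mode; `flush_to_zero` is a hypothetical helper that only imitates the effect by replacing subnormal results with a zero of the same sign, and `ole_fold_agrees_under_ftz` and `check_ftz_examples` are likewise made-up names.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>

// Software model of flush-to-zero on results: a subnormal value becomes a
// zero with the same sign; everything else passes through unchanged.
double flush_to_zero(double v) {
  return std::fpclassify(v) == FP_SUBNORMAL ? std::copysign(0.0, v) : v;
}

// Does "1/x <= 0" (with the quotient flushed) match "x <= 0"?
bool ole_fold_agrees_under_ftz(double x) {
  volatile double vx = x;  // keep the division at run time
  double recip = flush_to_zero(1.0 / vx);
  return (recip <= 0.0) == (x <= 0.0);
}

void check_ftz_examples() {
  // 1/DBL_MAX is subnormal, so the model flushes it to +0.
  assert(flush_to_zero(1.0 / DBL_MAX) == 0.0);
  // After flushing, "1/x <= 0" is true although x is positive.
  assert(!ole_fold_agrees_under_ftz(DBL_MAX));
  // Ordinary inputs are unaffected.
  assert(ole_fold_agrees_under_ftz(2.0));
  assert(ole_fold_agrees_under_ftz(-2.0));
}
```

Under this model even the constant 1.0 is not safe, which is the point being made about targets that run with FTZ behavior.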

In D42879#999858, @scanon wrote:

In that case, you still have trouble because even 1/x can produce zero if someone is running with flush-to-zero enabled.

IIUC, we also have out-of-tree targets with no option; they always operate with FTZ behavior.

I think it's still possible to allow this kind of transform in instcombine with more fast-math-flags. Clang/gcc's -fassociative-math translates indirectly to 'reassoc' in IR FMF and says it may "reorder floating-point comparisons".

In D42879#1000576, @spatel wrote:

In D42879#999858, @scanon wrote:

In that case, you still have trouble because even 1/x can produce zero if someone is running with flush-to-zero enabled.

IIUC, we also have out-of-tree targets with no option; they always operate with FTZ behavior.

I think it's still possible to allow this kind of transform in instcombine with more fast-math-flags. Clang/gcc's -fassociative-math translates indirectly to 'reassoc' in IR FMF and says it may "reorder floating-point comparisons".

Correct. This falls in the pile of "things we could optimize if we modeled fenv".

arsenm resigned from this revision. Feb 21 2019, 5:47 PM

Herald added a project: Restricted Project. Feb 21 2019, 5:47 PM

nlopes resigned from this revision. Mar 2 2021, 9:55 AM

craig.topper resigned from this revision. Mar 2 2021, 6:21 PM

MatzeB abandoned this revision. Sep 10 2021, 10:15 AM

Herald added a subscriber: wenlei. Sep 10 2021, 10:15 AM

Revision Contents

Path                                                  Size
lib/Transforms/InstCombine/InstCombineCompares.cpp    18 lines
test/Transforms/InstCombine/fcmp_reciproc.ll          212 lines

Diff 132725

lib/Transforms/InstCombine/InstCombineCompares.cpp

[4,991 earlier lines not shown; enclosing context: if (Instruction *LHSI = dyn_cast<Instruction>(Op0))]
     case Instruction::FSub: {
       // fcmp pred (fneg x), C -> fcmp swap(pred) x, -C
       Value *Op;
       if (match(LHSI, m_FNeg(m_Value(Op))))
         return new FCmpInst(I.getSwappedPredicate(), Op,
                             ConstantExpr::getFNeg(RHSC));
       break;
     }
+    case Instruction::FDiv: {
+      if (I.getFastMathFlags().noInfs() &&
+          LHSI->getFastMathFlags().noInfs() && RHSC->isZeroValue()) {
+        Value *Op;
+        const APFloat *C;
+        if (match(LHSI, m_FDiv(m_APFloat(C), m_Value(Op)))) {
+          // fcmp ninf pred (fdiv ninf 1.0, x), 0 -> fcmp pred x, 0
+          if (C->isExactlyValue(1.0))
+            return new FCmpInst(I.getPredicate(), Op,
+                                ConstantFP::getNullValue(LHSI->getType()));
+          // fcmp ninf pred (fdiv ninf -1.0, x), 0 -> fcmp swap(pred) x, 0
+          if (C->isExactlyValue(-1.0))
+            return new FCmpInst(I.getSwappedPredicate(), Op,
+                                ConstantFP::getNullValue(LHSI->getType()));
+        }
+      }
+      break;
+    }
     case Instruction::Load:
       if (GetElementPtrInst *GEP =
               dyn_cast<GetElementPtrInst>(LHSI->getOperand(0))) {
         if (GlobalVariable *GV = dyn_cast<GlobalVariable>(GEP->getOperand(0)))
           if (GV->isConstant() && GV->hasDefinitiveInitializer() &&
               !cast<LoadInst>(LHSI)->isVolatile())
             if (Instruction *Res = foldCmpLoadFromIndexedGlobal(GEP, GV, I))
               return Res;
[52 later lines not shown]

test/Transforms/InstCombine/fcmp_reciproc.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt < %s -instcombine -S | FileCheck %s

				define i1 @cmp_olt_recip(double %x) {
				; CHECK-LABEL: @cmp_olt_recip(
				; CHECK-NEXT: [[CMP:%.*]] = fcmp olt double [[X:%.*]], 0.000000e+00
				; CHECK-NEXT: ret i1 [[CMP]]
				;
				%div = fdiv ninf double 1.0, %x
				%cmp = fcmp ninf olt double %div, 0.0
				ret i1 %cmp
				}

				define i1 @cmp_ole_recip(double %x) {
				; CHECK-LABEL: @cmp_ole_recip(
				; CHECK-NEXT: [[CMP:%.*]] = fcmp ole double [[X:%.*]], 0.000000e+00
				; CHECK-NEXT: ret i1 [[CMP]]
				;
				%div = fdiv ninf double 1.0, %x
				%cmp = fcmp ninf ole double %div, 0.0
				ret i1 %cmp
				}

				define i1 @cmp_ogt_recip(double %x) {
				; CHECK-LABEL: @cmp_ogt_recip(
				; CHECK-NEXT: [[CMP:%.*]] = fcmp ogt double [[X:%.*]], 0.000000e+00
				; CHECK-NEXT: ret i1 [[CMP]]
				;
				%div = fdiv ninf double 1.0, %x
				%cmp = fcmp ninf ogt double %div, 0.0
				ret i1 %cmp
				}

				define i1 @cmp_oge_recip(double %x) {
				; CHECK-LABEL: @cmp_oge_recip(
				; CHECK-NEXT: [[CMP:%.*]] = fcmp oge double [[X:%.*]], 0.000000e+00
				; CHECK-NEXT: ret i1 [[CMP]]
				;
				%div = fdiv ninf double 1.0, %x
				%cmp = fcmp ninf oge double %div, 0.0
				ret i1 %cmp
				}

				define i1 @cmp_oeq_recip(double %x) {
				; CHECK-LABEL: @cmp_oeq_recip(
				; CHECK-NEXT: [[CMP:%.*]] = fcmp oeq double [[X:%.*]], 0.000000e+00
				; CHECK-NEXT: ret i1 [[CMP]]
				;
				%div = fdiv ninf double 1.0, %x
				%cmp = fcmp ninf oeq double %div, 0.0
				ret i1 %cmp
				}

				define i1 @cmp_une_recip(double %x) {
				; CHECK-LABEL: @cmp_une_recip(
				; CHECK-NEXT: [[CMP:%.*]] = fcmp une double [[X:%.*]], 0.000000e+00
				; CHECK-NEXT: ret i1 [[CMP]]
				;
				%div = fdiv ninf double 1.0, %x
				%cmp = fcmp ninf une double %div, 0.0
				ret i1 %cmp
				}

				; Could test all 16 fcmp variants, but they're really all the same...

				define i1 @cmp_olt_neg_recip(double %x) {
				; CHECK-LABEL: @cmp_olt_neg_recip(
				; CHECK-NEXT: [[CMP:%.*]] = fcmp ogt double [[X:%.*]], 0.000000e+00
				; CHECK-NEXT: ret i1 [[CMP]]
				;
				%div = fdiv ninf double -1.0, %x
				%cmp = fcmp ninf olt double %div, 0.0
				ret i1 %cmp
				}

				define i1 @cmp_ole_neg_recip(double %x) {
				; CHECK-LABEL: @cmp_ole_neg_recip(
				; CHECK-NEXT: [[CMP:%.*]] = fcmp oge double [[X:%.*]], 0.000000e+00
				; CHECK-NEXT: ret i1 [[CMP]]
				;
				%div = fdiv ninf double -1.0, %x
				%cmp = fcmp ninf ole double %div, 0.0
				ret i1 %cmp
				}

				define i1 @cmp_ogt_neg_recip(double %x) {
				; CHECK-LABEL: @cmp_ogt_neg_recip(
				; CHECK-NEXT: [[CMP:%.*]] = fcmp olt double [[X:%.*]], 0.000000e+00
				; CHECK-NEXT: ret i1 [[CMP]]
				;
				%div = fdiv ninf double -1.0, %x
				%cmp = fcmp ninf ogt double %div, 0.0
				ret i1 %cmp
				}

				define i1 @cmp_oge_neg_recip(double %x) {
				; CHECK-LABEL: @cmp_oge_neg_recip(
				; CHECK-NEXT: [[CMP:%.*]] = fcmp ole double [[X:%.*]], 0.000000e+00
				; CHECK-NEXT: ret i1 [[CMP]]
				;
				%div = fdiv ninf double -1.0, %x
				%cmp = fcmp ninf oge double %div, 0.0
				ret i1 %cmp
				}

				define i1 @cmp_olt_recip_sz(double %x) {
				; CHECK-LABEL: @cmp_olt_recip_sz(
				; CHECK-NEXT: [[CMP:%.*]] = fcmp olt double [[X:%.*]], 0.000000e+00
				; CHECK-NEXT: ret i1 [[CMP]]
				;
				%div = fdiv ninf double 1.0, %x
				%cmp = fcmp ninf olt double %div, -0.0
				ret i1 %cmp
				}

				define i1 @cmp_ole_recip_sz(double %x) {
				; CHECK-LABEL: @cmp_ole_recip_sz(
				; CHECK-NEXT: [[CMP:%.*]] = fcmp ole double [[X:%.*]], 0.000000e+00
				; CHECK-NEXT: ret i1 [[CMP]]
				;
				%div = fdiv ninf double 1.0, %x
				%cmp = fcmp ninf ole double %div, -0.0
				ret i1 %cmp
				}

				define i1 @cmp_ogt_recip_sz(double %x) {
				; CHECK-LABEL: @cmp_ogt_recip_sz(
				; CHECK-NEXT: [[CMP:%.*]] = fcmp ogt double [[X:%.*]], 0.000000e+00
				; CHECK-NEXT: ret i1 [[CMP]]
				;
				%div = fdiv ninf double 1.0, %x
				%cmp = fcmp ninf ogt double %div, -0.0
				ret i1 %cmp
				}

				define i1 @cmp_oge_recip_sz(double %x) {
				; CHECK-LABEL: @cmp_oge_recip_sz(
				; CHECK-NEXT: [[CMP:%.*]] = fcmp oge double [[X:%.*]], 0.000000e+00
				; CHECK-NEXT: ret i1 [[CMP]]
				;
				%div = fdiv ninf double 1.0, %x
				%cmp = fcmp ninf oge double %div, -0.0
				ret i1 %cmp
				}



				define i1 @noopt0(double %x) {
				; CHECK-LABEL: @noopt0(
				; CHECK-NEXT: [[DIV:%.*]] = fdiv ninf double 1.100000e+00, [[X:%.*]]
				; CHECK-NEXT: [[CMP:%.*]] = fcmp ninf oeq double [[DIV]], 0.000000e+00
				; CHECK-NEXT: ret i1 [[CMP]]
				;
				%div = fdiv ninf double 1.1, %x
				%cmp = fcmp ninf oeq double %div, 0.0
				ret i1 %cmp
				}

				define i1 @noopt1(double %x) {
				; CHECK-LABEL: @noopt1(
				; CHECK-NEXT: [[DIV:%.*]] = fdiv ninf double 1.000000e+00, [[X:%.*]]
				; CHECK-NEXT: [[CMP:%.*]] = fcmp ninf oeq double [[DIV]], 1.000000e-01
				; CHECK-NEXT: ret i1 [[CMP]]
				;
				%div = fdiv ninf double 1.0, %x
				%cmp = fcmp ninf oeq double %div, 0.1
				ret i1 %cmp
				}

				define i1 @noopt2(double %x) {
				; CHECK-LABEL: @noopt2(
				; CHECK-NEXT: [[DIV:%.*]] = fdiv double 1.000000e+00, [[X:%.*]]
				; CHECK-NEXT: [[CMP:%.*]] = fcmp ninf oeq double [[DIV]], 0.000000e+00
				; CHECK-NEXT: ret i1 [[CMP]]
				;
				%div = fdiv double 1.0, %x
				%cmp = fcmp ninf oeq double %div, 0.0
				ret i1 %cmp
				}

				define i1 @noopt3(double %x) {
				; CHECK-LABEL: @noopt3(
				; CHECK-NEXT: [[DIV:%.*]] = fdiv ninf double 1.000000e+00, [[X:%.*]]
				; CHECK-NEXT: [[CMP:%.*]] = fcmp oeq double [[DIV]], 0.000000e+00
				; CHECK-NEXT: ret i1 [[CMP]]
				;
				%div = fdiv ninf double 1.0, %x
				%cmp = fcmp oeq double %div, 0.0
				ret i1 %cmp
				}

				define i1 @noopt4(double %x) {
				; CHECK-LABEL: @noopt4(
				; CHECK-NEXT: [[DIV:%.*]] = fdiv ninf double 1.000000e+00, [[X:%.*]]
				; CHECK-NEXT: [[CMP:%.*]] = fcmp nnan oeq double [[DIV]], 0.000000e+00
				; CHECK-NEXT: ret i1 [[CMP]]
				;
				%div = fdiv ninf double 1.0, %x
				%cmp = fcmp nnan oeq double %div, 0.0
				ret i1 %cmp
				}

				define i1 @noopt5(double %x) {
				; CHECK-LABEL: @noopt5(
				; CHECK-NEXT: [[DIV:%.*]] = fdiv nnan double 1.000000e+00, [[X:%.*]]
				; CHECK-NEXT: [[CMP:%.*]] = fcmp ninf oeq double [[DIV]], 0.000000e+00
				; CHECK-NEXT: ret i1 [[CMP]]
				;
				%div = fdiv nnan double 1.0, %x
				%cmp = fcmp ninf oeq double %div, 0.0
				ret i1 %cmp
				}