This is an archive of the discontinued LLVM Phabricator instance.

Differential D51942

[InstCombine] Fold (C/x)>0 into x>0 if possible
ClosedPublic

Authored by marels on Sep 11 2018, 10:43 AM.


Details

Reviewers

john.brawn
majnemer
spatel

Commits

rGc3f50ff92e0f: [InstCombine] Without infinites, fold (C / X) < 0.0 --> (X < 0)
rL343228: [InstCombine] Without infinites, fold (C / X) < 0.0 --> (X < 0)

Summary

This patch adds a simplification for floating-point compares that
depend on the reciprocal of a variable (a constant divided by the variable).
The comparison can be folded so that it depends solely on the variable
rather than on the division.

E.g.:

void foo(double x) {
  double a = 1.0 / x;
  if (a < 0) ...
}

=>

void foo(double x) {
  double a = 1.0 / x;
  if (x < 0) ...
}

The following conversion is applied if C != 0, where C is a
compile-time constant:

(C/x < 0)
(C*x < 0)       ... multiply by x*x; Note: x*x > 0 (see Remark 2)
                ... divide by C
(x < 0)         ... if C > 0
(x > 0)         ... if C < 0
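Over the real numbers, with x != 0 and C != 0, the steps above form the following chain of equivalences; the same argument applies unchanged to <=, > and >=:

```latex
\begin{aligned}
\frac{C}{x} < 0
  &\iff C\,x < 0 && \text{multiply by } x^2 > 0 \ (x \neq 0)\\
  &\iff x < 0    && \text{if } C > 0 \ (\text{divide by } C)\\
  &\iff x > 0    && \text{if } C < 0 \ (\text{divide by } C\text{, swapping the predicate})
\end{aligned}
```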

Remarks:

1. To avoid cases where x = NaN, only the ordered predicates (ogt, olt, oge, ole) are folded.

2. The division requires the flag 'ninf' to be set. This also ensures that x != 0 can be assumed.

   Assume x = 0. We know C != 0 (by definition) and C != inf ('ninf' is set). Therefore C/x = ±inf (IEEE 754-2008, sections 7.2 and 7.3). But because 'ninf' is set, the compiler can assume x != 0.

3. To avoid cases where C/x == 0.0, 'ninf' has to be set for the fdiv and fcmp instructions.

4. To allow the multiplication by x*x, 'arcp' and 'ninf' have to be set for the fdiv and fcmp instructions.
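A small brute-force check can make the remarks concrete. The C++ sketch below is illustrative only: Pred, cmpZero, swapped, original, folded and foldHoldsOnSamples are hypothetical names, not LLVM APIs, and plain doubles stand in for IR values. It compares (C / x) pred 0.0 with the folded x pred' 0.0 for the four ordered predicates on a few finite, non-zero samples plus NaN. The sample values are deliberately moderate, so C / x neither overflows nor underflows.

```cpp
#include <cassert>
#include <limits>

// The four ordered predicates the fold accepts (olt, ole, ogt, oge).
// Illustrative enum, not LLVM's FCmpInst::Predicate.
enum class Pred { OLT, OLE, OGT, OGE };

// Ordered comparison of v against 0.0. Like LLVM's ordered fcmp
// predicates, every case yields false when v is NaN.
bool cmpZero(Pred p, double v) {
  switch (p) {
  case Pred::OLT: return v < 0.0;
  case Pred::OLE: return v <= 0.0;
  case Pred::OGT: return v > 0.0;
  case Pred::OGE: return v >= 0.0;
  }
  return false;
}

// Predicate for swapped operands: (a < b) <=> (b > a), and so on.
Pred swapped(Pred p) {
  switch (p) {
  case Pred::OLT: return Pred::OGT;
  case Pred::OLE: return Pred::OGE;
  case Pred::OGT: return Pred::OLT;
  case Pred::OGE: return Pred::OLE;
  }
  return p;
}

// Unfolded form: (C / x) pred 0.0
bool original(Pred p, double c, double x) { return cmpZero(p, c / x); }

// Folded form: x pred 0.0, with the predicate swapped when C is negative.
bool folded(Pred p, double c, double x) {
  return cmpZero(c < 0.0 ? swapped(p) : p, x);
}

// Compares both forms for every predicate on moderate, non-zero samples
// (plus NaN for x). Returns true if they agree on every combination.
bool foldHoldsOnSamples() {
  const Pred preds[] = {Pred::OLT, Pred::OLE, Pred::OGT, Pred::OGE};
  const double cs[] = {1.0, 2.0, -1.0, -2.0};
  const double xs[] = {0.5,  -0.5,  3.0, -3.0,
                       1e10, -1e10, std::numeric_limits<double>::quiet_NaN()};
  for (Pred p : preds)
    for (double c : cs)
      for (double x : xs)
        if (original(p, c, x) != folded(p, c, x))
          return false;
  return true;
}
```

An infinite x breaks the equivalence: 1.0 / +inf is +0.0, so (C / x) ogt 0.0 is false while x ogt 0.0 is true. That mismatch is what requiring 'ninf' rules out.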

Diff Detail

Repository: rL LLVM

Event Timeline

marels created this revision. Sep 11 2018, 10:43 AM

Herald added a subscriber: llvm-commits. Sep 11 2018, 10:43 AM

lebedev.ri added a reviewer: spatel. Sep 11 2018, 10:51 AM

LGTM. I think it may be possible to remove the hasAllowReciprocal check. As I understand it, the point of that flag is to allow converting a divide into a multiply-by-reciprocal in ways that can change the end result, and I don't think the transformation done here gives a different result from the untransformed version. I'm not certain of that, though, and it's fine to leave it as it is.

This revision is now accepted and ready to land. Sep 18 2018, 4:51 AM

I haven't had a chance to look at this closely, but a first glance says it only works for scalars even though it should provide identical functionality for vectors.
Can you change the code to use the 'match()' API? This should substantially reduce the code and give you splat constant vector functionality for free (although there should be at least 1 test for that).

This revision now requires changes to proceed. Sep 18 2018, 7:00 AM

Thank you for the input,

I updated the code to support vectors using the match API. But before uploading, I have a question.

I was not able to find a way to match the following predicates with the existing API.

All elements are non-zero
All elements are positive
All elements are negative

These are important to correctly handle some cases:

The following cannot be folded because the new predicate is ambiguous.

C = <2 x float> <float 1.0, float -1.0>

The following cannot be folded because one element violates the assumption that C != 0.

C = <2 x float> <float 1.0, float 0.0>

This is not a big deal, because it can easily be added to PatternMatch.h (same implementation as m_AnyZeroFP). I think it is better to submit a separate change for this. Do you agree, or shall I bundle both changes together?
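The per-lane constraint described above can be sketched in plain C++. In this sketch the names FoldKind and classifyDividend are hypothetical, and a std::vector<double> merely stands in for a constant vector. The fold keeps the predicate only if every element is strictly positive and swaps it only if every element is strictly negative; the strict comparisons already exclude zero, so no separate non-zero test is needed.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical result of inspecting the dividend C of (C / X) pred 0.0.
enum class FoldKind { KeepPredicate, SwapPredicate, NoFold };

// A single new fcmp needs one predicate for all lanes, so every element must
// be non-zero and all elements must share one sign. The strict comparisons
// below reject 0.0 and -0.0 (and NaN) elements.
FoldKind classifyDividend(const std::vector<double> &c) {
  if (c.empty())
    return FoldKind::NoFold;
  auto positive = [](double e) { return e > 0.0; };
  auto negative = [](double e) { return e < 0.0; };
  if (std::all_of(c.begin(), c.end(), positive))
    return FoldKind::KeepPredicate;
  if (std::all_of(c.begin(), c.end(), negative))
    return FoldKind::SwapPredicate;
  return FoldKind::NoFold; // mixed signs (ambiguous) or a zero element
}
```

The vectors <1.0, -1.0> and <1.0, 0.0> from the examples above both classify as NoFold, while <-1.0, -1.0> classifies as SwapPredicate.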

@spatel what do you think about @john.brawn's suggestion to remove the hasAllowReciprocal check?

In D51942#1245429, @marels wrote:

I was not able to find a way to match the following predicates with the existing API.

Sorry this wasn't clear - I was only suggesting that we handle vector splat (all constants within the vector are identical or undef) patterns in this patch. You're correct that handling arbitrary vector constants is a harder problem. The API I would use here is "m_APFloat" (it deals with splat constants internally, so you probably don't need to do anything special in the calling code for this patch).
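The notion of a splat with undef lanes can be illustrated without LLVM. The sketch below is hypothetical: std::optional<double> models a lane, with std::nullopt standing in for undef, and getSplatValue is not an LLVM API (per the comment above, m_APFloat deals with splat constants internally). It returns the common value when all defined lanes are bit-identical, so 0.0 and -0.0 count as different constants.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Hypothetical lane model: std::nullopt plays the role of an undef element.
using Lane = std::optional<double>;

// Raw IEEE-754 bit pattern of a double, so "identical" means bit-identical.
std::uint64_t bitsOf(double d) {
  std::uint64_t u;
  std::memcpy(&u, &d, sizeof u);
  return u;
}

// Returns the splat value if every defined lane holds the same bit pattern
// (undef lanes are ignored). Returns std::nullopt if the defined lanes
// disagree or if every lane is undef.
std::optional<double> getSplatValue(const std::vector<Lane> &lanes) {
  std::optional<double> splat;
  for (const Lane &lane : lanes) {
    if (!lane)
      continue; // undef is compatible with any splat value
    if (!splat)
      splat = *lane;
    else if (bitsOf(*splat) != bitsOf(*lane))
      return std::nullopt;
  }
  return splat;
}
```

With such a helper, the caller only has to inspect one scalar, which is why restricting the fold to splats keeps the calling code unchanged.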

@spatel what do you think about @john.brawn's suggestion to remove the hasAllowReciprocal check?

That sounds correct - I don't think you need it here.
But that does raise the question: are you planning to generalize this transform? I see a few possible enhancements:

1. Handle (X / C) < 0.0 (constant is divisor rather than dividend)
2. Handle (X * C) < 0.0 (multiplication rather than division)
3. Handle all of the above with non-zero compare constant: (X / C1) < C2, (C1 / X) < C2, (X * C1) < C2

xbolva00 added a subscriber: xbolva00. Sep 25 2018, 12:44 PM

In D51942#1245559, @spatel wrote:

In D51942#1245429, @marels wrote:

I was not able to find a way to match the following predicates with the existing API.

Sorry this wasn't clear - I was only suggesting that we handle vector splat (all constants within the vector are identical or undef) patterns in this patch. You're correct that handling arbitrary vector constants is a harder problem. The API I would use here is "m_APFloat" (it deals with splat constants internally, so you probably don't need to do anything special in the calling code for this patch).

I do not think it is that much harder; it just requires some API extensions. I do not like restricting this to splat patterns too much, because it unnecessarily strengthens the preconditions just because of a missing API, and I guess more optimisations could benefit from such an extension. However, if there is no API yet, I think it is best to restrict this to splat values, at least for now. Do you know whether there is any ongoing work on extending the PatternMatch API? As a first shot, I am thinking of something like

template <typename ScalarType> auto m_ComplexMatch(bool (*match_fn)(ScalarType &e));

where match_fn is called on every vector element (or on the single scalar). This would allow more complex matching in a generic way.
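A rough, LLVM-independent sketch of such an element-wise matcher might look like the following. matchAllElements, isNonZero, isPositive and isNegative are hypothetical names, and std::vector<ScalarType> stands in for a constant vector. The function-pointer parameter follows the m_ComplexMatch idea above, with the element passed as const because the predicate only inspects it.

```cpp
#include <cassert>
#include <vector>

// Scalar case: the predicate is applied to the single value.
template <typename ScalarType>
bool matchAllElements(const ScalarType &scalar,
                      bool (*matchFn)(const ScalarType &)) {
  return matchFn(scalar);
}

// Vector case: the predicate must hold for every element. An empty vector is
// treated as "no match", a conservative choice for this sketch.
template <typename ScalarType>
bool matchAllElements(const std::vector<ScalarType> &elems,
                      bool (*matchFn)(const ScalarType &)) {
  if (elems.empty())
    return false;
  for (const ScalarType &e : elems)
    if (!matchFn(e))
      return false;
  return true;
}

// Example predicates for the three properties listed earlier in the thread.
bool isNonZero(const double &v) { return v != 0.0; }
bool isPositive(const double &v) { return v > 0.0; }
bool isNegative(const double &v) { return v < 0.0; }
```

Overload resolution picks the vector form for std::vector arguments and the scalar form otherwise, so a caller can write one check for both cases.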

@spatel what do you think about @john.brawn's suggestion to remove the hasAllowReciprocal check?

That sounds correct - I don't think you need it here.

I will remove this.

But that does raise the question: are you planning to generalize this transform? I see a few possible enhancements:

I think most of those are not true generalisations, because they require different transformations of the inequality. But they are worth looking into in separate patches.

1. Handle (X / C) < 0.0 (constant is divisor rather than dividend)

2. Handle (X * C) < 0.0 (multiplication rather than division)

1. and 2. are equivalent: to remove the multiplication or division from the compare, the inequality has to be multiplied by C or 1/C, with the predicate swapped depending on the sign of C.

3. Handle all of the above with non-zero compare constant: (X / C1) < C2, (C1 / X) < C2, (X * C1) < C2

Regarding (X / C1) < C2 and (X * C1) < C2: these could easily be combined with the implementation of 1. and 2.; it is only required to compute C2*C1 or C2/C1 at compile time.
Regarding (C1 / X) < C2: transforming this the same way as done in this patch (multiplying by X*X/C1) leads to the inequality (with the predicate swapped depending on the sign of C1):

X < X*X*C2/C1

Assuming C1/X is used later, you only add instructions.
Assuming C1/X is not used later, you trade a division for two multiplications. This can be beneficial, but in general the benefit is hard to judge during InstCombine.
When transforming differently (e.g. multiplying by X/C1), the predicate cannot be determined anymore, because the sign of X is not known to the compiler.
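In exact (real-number) arithmetic, with X != 0 and C1 != 0, the two transformations discussed above can be written out as follows. The last case split shows why multiplying by X / C1 leaves the direction of the inequality undetermined:

```latex
\begin{aligned}
\frac{C_1}{X} < C_2 &\iff X < \frac{C_2 X^2}{C_1} && \text{if } C_1 > 0 \ \left(\text{multiply by } \tfrac{X^2}{C_1} > 0\right)\\
\frac{C_1}{X} < C_2 &\iff X > \frac{C_2 X^2}{C_1} && \text{if } C_1 < 0 \ \left(\text{multiply by } \tfrac{X^2}{C_1} < 0\right)\\
\frac{C_1}{X} < C_2 &\iff
  \begin{cases}
    1 < \dfrac{C_2 X}{C_1} & \text{if } \dfrac{X}{C_1} > 0\\[6pt]
    1 > \dfrac{C_2 X}{C_1} & \text{if } \dfrac{X}{C_1} < 0
  \end{cases}
\end{aligned}
```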

Changed the code to use the pattern-matching API. When the inputs are vectors, only splat vectors are considered.

The patch looks logically correct, but see inline comments for cosmetic/procedural changes.

lib/Transforms/InstCombine/InstCombineCompares.cpp
Lines 5276–5291 (On Diff #167132): This code comment is more confusing than helpful to me. Could we say something like this instead:
// When C is not 0.0 and infinities are not allowed:
// (C / X) < 0.0 is a sign-bit test of X
// (C / X) < 0.0 --> X < 0.0 (if C is positive)
// (C / X) < 0.0 --> X > 0.0 (if C is negative, swap the predicate)
Line 5323 (On Diff #167132): Prefer to use the actual type here rather than auto and fix capitalization: Value *A.
Line 5324 (On Diff #167132): I think "auto *" or "FCmpInst *" is preferred: http://llvm.org/docs/CodingStandards.html#use-auto-type-deduction-to-make-code-more-readable
test/Transforms/InstCombine/fcmp.ll
Line 385 (On Diff #167132): I'd prefer to continue with the style of tests seen above, because that makes it clear exactly what is changing from test to test without extra stuff like xors. So: make each fdiv/fcmp pair its own function, and auto-generate the complete check lines using the script mentioned in line 1 of this file. Also: commit the tests with baseline CHECK lines before this patch, and rebase this patch so we just see the diffs.

Fixed according to comments from @spatel

Adjusted the introductory comment. I left a short proof there, because I think it is necessary to show why the preconditions are required.
Fixed autos
Fixed Tests
Changed commit message to:

[InstCombine] Without infinites, fold (C / X) < 0.0 --> (X < 0)

When C is not zero and infinities are not allowed, (C / X) > 0 is a sign
test. Depending on the sign of C, the predicate must be swapped.

E.g.:
  foo(double X) {
    if ((-2.0 / X) <= 0) ...
  }
 =>
  foo(double X) {
    if (X >= 0) ...
  }

Thanks - do you have commit access?

spatel mentioned this in rL343222: [InstCombine] add tests for FP sign-bit cmp optimization with fdiv; NFC. Sep 27 2018, 7:28 AM

No, I do not think so. Can you do this for me?

In D51942#1248013, @marels wrote:

No, I do not think so. Can you do this for me?

Yes - LGTM. As you probably saw, I committed the baseline tests already. I'll rebase and commit the rest.

This revision is now accepted and ready to land. Sep 27 2018, 8:12 AM

Thanks

Closed by commit rL343228: [InstCombine] Without infinites, fold (C / X) < 0.0 --> (X < 0) (authored by spatel). Sep 27 2018, 9:01 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

llvm/trunk/lib/Transforms/InstCombine/InstCombineCompares.cpp: 55 lines
llvm/trunk/test/Transforms/InstCombine/fcmp.ll: 15 lines

Diff 167337

llvm/trunk/lib/Transforms/InstCombine/InstCombineCompares.cpp

@@ ... @@ if (!RHS.isZero()) {
       }
     }

   // Lower this FP comparison into an appropriate integer version of the
   // comparison.
   return new ICmpInst(Pred, LHSI->getOperand(0), RHSInt);
 }

+/// Fold (C / X) < 0.0 --> X < 0.0 if possible. Swap predicate if necessary.
+static Instruction *foldFCmpReciprocalAndZero(FCmpInst &I, Instruction *LHSI,
+                                              Constant *RHSC) {
+  // When C is not 0.0 and infinities are not allowed:
+  // (C / X) < 0.0 is a sign-bit test of X
+  // (C / X) < 0.0 --> X < 0.0 (if C is positive)
+  // (C / X) < 0.0 --> X > 0.0 (if C is negative, swap the predicate)
+  //
+  // Proof:
+  // Multiply (C / X) < 0.0 by X * X / C.
+  // - X is non zero, if it is the flag 'ninf' is violated.
+  // - C defines the sign of X * X * C. Thus it also defines whether to swap
+  //   the predicate. C is also non zero by definition.
+  //
+  // Thus X * X / C is non zero and the transformation is valid. [qed]
+
+  FCmpInst::Predicate Pred = I.getPredicate();
+
+  // Check that predicates are valid.
+  if ((Pred != FCmpInst::FCMP_OGT) && (Pred != FCmpInst::FCMP_OLT) &&
+      (Pred != FCmpInst::FCMP_OGE) && (Pred != FCmpInst::FCMP_OLE))
+    return nullptr;
+
+  // Check that RHS operand is zero.
+  if (!match(RHSC, m_AnyZeroFP()))
+    return nullptr;
+
+  // Check fastmath flags ('ninf').
+  if (!LHSI->hasNoInfs() || !I.hasNoInfs())
+    return nullptr;
+
+  // Check the properties of the dividend. It must not be zero to avoid a
+  // division by zero (see Proof).
+  const APFloat *C;
+  if (!match(LHSI->getOperand(0), m_APFloat(C)))
+    return nullptr;
+
+  if (C->isZero())
+    return nullptr;
+
+  // Get swapped predicate if necessary.
+  if (C->isNegative())
+    Pred = I.getSwappedPredicate();
+
+  // Finally emit the new fcmp.
+  Value *X = LHSI->getOperand(1);
+  FCmpInst *NewFCI = new FCmpInst(Pred, X, RHSC);
+  NewFCI->setFastMathFlags(I.getFastMathFlags());
+  return NewFCI;
+}

 Instruction *InstCombiner::visitFCmpInst(FCmpInst &I) {
   bool Changed = false;

   /// Orders the operands of the compare so that they are listed from most
   /// complex to least complex. This puts constants before unary operators,
   /// before binary operators.
   if (getComplexity(I.getOperand(0)) < getComplexity(I.getOperand(1))) {
     I.swapOperands();
@@ ... @@ if (Instruction *LHSI = dyn_cast<Instruction>(Op0))
       case Instruction::FSub: {
         // fcmp pred (fneg x), C -> fcmp swap(pred) x, -C
         Value *Op;
         if (match(LHSI, m_FNeg(m_Value(Op))))
           return new FCmpInst(I.getSwappedPredicate(), Op,
                               ConstantExpr::getFNeg(RHSC));
         break;
       }
+      case Instruction::FDiv:
+        if (Instruction *NV = foldFCmpReciprocalAndZero(I, LHSI, RHSC))
+          return NV;
+        break;
       case Instruction::Load:
         if (GetElementPtrInst *GEP =
                 dyn_cast<GetElementPtrInst>(LHSI->getOperand(0))) {
           if (GlobalVariable *GV = dyn_cast<GlobalVariable>(GEP->getOperand(0)))
             if (GV->isConstant() && GV->hasDefinitiveInitializer() &&
                 !cast<LoadInst>(LHSI)->isVolatile())
               if (Instruction *Res = foldCmpLoadFromIndexedGlobal(GEP, GV, I))
                 return Res;
@@ ... @@

llvm/trunk/test/Transforms/InstCombine/fcmp.ll

@@ ... @@
 ;
   %cmp = fcmp oeq float undef, undef
   ret i1 %cmp
 }

 ; Can fold 1.0 / X < 0.0 --> X < 0 with ninf
 define i1 @test20_recipX_olt_0(float %X) {
 ; CHECK-LABEL: @test20_recipX_olt_0(
-; CHECK-NEXT: [[DIV:%.*]] = fdiv ninf float 1.000000e+00, [[X:%.*]]
-; CHECK-NEXT: [[CMP:%.*]] = fcmp ninf olt float [[DIV]], 0.000000e+00
+; CHECK-NEXT: [[CMP:%.*]] = fcmp ninf olt float [[X:%.*]], 0.000000e+00
 ; CHECK-NEXT: ret i1 [[CMP]]
 ;
   %div = fdiv ninf float 1.0, %X
   %cmp = fcmp ninf olt float %div, 0.0
   ret i1 %cmp
 }

 ; Can fold -2.0 / X <= 0.0 --> X >= 0 with ninf
 define i1 @test21_recipX_ole_0(float %X) {
 ; CHECK-LABEL: @test21_recipX_ole_0(
-; CHECK-NEXT: [[DIV:%.*]] = fdiv ninf float -2.000000e+00, [[X:%.*]]
-; CHECK-NEXT: [[CMP:%.*]] = fcmp ninf ole float [[DIV]], 0.000000e+00
+; CHECK-NEXT: [[CMP:%.*]] = fcmp ninf oge float [[X:%.*]], 0.000000e+00
 ; CHECK-NEXT: ret i1 [[CMP]]
 ;
   %div = fdiv ninf float -2.0, %X
   %cmp = fcmp ninf ole float %div, 0.0
   ret i1 %cmp
 }

 ; Can fold 2.0 / X > 0.0 --> X > 0 with ninf
 define i1 @test22_recipX_ogt_0(float %X) {
 ; CHECK-LABEL: @test22_recipX_ogt_0(
-; CHECK-NEXT: [[DIV:%.*]] = fdiv ninf float 2.000000e+00, [[X:%.*]]
-; CHECK-NEXT: [[CMP:%.*]] = fcmp ninf ogt float [[DIV]], 0.000000e+00
+; CHECK-NEXT: [[CMP:%.*]] = fcmp ninf ogt float [[X:%.*]], 0.000000e+00
 ; CHECK-NEXT: ret i1 [[CMP]]
 ;
   %div = fdiv ninf float 2.0, %X
   %cmp = fcmp ninf ogt float %div, 0.0
   ret i1 %cmp
 }

 ; Can fold -1.0 / X >= 0.0 --> X <= 0 with ninf
 define i1 @test23_recipX_oge_0(float %X) {
 ; CHECK-LABEL: @test23_recipX_oge_0(
-; CHECK-NEXT: [[DIV:%.*]] = fdiv ninf float -1.000000e+00, [[X:%.*]]
-; CHECK-NEXT: [[CMP:%.*]] = fcmp ninf oge float [[DIV]], 0.000000e+00
+; CHECK-NEXT: [[CMP:%.*]] = fcmp ninf ole float [[X:%.*]], 0.000000e+00
 ; CHECK-NEXT: ret i1 [[CMP]]
 ;
   %div = fdiv ninf float -1.0, %X
   %cmp = fcmp ninf oge float %div, 0.0
   ret i1 %cmp
 }

 ; Do not fold 1.0 / X > 0.0 when ninf is missing
@@ ... @@
   %div = fdiv ninf float 2.0, %X
   %cmp = fcmp ninf ugt float %div, 0.0
   ret i1 %cmp
 }

 ; Fold <-1.0, -1.0> / X > <-0.0, -0.0>
 define <2 x i1> @test27_recipX_gt_vecsplat(<2 x float> %X) {
 ; CHECK-LABEL: @test27_recipX_gt_vecsplat(
-; CHECK-NEXT: [[DIV:%.*]] = fdiv ninf <2 x float> <float -1.000000e+00, float -1.000000e+00>, [[X:%.*]]
-; CHECK-NEXT: [[CMP:%.*]] = fcmp ninf ogt <2 x float> [[DIV]], <float -0.000000e+00, float -0.000000e+00>
+; CHECK-NEXT: [[CMP:%.*]] = fcmp ninf olt <2 x float> [[X:%.*]], <float -0.000000e+00, float -0.000000e+00>
 ; CHECK-NEXT: ret <2 x i1> [[CMP]]
 ;
   %div = fdiv ninf <2 x float> <float -1.0, float -1.0>, %X
   %cmp = fcmp ninf ogt <2 x float> %div, <float -0.0, float -0.0>
   ret <2 x i1> %cmp
 }