This is an archive of the discontinued LLVM Phabricator instance.

Differential D51942

[InstCombine] Fold (C/x)>0 into x>0 if possible
Closed · Public

Authored by marels on Sep 11 2018, 10:43 AM.


Details

Reviewers

john.brawn
majnemer
spatel

Commits

rGc3f50ff92e0f: [InstCombine] Without infinites, fold (C / X) < 0.0 --> (X < 0)
rL343228: [InstCombine] Without infinites, fold (C / X) < 0.0 --> (X < 0)

Summary

This patch adds a simplification for floating-point compares that
depend on the reciprocal of a variable (C / x). The comparison can be
folded so that it depends solely on the variable rather than on the division.

E.g.:

void foo(double x) {
  double a = 1.0 / x;
  if (a < 0) ...
}

=>

void foo(double x) {
  double a = 1.0 / x;
  if (x < 0) ...
}

The following conversion is applied if C != 0, where C is a
compile-time constant:

(C/x < 0)
(C*x < 0)       ... multiply by x*x; note x*x > 0 because x != 0 (see Remark 2)
(x < 0)         ... divide by C, if C > 0
(x > 0)         ... divide by C, if C < 0 (the predicate is swapped)

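The predicate selection in the derivation above can be modelled in standalone C++. This is a sketch, not the LLVM implementation: `Pred`, `swapped`, `evalOrdered`, `foldRecipCmp` and `foldAgrees` are hypothetical names, and plain `double` comparisons stand in for `fcmp`. It assumes the 'ninf' preconditions hold, so any sample values should keep x and C/x finite and non-zero.

```cpp
#include <cassert>
#include <cmath>
#include <optional>

// Hypothetical stand-in for the four ordered fcmp predicates that are folded.
enum class Pred { OLT, OGT, OLE, OGE };

// Dividing both sides of "v P 0" by a negative C reverses the inequality.
Pred swapped(Pred P) {
  switch (P) {
  case Pred::OLT: return Pred::OGT;
  case Pred::OGT: return Pred::OLT;
  case Pred::OLE: return Pred::OGE;
  case Pred::OGE: return Pred::OLE;
  }
  assert(false && "unknown predicate");
  return P;
}

// Ordered compare: C++ relational operators on doubles already return
// false when either operand is NaN, like the ordered fcmp predicates.
bool evalOrdered(Pred P, double L, double R) {
  switch (P) {
  case Pred::OLT: return L < R;
  case Pred::OGT: return L > R;
  case Pred::OLE: return L <= R;
  case Pred::OGE: return L >= R;
  }
  assert(false && "unknown predicate");
  return false;
}

// Fold "(C / x) P 0.0" into "x NewP 0.0". C must be a finite, non-zero
// constant; otherwise no fold is returned.
std::optional<Pred> foldRecipCmp(double C, Pred P) {
  if (C == 0.0 || !std::isfinite(C))
    return std::nullopt;
  return std::signbit(C) ? swapped(P) : P;
}

// True if a fold exists and the original and folded compares agree for x.
bool foldAgrees(double C, Pred P, double x) {
  std::optional<Pred> NewP = foldRecipCmp(C, P);
  return NewP && evalOrdered(P, C / x, 0.0) == evalOrdered(*NewP, x, 0.0);
}
```

For example, `foldRecipCmp(-2.0, Pred::OLE)` yields `Pred::OGE`, which matches the `(-2.0 / X) <= 0` to `X >= 0` example in the commit message.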
Remarks:

1. To avoid cases where x = NaN, only the ordered predicates (ogt, olt, oge, ole) are folded.

2. The division requires the flag 'ninf' to be set. This also ensures that x != 0 can be assumed.

   Assume x = 0. We know C != 0 (by definition) and C != inf ('ninf' is set). Therefore C/x = ±inf (IEEE 754-2008, sections 7.2 and 7.3). But because 'ninf' is set, the compiler can assume x != 0.

3. To avoid cases where C/x == 0.0, 'ninf' has to be set for both the fdiv and the fcmp instruction.

4. To allow the multiplication by x*x, 'arcp' and 'ninf' have to be set for both the fdiv and the fcmp instruction.

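The preconditions in the remarks above can be probed with a few concrete values. The minimal C++ sketch below compares the original `(C / x) < 0.0` with the folded `x < 0.0` for C = 1.0; `originalLt`, `foldedLt` and `probeFold` are hypothetical helper names, and it assumes IEEE 754 doubles with default (non-fast-math) semantics. The two forms agree for ordinary values, and also for x = NaN under the ordered compare, but they disagree for x = -0.0 and x = -inf. In both failing cases an infinity appears, either as C/x or as x, which is what 'ninf' lets the compiler rule out.

```cpp
#include <cassert>
#include <limits>

// Original compare, for a constant C > 0: (C / x) < 0.0.
// C++ relational operators on doubles return false for NaN, like fcmp olt.
bool originalLt(double C, double x) { return (C / x) < 0.0; }

// Folded compare for C > 0: x < 0.0.
bool foldedLt(double x) { return x < 0.0; }

// Checks the fold at ordinary and special inputs (C = 1.0).
void probeFold() {
  const double NegZero = -0.0;
  const double NegInf = -std::numeric_limits<double>::infinity();
  const double NaN = std::numeric_limits<double>::quiet_NaN();

  // Ordinary non-zero, finite x: both forms agree.
  assert(originalLt(1.0, -2.0) == foldedLt(-2.0));
  assert(originalLt(1.0, 2.0) == foldedLt(2.0));
  // x = NaN: C / x is NaN, and both ordered compares are false.
  assert(originalLt(1.0, NaN) == foldedLt(NaN));
  // x = -0.0: C / x is -inf, so the original is true, but -0.0 < 0.0 is false.
  assert(originalLt(1.0, NegZero) != foldedLt(NegZero));
  // x = -inf: C / x is -0.0, so the original is false, but -inf < 0.0 is true.
  assert(originalLt(1.0, NegInf) != foldedLt(NegInf));
}
```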
Diff Detail

Event Timeline

marels created this revision. Sep 11 2018, 10:43 AM

Herald added a subscriber: llvm-commits. Sep 11 2018, 10:43 AM

lebedev.ri added a reviewer: spatel. Sep 11 2018, 10:51 AM

LGTM. I think it may be possible to remove the hasAllowReciprocal check. As I understand it, the point of that flag is to allow converting a divide into a multiply-by-reciprocal in ways that can change the end result, and I don't think the transformation done here gives a different result from the untransformed version. But I'm not certain of that, and it's fine to leave it as it is.

This revision is now accepted and ready to land. Sep 18 2018, 4:51 AM

I haven't had a chance to look at this closely, but a first glance says it only works for scalars even though it should provide identical functionality for vectors.
Can you change the code to use the 'match()' API? This should substantially reduce the code and give you splat constant vector functionality for free (although there should be at least 1 test for that).

This revision now requires changes to proceed. Sep 18 2018, 7:00 AM

Thank you for the input,

I updated the code to support vectors using the match API. But before uploading, I have a question.

I was not able to find a way to match the following predicates with the existing API.

All elements are non-zero
All elements are positive
All elements are negative

These are important to correctly handle some cases:

The following cannot be folded because the new predicate is ambiguous.

C = <2 x float> <float 1.0, float -1.0>

The following cannot be folded because one element violates the assumption that C != 0.

C = <2 x float> <float 1.0, float 0.0>

This is not a big thing, because it can easily be added to PatternMatch.h (same implementation as m_AnyZeroFP). I think it is better to submit a separate change for this. Do you agree, or shall I bundle both changes together?

@spatel what do you think about @john.brawn's suggestion to remove the hasAllowReciprocal check?

In D51942#1245429, @marels wrote:
I was not able to find a way to match the following predicates with the existing API.

Sorry this wasn't clear - I was only suggesting that we handle vector splat (all constants within the vector are identical or undef) patterns in this patch. You're correct that handling arbitrary vector constants is a harder problem. The API I would use here is "m_APFloat" (it deals with splat constants internally, so you probably don't need to do anything special in the calling code for this patch).
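To make the notion of a splat concrete, here is a standalone C++ model of the rule described above (every lane identical or undef). It does not use or reproduce LLVM's `m_APFloat` or `Constant` API: `Lane`, `sameBits` and `getSplat` are hypothetical names, undef lanes are modelled as `std::nullopt`, and lanes are compared bitwise so that 0.0 and -0.0 count as different constants.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// One lane of a hypothetical vector constant: a value, or undef (nullopt).
using Lane = std::optional<double>;

// Bitwise identity, so that 0.0 and -0.0 are treated as different constants.
bool sameBits(double A, double B) {
  std::uint64_t BitsA, BitsB;
  std::memcpy(&BitsA, &A, sizeof A);
  std::memcpy(&BitsB, &B, sizeof B);
  return BitsA == BitsB;
}

// Returns the common value if all defined lanes are identical; undef lanes
// are ignored. An empty or all-undef vector is not treated as a splat here.
std::optional<double> getSplat(const std::vector<Lane> &Lanes) {
  std::optional<double> Splat;
  for (const Lane &L : Lanes) {
    if (!L)
      continue; // undef lane: compatible with any splat value
    if (!Splat)
      Splat = *L;
    else if (!sameBits(*Splat, *L))
      return std::nullopt;
  }
  return Splat;
}
```

Under this rule, the `<float 1.0, float -1.0>` constant from the earlier comment is not a splat, so a splat-only fold simply skips it.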

@spatel what do you think about @john.brawn's suggestion to remove the hasAllowReciprocal check?

That sounds correct - I don't think you need it here.
But that does raise the question: are you planning to generalize this transform? I see a few possible enhancements:

1. Handle (X / C) < 0.0 (constant is divisor rather than dividend)
2. Handle (X * C) < 0.0 (multiplication rather than division)
3. Handle all of the above with non-zero compare constant: (X / C1) < C2, (C1 / X) < C2, (X * C1) < C2

xbolva00 added a subscriber: xbolva00. Sep 25 2018, 12:44 PM

In D51942#1245559, @spatel wrote:

In D51942#1245429, @marels wrote:

I was not able to find a way to match the following predicates with the existing API.

Sorry this wasn't clear - I was only suggesting that we handle vector splat (all constants within the vector are identical or undef) patterns in this patch. You're correct that handling arbitrary vector constants is a harder problem. The API I would use here is "m_APFloat" (it deals with splat constants internally, so you probably don't need to do anything special in the calling code for this patch).

I do not think it is that much harder; it just requires some API extensions. I do not much like handling only splat patterns, because that unnecessarily tightens the preconditions merely for lack of an API, and I guess more optimisations could benefit from such an extension. However, if there is no API yet, I think it is best to restrict this to splat values, at least for now. Do you know whether there is any ongoing work on extending the PatternMatch API? As a first shot I am thinking of something like

template <typename ScalarType> m_ComplexMatch(bool (*match_fn)(ScalarType &e));

where match_fn is called on every vector element (or on the single scalar). This would allow more complex matching in a generic way.

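A rough standalone analogue of that idea, applying one per-element predicate to a scalar or to every lane of a vector, might look like the sketch below. `matchAll`, `isNonZero`, `isPositive` and `isNegative` are hypothetical names, `std::vector<double>` stands in for a vector constant, and this is not the PatternMatch API. With it, each of the three conditions listed earlier (all elements non-zero, all positive, all negative) becomes a single call.

```cpp
#include <cassert>
#include <vector>

// Scalar case: apply the predicate once.
template <typename ScalarType, typename Fn>
bool matchAll(const ScalarType &Scalar, Fn MatchFn) {
  return MatchFn(Scalar);
}

// Vector case: every lane must satisfy the predicate. Partial ordering of
// function templates prefers this overload for std::vector arguments.
template <typename ScalarType, typename Fn>
bool matchAll(const std::vector<ScalarType> &Vec, Fn MatchFn) {
  if (Vec.empty())
    return false;
  for (const ScalarType &E : Vec)
    if (!MatchFn(E))
      return false;
  return true;
}

// Per-element predicates for the fold's preconditions on C.
bool isNonZero(double V) { return V != 0.0; }
bool isPositive(double V) { return V > 0.0; }
bool isNegative(double V) { return V < 0.0; }
```

For instance, `<1.0, -1.0>` passes `isNonZero` but fails both sign checks, so the fold would be skipped, while `<1.0, 0.0>` already fails `isNonZero`.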
@spatel what do you think about @john.brawn's suggestion to remove the hasAllowReciprocal check?

That sounds correct - I don't think you need it here.

I will remove this.

But that does raise the question: are you planning to generalize this transform? I see a few possible enhancements:

I think most of those are not true generalisations, because they require different transformations of the inequality. But they are worth looking into in separate patches.

1. Handle (X / C) < 0.0 (constant is divisor rather than dividend)

2. Handle (X * C) < 0.0 (multiplication rather than division)

1. and 2. are equivalent: to remove the multiplication or division from the compare, the inequality has to be multiplied by C or 1/C, with the predicate swapped depending on the sign of C.

3. Handle all of the above with non-zero compare constant: (X / C1) < C2, (C1 / X) < C2, (X * C1) < C2

Regarding (X / C1) < C2 and (X * C1) < C2: these could easily be combined with the implementation of 1. and 2. It only requires computing C2*C1 or C2/C1 at compile time.
Regarding (C1 / X) < C2: transforming this the same way as in this patch (multiplying by X*X/C1) leads to the inequality (with the predicate swapped depending on the sign of C1):

X < X*X*C2/C1
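Written out step by step, assuming X != 0 so that X*X > 0, the first step multiplies both sides by X*X and the second divides by C1:

$$
\frac{C_1}{X} < C_2
\iff C_1 X < C_2 X^2
\iff
\begin{cases}
X < \dfrac{C_2}{C_1} X^2 & \text{if } C_1 > 0,\\[4pt]
X > \dfrac{C_2}{C_1} X^2 & \text{if } C_1 < 0.
\end{cases}
$$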

If C1/X is used later, you only add instructions.
If C1/X is not used later, you trade a division for two multiplications. This can be beneficial, but in general the benefit is hard to judge during InstCombine.
When transforming differently (e.g. multiplying by X/C1), the predicate can no longer be determined, because the sign of X is not known to the compiler.

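The (X * C1) < C2 and (X / C1) < C2 cases discussed above reduce to computing one new constant and possibly swapping the predicate. The minimal C++ sketch below shows that under stated assumptions: `Pred`, `Op`, `FoldedCmp` and `foldConstOperand` are hypothetical names, and it models exact real arithmetic, ignoring the run-time rounding of X * C1 (or X / C1) and the compile-time rounding of the folded constant. A real transform would likely need to account for both; the example values are chosen so every result is exact.

```cpp
#include <cassert>
#include <cmath>
#include <optional>

// Hypothetical stand-ins for the ordered predicates and the binary operator.
enum class Pred { OLT, OGT, OLE, OGE };
enum class Op { Mul, Div };

// Multiplying or dividing both sides by a negative constant reverses the
// inequality.
Pred swapped(Pred P) {
  switch (P) {
  case Pred::OLT: return Pred::OGT;
  case Pred::OGT: return Pred::OLT;
  case Pred::OLE: return Pred::OGE;
  case Pred::OGE: return Pred::OLE;
  }
  assert(false && "unknown predicate");
  return P;
}

// Result of the fold: "X NewP NewRHS".
struct FoldedCmp {
  Pred NewP;
  double NewRHS;
};

// Fold "(X op C1) P C2" into "X NewP NewRHS" (exact-arithmetic model):
//   Mul: divide both sides by C1   -> NewRHS = C2 / C1
//   Div: multiply both sides by C1 -> NewRHS = C2 * C1
// The predicate is swapped when C1 is negative.
std::optional<FoldedCmp> foldConstOperand(Op O, double C1, Pred P, double C2) {
  if (C1 == 0.0 || !std::isfinite(C1))
    return std::nullopt;
  double NewRHS = (O == Op::Mul) ? C2 / C1 : C2 * C1;
  return FoldedCmp{std::signbit(C1) ? swapped(P) : P, NewRHS};
}
```

For example, (X / -4.0) <= 2.0 becomes X >= -8.0, and (X * 2.0) < 6.0 becomes X < 3.0.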
Changed the code to use the pattern-matching API. For vector inputs, only splat vectors are considered.

The patch looks logically correct, but see inline comments for cosmetic/procedural changes.

lib/Transforms/InstCombine/InstCombineCompares.cpp
5276–5291	This code comment is more confusing than helpful to me. Could we say something like this instead: // When C is not 0.0 and infinities are not allowed: // (C / X) < 0.0 is a sign-bit test of X // (C / X) < 0.0 --> X < 0.0 (if C is positive) // (C / X) < 0.0 --> X > 0.0 (if C is negative, swap the predicate)
5323	Prefer to use the actual type here rather than auto and fix capitalization: Value *A.
5324	I think "auto *" or "FCmpInst *" is preferred: http://llvm.org/docs/CodingStandards.html#use-auto-type-deduction-to-make-code-more-readable
test/Transforms/InstCombine/fcmp.ll
385	I'd prefer to continue with the style of tests as seen above here because that makes it clear exactly what is changing from test to test without extra stuff like xors. So: Make each fdiv/fcmp pair its own function. Auto-generate the complete check lines using the script mentioned in line 1 of this file. Also: Commit the tests with baseline CHECK lines before this patch, and rebase this patch so we just see the diffs.

Fixed according to comments from @spatel

Adjusted the introductory comment. I left a short proof there, because I think it is necessary to show why the preconditions are required.
Fixed autos
Fixed Tests
Changed commit message to:

[InstCombine] Without infinites, fold (C / X) < 0.0 --> (X < 0)

When C is not zero and infinites are not allowed (C / X) > 0 is a sign
test. Depending on the sign of C, the predicate must be swapped.

E.g.:
  foo(double X) {
    if ((-2.0 / X) <= 0) ...
  }
 =>
  foo(double X) {
    if (X >= 0) ...
  }

Thanks - do you have commit access?

spatel mentioned this in rL343222: [InstCombine] add tests for FP sign-bit cmp optimization with fdiv; NFC. Sep 27 2018, 7:28 AM

No, I do not think so. Can you do this for me?

In D51942#1248013, @marels wrote:

No, I do not think so. Can you do this for me?

Yes - LGTM. As you probably saw, I committed the baseline tests already. I'll rebase and commit the rest.

This revision is now accepted and ready to land. Sep 27 2018, 8:12 AM

Thanks

Closed by commit rL343228: [InstCombine] Without infinites, fold (C / X) < 0.0 --> (X < 0) (authored by spatel). Sep 27 2018, 9:01 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                 Size
lib/Transforms/InstCombine/InstCombineCompares.cpp   53 lines
test/Transforms/InstCombine/fcmp.ll                  132 lines

Diff 167132

lib/Transforms/InstCombine/InstCombineCompares.cpp

if (Instruction *LHSI = dyn_cast<Instruction>(Op0))
case Instruction::FSub: {
// fcmp pred (fneg x), C -> fcmp swap(pred) x, -C
Value *Op;
if (match(LHSI, m_FNeg(m_Value(Op))))
return new FCmpInst(I.getSwappedPredicate(), Op,
ConstantExpr::getFNeg(RHSC));
break;
}
		case Instruction::FDiv: {
		// Assume C != 0 is a constant and a and d are floating point variables.
		// 1: a != 0 ... Because a is the denominator of a division
		// this is implicitly given by the flag 'ninf'
		// 2: d = C / a
		// 3: (d < 0)
		//
		// To simplify 3: execute the following steps
		//
		// 4: (C / a < 0) ... substitute d by C / a
		// 5: (Ca < 0) ... multiply by aa (note aa is positive because
		// a != 0, see 1:)
		// 6: ... divide by C
		// 7.1: (a < 0) ... if C > 0
		// 7.2: (a > 0) ... if C < 0
		//
		// This transformation works for the ordered variants of <=, <, >, >=

		// Check that predicates are valid.
		if ((Pred != FCmpInst::FCMP_OGT) && (Pred != FCmpInst::FCMP_OLT) &&
		(Pred != FCmpInst::FCMP_OGE) && (Pred != FCmpInst::FCMP_OLE))
		break;

		// Check that the RHS operand matches the form in (3:)
		if (!match(RHSC, m_AnyZeroFP()))
		break;

		// Check fastmath flags ('ninf'). This is a requirement for 1: and 5:.
		if (!LHSI->hasNoInfs() || !I.hasNoInfs())
		break;

		// Check the properties of the dividend
		const APFloat *C;
		if (!match(LHSI->getOperand(0), m_APFloat(C)))
		break;

		// Division by zero is not allowed (see 6:)
		if (C->isZero())
		break;

		// Get the new predicate (see 7:)
		FCmpInst::Predicate NewPred;
		if (C->isNegative())
		NewPred = I.getSwappedPredicate();
		else
		NewPred = I.getPredicate();

		// Finally emit the new fcmp.
		auto *a = LHSI->getOperand(1);
		auto NewFCI = new FCmpInst(NewPred, a, RHSC);
		NewFCI->setFastMathFlags(I.getFastMathFlags());
		return NewFCI;
		}
case Instruction::Load:
if (GetElementPtrInst *GEP =
dyn_cast<GetElementPtrInst>(LHSI->getOperand(0))) {
if (GlobalVariable *GV = dyn_cast<GlobalVariable>(GEP->getOperand(0)))
if (GV->isConstant() && GV->hasDefinitiveInitializer() &&
!cast<LoadInst>(LHSI)->isVolatile())
if (Instruction *Res = foldCmpLoadFromIndexedGlobal(GEP, GV, I))
return Res;

test/Transforms/InstCombine/fcmp.ll

	define i1 @test19_undef_ordered() {
	; CHECK-LABEL: @test19_undef_ordered(
	; CHECK-NEXT: ret i1 false
	;
	%cmp = fcmp oeq float undef, undef
	ret i1 %cmp
	}

				; Can fold with ninf and arcp
				; %2 = fdiv ninf double 1.0, %1
				; %3 = fcmp ninf olt double %2, 0.0
				; =>
				; %3 = fcmp ninf olt double %1, 0.0
				define i1 @test20_recip(double %arg_d, float %arg_f) {
				; CHECK-LABEL: @test20_recip(
				; CHECK-SAME: double [[AD:%.*]], float [[AF:%.*]])
				;
				; DoubleTy with all allowed predicates. Note: fcmp args are swapped
				;
				; CHECK: %cmp1 = fcmp ninf ogt double [[AD]], 0.000000e+00
				; CHECK: %cmp2 = fcmp ninf olt double [[AD]], 0.000000e+00
				; CHECK: %cmp3 = fcmp ninf oge double [[AD]], 0.000000e+00
				; CHECK: %cmp4 = fcmp ninf ole double [[AD]], 0.000000e+00

				%div_dp = fdiv ninf double 1.0, %arg_d

				%cmp1 = fcmp ninf olt double 0.0, %div_dp
				%cmp2 = fcmp ninf ogt double 0.0, %div_dp
				%cmp3 = fcmp ninf ole double 0.0, %div_dp
				%cmp4 = fcmp ninf oge double 0.0, %div_dp
				%res2 = xor i1 %cmp1, %cmp2
				%res3 = xor i1 %res2, %cmp3
				%res4 = xor i1 %res3, %cmp4

				; FloatTy with all allowed predicates
				;
				; CHECK: %cmp5 = fcmp ninf olt float [[AF]], 0.000000e+00
				; CHECK: %cmp6 = fcmp ninf ogt float [[AF]], 0.000000e+00
				; CHECK: %cmp7 = fcmp ninf ole float [[AF]], 0.000000e+00
				; CHECK: %cmp8 = fcmp ninf oge float [[AF]], 0.000000e+00

				%div_fp = fdiv ninf float 2.0, %arg_f

				%cmp5 = fcmp ninf olt float %div_fp, 0.0
				%cmp6 = fcmp ninf ogt float %div_fp, 0.0
				%cmp7 = fcmp ninf ole float %div_fp, 0.0
				%cmp8 = fcmp ninf oge float %div_fp, 0.0
				%res5 = xor i1 %res4, %cmp5
				%res6 = xor i1 %res5, %cmp6
				%res7 = xor i1 %res6, %cmp7
				%res8 = xor i1 %res7, %cmp8

				; Negative numerator: the predicate gets swapped
				;
				; CHECK: %cmp9 = fcmp ninf olt float [[AF]], 0.000000e+00
				; CHECK: %cmp10 = fcmp ninf ogt float [[AF]], 0.000000e+00
				; CHECK: %cmp11 = fcmp ninf ole float [[AF]], 0.000000e+00
				; CHECK: %cmp12 = fcmp ninf oge float [[AF]], 0.000000e+00
				; CHECK: %cmp13 = fcmp ninf ogt double [[AD]], 0.000000e+00
				; CHECK: %cmp14 = fcmp ninf olt double [[AD]], 0.000000e+00
				; CHECK: %cmp15 = fcmp ninf oge double [[AD]], 0.000000e+00
				; CHECK: %cmp16 = fcmp ninf ole double [[AD]], 0.000000e+00

				%div_dn = fdiv ninf double -1.0, %arg_d
				%div_fn = fdiv ninf float -3.0, %arg_f

				%cmp9 = fcmp ninf olt float 0.0, %div_fn
				%cmp10 = fcmp ninf ogt float 0.0, %div_fn
				%cmp11 = fcmp ninf ole float 0.0, %div_fn
				%cmp12 = fcmp ninf oge float 0.0, %div_fn
				%cmp13 = fcmp ninf olt double %div_dn, 0.0
				%cmp14 = fcmp ninf ogt double %div_dn, 0.0
				%cmp15 = fcmp ninf ole double %div_dn, 0.0
				%cmp16 = fcmp ninf oge double %div_dn, 0.0
				%res9 = xor i1 %res8, %cmp9
				%res10 = xor i1 %res9, %cmp10
				%res11 = xor i1 %res10, %cmp11
				%res12 = xor i1 %res11, %cmp12
				%res13 = xor i1 %res12, %cmp13
				%res14 = xor i1 %res13, %cmp14
				%res15 = xor i1 %res14, %cmp15
				%res16 = xor i1 %res15, %cmp16

				; Not folded: constant divisor (%div_inv1) or no 'ninf' on the fdiv (%div_inv2)
				;
				%div_inv1 = fdiv ninf float %arg_f, 3.0
				%div_inv2 = fdiv float 1.0, %arg_f

				; CHECK: %cmpI0 = fcmp ninf ogt float %div_inv1, 0.000000e+00
				; CHECK: %cmpI1 = fcmp ninf ogt float %div_inv2, 0.000000e+00
				%cmpI0 = fcmp ninf olt float 0.0, %div_inv1
				%cmpI1 = fcmp ninf ogt float %div_inv2, 0.0
				%resI0 = xor i1 %res16, %cmpI0
				%resI1 = xor i1 %resI0, %cmpI1

				; Unordered predicates
				;
				; CHECK: %cmpO0 = fcmp ninf ugt float %div_fp, 0.000000e+00
				; CHECK: %cmpO1 = fcmp ninf ult float %div_fp, 0.000000e+00
				; CHECK: %cmpO2 = fcmp ninf uge float %div_fp, 0.000000e+00
				; CHECK: %cmpO3 = fcmp ninf ule float %div_fp, 0.000000e+00
				%cmpO0 = fcmp ninf ult float 0.0, %div_fp
				%cmpO1 = fcmp ninf ugt float 0.0, %div_fp
				%cmpO2 = fcmp ninf ule float 0.0, %div_fp
				%cmpO3 = fcmp ninf uge float 0.0, %div_fp
				%resO0 = xor i1 %resI1, %cmpO0
				%resO1 = xor i1 %resO0, %cmpO1
				%resO2 = xor i1 %resO1, %cmpO2
				%resO3 = xor i1 %resO2, %cmpO3

				; CHECK: ret i1 %resO3
				ret i1 %resO3
				}

				; vector tests for test20_recip
				define <2 x i1> @test21_recip_vec(<2 x float> %arg_f) {
				; CHECK-LABEL: @test21_recip_vec(
				; CHECK-SAME: <2 x float> [[AF:%.*]])

				; CHECK: %resV0 = fcmp ninf oge <2 x float> %arg_f, zeroinitializer
				%divV0 = fdiv ninf <2 x float> <float 1.0, float 1.0>, %arg_f
				%resV0 = fcmp ninf oge <2 x float> %divV0, zeroinitializer
				; CHECK: %resV1 = fcmp ninf ole <2 x float> %arg_f, zeroinitializer
				%divV1 = fdiv ninf <2 x float> <float -1.0, float -1.0>, %arg_f
				%resV1 = fcmp ninf oge <2 x float> %divV1, zeroinitializer
				; CHECK: %resV2 = fcmp ninf oge <2 x float> %divV2, zeroinitializer
				%divV2 = fdiv ninf <2 x float> zeroinitializer, %arg_f
				%resV2 = fcmp ninf oge <2 x float> %divV2, zeroinitializer
				; CHECK: %resV3 = fcmp ninf oge <2 x float> %divV3, zeroinitializer
				%divV3 = fdiv ninf <2 x float> <float -1.0, float 1.0>, %arg_f
				%resV3 = fcmp ninf oge <2 x float> %divV3, zeroinitializer

				; prevent DCE
				%res1 = xor <2 x i1> %resV0, %resV1
				%res2 = xor <2 x i1> %res1, %resV2
				%res3 = xor <2 x i1> %res2, %resV3

				; CHECK: ret <2 x i1> %res3
				ret <2 x i1> %res3
				}