Page MenuHomePhabricator

[InstCombine] Fold (C/x)>0 into x>0 if possible
ClosedPublic

Authored by marels on Sep 11 2018, 10:43 AM.

Details

Summary

This patch adds a simplification for floating point compares that
depend on the inversion of a variable. The comparison can be folded
such that it solely depends on the variable rather than the division.

E.g.:

foo(double x) {
  double a = 1.0 / x;
  if (a < 0) ...
}

>

foo(double x) {
  double a = 1.0 / x;
  if (x < 0) ...
}

The following conversion is applied if C!=0 where C is a compiler
known constant:

(C/x < 0)
(C*x < 0)       ... multiply by x*x; Note: x*x > 0 (see Remark 2)
                ... divide by C
(x < 0)         ... if C > 0
(x > 0)         ... if C < 0

Remarks:

  1. To avoid cases where x = NaN only the ordered predicates (ogt, olt, oge, ole) are folded.
  1. The division requires the flag 'ninf' to be set. This also ensures there x!=0 can be assumed.

    Assume x=0. We know C!=0 (by definition) and C!=inf ('ninf' is set). Therefore C/x=inf (IEEE754_2008.pdf 7.2 and 7.3). But, because 'ninf' is set, the compile can assume x!=0
  1. To avoid cases where C/x == 0.0, 'ninf' has to be set for the fdiv and fcmp instruction.
  1. To allow the multiplication by x*x 'arcp' and 'ninf' has to be set for the fdiv and fcmp instruction.

Diff Detail

Repository
rL LLVM

Event Timeline

marels created this revision.Sep 11 2018, 10:43 AM
john.brawn accepted this revision.Sep 18 2018, 4:51 AM

LGTM. I think it may be possible to remove the hasAllowReciprocal check, because the point of that (as I understand it) is to allow converting a divide into a multiply-by-reciprocal in ways that can change the end result and I don't think that the transformation that's being done here will give a different result to the untransformed version, but I'm not certain of that and it's fine to leave it as it is.

This revision is now accepted and ready to land.Sep 18 2018, 4:51 AM
spatel requested changes to this revision.Sep 18 2018, 7:00 AM

I haven't had a chance to look at this closely, but a first glance says it only works for scalars even though it should provide identical functionality for vectors.
Can you change the code to use the 'match()' API? This should substantially reduce the code and give you splat constant vector functionality for free (although there should be at least 1 test for that).

This revision now requires changes to proceed.Sep 18 2018, 7:00 AM

Thank you for the input,

I updated the code to support vectors using the match API. But before uploading I have question.

I was not able to find a way to match the following predicates with the existing API.

  1. All elements are non-zero
  2. All elements are positive
  3. All elements are negative

These are important to correctly handle some cases:

The following cannot be folded because the new predicate is ambiguous.

C = <2 x float> <float 1.0, float -1.0>

The following cannot be folded because the one element violates the assumption that C=0.

C = <2 x float> <float 1.0, float 0.0>

This is not a big thing because it can be easily added to to PatternMatch.h (same implementation as m_AnyZeroFP). I think it is better submit a separate change for this. Do you agree, or shall I bunch both changes together?

@spatel what do you think about @john.brawn suggestion to removing the hasAllowReciprocal check?

Thank you for the input,

I updated the code to support vectors using the match API. But before uploading I have question.

I was not able to find a way to match the following predicates with the existing API.

  1. All elements are non-zero
  2. All elements are positive
  3. All elements are negative

    These are important to correctly handle some cases:

    The following cannot be folded because the new predicate is ambiguous. C = <2 x float> <float 1.0, float -1.0>

    The following cannot be folded because the one element violates the assumption that C=0. C = <2 x float> <float 1.0, float 0.0>

    This is not a big thing because it can be easily added to to PatternMatch.h (same implementation as m_AnyZeroFP). I think it is better submit a separate change for this. Do you agree, or shall I bunch both changes together?

Sorry this wasn't clear - I was only suggesting that we handle vector splat (all constants within the vector are identical or undef) patterns in this patch. You're correct that handling arbitrary vector constants is a harder problem. The API I would use here is "m_APFloat" (it deals with splat constants internally, so you probably don't need to do anything special in the calling code for this patch).

@spatel what do you think about @john.brawn suggestion to removing the hasAllowReciprocal check?

That sounds correct - I don't think you don't need it here.
But that does raise the question: are you planning to generalize this transform? I see a few possible enhancements:

  1. Handle (X / C) < 0.0 (constant is divisor rather than dividend
  2. Handle (X * C) < 0.0 (multiplication rather than division)
  3. Handle all of the above with non-zero compare constant: (X / C1) < C2, (C1 / X) < C2, (X * C1) < C2
marels added a comment.EditedSep 26 2018, 4:22 AM

I was not able to find a way to match the following predicates with the existing API.

Sorry this wasn't clear - I was only suggesting that we handle vector splat (all constants within the vector are identical or undef) patterns in this patch. You're correct that handling arbitrary vector constants is a harder problem. The API I would use here is "m_APFloat" (it deals with splat constants internally, so you probably don't need to do anything special in the calling code for this patch).

I do not think it is that much harder, it just requires some API extensions. I do not like it to much to just handle splat pattern because it unnecessarily hardens the preconditions just because of a lack API and I guess more optimisations could benefit from such an extension. However, if there is no API yet, I think it is best to restrict on splat values - at least for now. Do you know? Are there any activities ongoing in extending the PatternMatching API. As a first shot I am thinking of something like

template <typename ScalarType> m_ComplexMatch(bool (match_fn*)(ScalarType &e));

where as match_fn is called on all vector elements (or on the single scalar). This generically allows more complex matchings.

@spatel what do you think about @john.brawn suggestion to removing the hasAllowReciprocal check?

That sounds correct - I don't think you don't need it here.

I will remove this.

But that does raise the question: are you planning to generalize this transform? I see a few possible enhancements:

I think most of those are not truly generalisations because they require different equality transformations. But they are worth to take a look into in separate patches.

  1. Handle (X / C) < 0.0 (constant is divisor rather than dividend
  2. Handle (X * C) < 0.0 (multiplication rather than division)
  1. and 2. are equivalent: To remove the dot operation from the compare, the inequality has to be multiplied by C or 1/C with the predicate swapped depending on the sign of C.
  1. Handle all of the above with non-zero compare constant: (X / C1) < C2, (C1 / X) < C2, (X * C1) < C2

ad (X / C1) < C2, (X * C1) < C2: Could be easily combined with the implementation of 1. and 2. It is only required to compute C2*C1 or C2/C1 at compile time.
ad (C1 / X) < C2: Transforming this the same way in as done in this patch (*X*X/C1) leads to the equation (with the predicate swapped depending on the sign of C1):

X < X*X*C2/C1

Assuming C1/X is used later you only add instructions.
Assuming C1/X is not used later, you trade a division by 2 multiplications. This can be beneficial but in general benefits are hard to assume during InstCombine.
When transforming differently by (e.g. *X/C1) the predicate cannot be determined anymore because the sign of X is not known to the compiler.

marels updated this revision to Diff 167132.Sep 26 2018, 7:39 AM

Changed code to use pattern matching API. When the inputs are vectors only splat vectors are considered.

The patch looks logically correct, but see inline comments for cosmetic/procedural changes.

lib/Transforms/InstCombine/InstCombineCompares.cpp
5276–5291 ↗(On Diff #167132)

This code comment is more confusing than helpful to me.
Could we say something like this instead:

// When C is not 0.0 and infinities are not allowed:
// (C / X) < 0.0 is a sign-bit test of X
// (C / X) < 0.0 --> X < 0.0 (if C is positive)
// (C / X) < 0.0 --> X > 0.0 (if C is negative, swap the predicate)
5323 ↗(On Diff #167132)

Prefer to use the actual type here rather than auto and fix capitalization: Value *A.

5324 ↗(On Diff #167132)
test/Transforms/InstCombine/fcmp.ll
385 ↗(On Diff #167132)

I'd prefer to continue with the style of tests as seen above here because that makes it clear exactly what is changing from test to test without extra stuff like xors. So:

  1. Make each fdiv/fcmp pair its own function.
  2. Auto-generate the complete check lines using the script mentioned in line 1 of this file.

Also:

  1. Commit the tests with baseline CHECK lines before this patch, and rebase this patch so we just see the diffs.
marels updated this revision to Diff 167307.EditedSep 27 2018, 6:36 AM

Fixed according to comments from @spatel

  1. Adjusted introduction comment. I left a short proof there, because i think it is necessary to show why the preconditions are required.
  2. Fixed autos
  3. Fixed Tests
  4. Changed commit message to:
[InstCombine] Without infinites, fold (C / X) < 0.0 --> (X < 0)

When C is not zero and infinites are not allowed (C / X) > 0 is a sign
test. Depending on the sign of C, the predicate must be swapped.

E.g.:
  foo(double X) {
    if ((-2.0 / X) <= 0) ...
  }
 =>
  foo(double X) {
    if (X >= 0) ...
  }

Thanks - do you have commit access?

No, I do not think so. Can you do this for me?

spatel accepted this revision.Sep 27 2018, 8:12 AM

No, I do not think so. Can you do this for me?

Yes - LGTM. As you probably saw, I committed the baseline tests already. I'll rebase and commit the rest.

This revision is now accepted and ready to land.Sep 27 2018, 8:12 AM
This revision was automatically updated to reflect the committed changes.