Page MenuHomePhabricator

InstCombine: 1./x >= 0. -> x >= 0.
Needs ReviewPublic

Authored by MatzeB on Feb 2 2018, 9:15 PM.

Details

Summary

This adds the following two rules when the "no infs" fast-math
flag is set:

fcmp ninf pred (fdiv ninf 1., x), 0   ->   fcmp pred x, 0
fcmp ninf pred (fdiv ninf -1., x), 0  ->   fcmp swap(pred) x, 0

To justify the first rule:

  • All of the following cases show that with or without fdiv the sign is the same and 0 does not occur.
  • fdiv 1., small number <-> + large number or +inf on overflow
  • fidv 1., -small number <-> - large number or -inf on overflow
  • fdiv 1., big number <-> + small normal or denormal number (Example: 1./FLT_MAX = 1./0x0x1.fffffep+127 = 0x1p-128)
  • fdiv 1., big number <-> - small normal or denormal number

NaN is preserved:

  • fdiv 1., nan <-> nan
  • fdiv 1., -nan <-> -nan

The following cases do not work correctly:

  • fdiv 1., 0 <-> inf
  • fdiv 1., -0 <-> -inf
  • fdiv 1., inf </-> 0
  • fdiv 1., -inf </-> -0

However having a "no inf" fast-math flag on the fcmp and the fdiv allows
us to ignore these cases.

The 2nd rule can be shown to be a variant of the first:

fcmp pred (fdiv -1., x), 0  -> fcmp pred fneg (fdiv 1., x), 0
     -> fcmp swap(pred) (fdiv 1., x), 0 -> fcmp swap(pred) x, 0

Question to reviewers: Is it correct to assume that with fcmp ninf (fdiv 1.0, x) the input to fcmp cannot be +/- infinity and hence x cannot be +/- 0?

Diff Detail

Repository
rL LLVM

Event Timeline

MatzeB created this revision.Feb 2 2018, 9:15 PM
arsenm added inline comments.Feb 6 2018, 8:02 AM
test/Transforms/InstCombine/fcmp_reciproc.ll
4

Some tests with vectors would be nice

Do we need to restrict this to 1.0 / X ? If we only care about the sign of the fdiv result and we're ruling out INF, then any constant 'C / X' should be ok? Can also handle 'X / C'?

Do we need to restrict this to 1.0 / X ? If we only care about the sign of the fdiv result and we're ruling out INF, then any constant 'C / X' should be ok? Can also handle 'X / C'?

It gets slightly harder with C/X because for large C we can underflow to zero if X is big enough. I wasn't sure how to compute the limit so went with 1.0 for now.

Do we need to restrict this to 1.0 / X ? If we only care about the sign of the fdiv result and we're ruling out INF, then any constant 'C / X' should be ok? Can also handle 'X / C'?

It gets slightly harder with C/X because for large C we can underflow to zero if X is big enough. I wasn't sure how to compute the limit so went with 1.0 for now.

Of course it's small C where we could underflow, not large C.

scanon added a comment.Feb 6 2018, 4:07 PM

Underflow or overflow doesn't change sign, so 0 < C < inf && X >= 0 --> C/X >= 0.

MatzeB added a comment.EditedFeb 6 2018, 4:12 PM

Underflow or overflow doesn't change sign, so 0 < C < inf && X >= 0 --> C/X >= 0.

It doesn't change the sign. However we have to differentiate between three cases here: negative, null (or minus null), and positive.

Underflow can change a value from positive or negative to null.
My understanding is that in case of underflow of large positive X the expression C/X <= 0 may be true while X <= 0 is not.

scanon added a comment.Feb 6 2018, 4:26 PM

Underflow or overflow doesn't change sign, so 0 < C < inf && X >= 0 --> C/X >= 0.

It doesn't change the sign. However we have to differentiate between three cases here: negative, null (or minus null), and positive.

Underflow can change a value from positive or negative to null.
My understanding is that in case of underflow of large positive X the expression C/X <= 0 may be true while X <= 0 is not.

Ah, I see what you're trying to do.

In that case, you still have trouble because even 1/x can produce zero if someone is running with flush-to-zero enabled.

spatel added a comment.Feb 7 2018, 7:35 AM

In that case, you still have trouble because even 1/x can produce zero if someone is running with flush-to-zero enabled.

IIUC, we also have out-of-tree targets with no option; they always operate with FTZ behavior.

I think it's still possible to allow this kind of transform in instcombine with more fast-math-flags. Clang/gcc's -fassociative-math translates indirectly to 'reassoc' in IR FMF and says it may "reorder floating-point comparisons".

scanon added a comment.Feb 7 2018, 8:35 AM

In that case, you still have trouble because even 1/x can produce zero if someone is running with flush-to-zero enabled.

IIUC, we also have out-of-tree targets with no option; they always operate with FTZ behavior.

I think it's still possible to allow this kind of transform in instcombine with more fast-math-flags. Clang/gcc's -fassociative-math translates indirectly to 'reassoc' in IR FMF and says it may "reorder floating-point comparisons".

Correct. This falls in the pile of "things we could optimize if we modeled fenv".