This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Fold Select with binary op - FP opcodes
ClosedPublic

Authored by xbolva00 on Aug 14 2018, 8:57 AM.

Details

Summary

Follow up for https://reviews.llvm.org/rL339520 and https://reviews.llvm.org/rL338300

Alive:

%A = fcmp oeq float %x, 0.0
%B = fadd nsz float %x, %z
%C = select i1 %A, float %B, float %y
=>
%C = select i1 %A, float %z, float %y
----------                                                                      
  %A = fcmp oeq float %x, 0.0
  %B = fadd nsz float %x, %z
  %C = select %A, float %B, float %y
=>
  %C = select %A, float %z, float %y

Done: 1                                                                         
Optimization is correct

%A = fcmp une float %x, -0.0
%B = fadd nsz float %x, %z
%C = select i1 %A, float %y, float %B
=>
%C = select i1 %A, float %y, float %z
----------                                                                      
  %A = fcmp une float %x, -0.0
  %B = fadd nsz float %x, %z
  %C = select %A, float %y, float %B
=>
  %C = select %A, float %y, float %z

Done: 1                                                                         
Optimization is correct

Diff Detail

Repository
rL LLVM

Event Timeline

xbolva00 created this revision.Aug 14 2018, 8:57 AM
xbolva00 added inline comments.Aug 14 2018, 9:01 AM
test/Transforms/InstCombine/select-binop-cmp.ll
201 ↗(On Diff #160580)

not sure why select_fsub_fcmp is not folded.

xbolva00 marked an inline comment as done.Aug 14 2018, 10:07 AM
spatel added inline comments.Aug 14 2018, 10:54 AM
lib/Transforms/InstCombine/InstCombineSelect.cpp
71 ↗(On Diff #160625)

Did you step through the scenarios where X is NAN? I don't think the transform is valid for both predicates (similarly for ONE/UNE).

xbolva00 added inline comments.Aug 14 2018, 11:04 AM
lib/Transforms/InstCombine/InstCombineSelect.cpp
71 ↗(On Diff #160625)

Ok, so probably not worth to with FP opcodes? if it is so complicated.. Alive has some problems with float?

spatel added inline comments.
lib/Transforms/InstCombine/InstCombineSelect.cpp
71 ↗(On Diff #160625)

AFAIK, Alive doesn't support FP yet.
I think this is still a good optimization for FP, we just have to do it carefully. So that just means prove that the output is correct with a NAN input and one or more of the FP predicates.

xbolva00 added inline comments.Aug 18 2018, 8:55 AM
lib/Transforms/InstCombine/InstCombineSelect.cpp
71 ↗(On Diff #160625)

How can we prove it?

I will add many tests with nan tomorrow so we can see changes directly.

xbolva00 updated this revision to Diff 161394.Aug 19 2018, 5:01 AM
  • Tests with nan
spatel added inline comments.Aug 19 2018, 5:38 AM
lib/Transforms/InstCombine/InstCombineSelect.cpp
71 ↗(On Diff #160625)

Options:

  1. Use an FP-capable version of Alive - https://github.com/rutgers-apl/alive-nj (I have no experience with that).
  1. Substitute %x with NaN in the basic pattern:
%A = fcmp oeq float %x, -0.0
%B = fadd float %x, %z
%C = select i1 %A, float %B, float %y

And see if the result is the same after the transform by working through the code by hand.

  1. Run this IR under 'lli'? (again, I'm not very familiar with that tool)
  1. Create a C program that would produce the IR patterns and step/printf through to see if the output is the same given a NaN input.
test/Transforms/InstCombine/select-binop-cmp.ll
361 ↗(On Diff #161394)

These tests don't do what we need. If we're using a NaN constant (rather than a variable that is dynamically set to NaN), other optimizations will prevent the transform in question.

Ok, I create this IR manually + lli to interpret it
https://pastebin.com/MxdSFyTL

xbolva00@xbolva00:~/opt$ lli fp.ll 5 6
noopt 2.000000
fpoeq 2.000000
fpueq 2.000000
noopt 2.000000
fpoeq 2.000000
fpueq 2.000000
noopt nan
fpoeq nan
fpueq 2.000000
noopt nan
fpoeq nan
fpueq nan


So OEQ seems good.

Next steps:

  1. Remove UEQ from optimization
  2. Change tests + add nan tests (No idea yet how to do proper ones, small tip?)
xbolva00 added a comment.EditedAug 19 2018, 8:54 AM

Alive says that we cannot transform it. OEQ maybe yes with the check if zero has no sign.

Alive: https://pastebin.com/TkJB4RTr

Alive says that we cannot transform it. OEQ maybe yes with the check if zero has no sign.

Alive: https://pastebin.com/TkJB4RTr

Great! You got alive-nj working.

So yes, looks like we have to check for 'nsz' and only OEQ (but not UEQ). For the tests, there's no easy way to encode what we're showing here AFAIK. We just need to make sure that we have both positive and negative tests when 'nsz' is (not) there and for each possible predicate.

xbolva00 edited the summary of this revision. (Show Details)Aug 19 2018, 10:37 AM
xbolva00 edited the summary of this revision. (Show Details)
xbolva00 updated this revision to Diff 161465.Aug 20 2018, 5:47 AM
  • New tests
spatel added inline comments.Aug 20 2018, 9:07 AM
lib/Transforms/InstCombine/InstCombineSelect.cpp
85 ↗(On Diff #161465)

I think this is a correct predicate, but stronger than necessary. Does alive-fp say we can get away with CannotBeNegativeZero(Y)? If it doesn't recognize that function, you could fake it by adding another operation that creates 'Y' (or '%z' as shown in the test names) while guaranteeing that it is not -0.0. So something like:
%z = fadd %v, 0.0 (this can never produce -0.0)

xbolva00 added inline comments.Aug 20 2018, 10:42 PM
lib/Transforms/InstCombine/InstCombineSelect.cpp
85 ↗(On Diff #161465)
%A = fcmp oeq float %x, 0.0
%z = fadd %v, 0.0
%B = fadd nsz float %x, %z
%C = select i1 %A, float %B, float %y
=>
%C = select i1 %A, float %z, float %y
----------                                                                      
  %A = fcmp oeq float %x, 0.0
  %z = fadd %v, 0.0
  %B = fadd nsz float %x, %z
  %C = select %A, float %B, float %y
=>
  %C = select %A, float %z, float %y

Done: 1                                                                         
Optimization is correct
%A = fcmp oeq float %x, 0.0
%z = fadd %v, 0.0
%B = fadd float %x, %z
%C = select i1 %A, float %B, float %y
=>
%C = select i1 %A, float %z, float %y
----------                                                                      
  %A = fcmp oeq float %x, 0.0
  %z = fadd %v, 0.0
  %B = fadd float %x, %z
  %C = select %A, float %B, float %y
=>
  %C = select %A, float %z, float %y

Done: 1                                                                         
Optimization is correct
85 ↗(On Diff #161465)

So we can use

ValueTracker::CannotBeNegativeZero ?

xbolva00 updated this revision to Diff 161645.Aug 20 2018, 11:15 PM
  • Less restrictive checks + new tests
xbolva00 updated this revision to Diff 161646.Aug 20 2018, 11:22 PM
xbolva00 marked an inline comment as done.
xbolva00 updated this revision to Diff 161647.Aug 20 2018, 11:27 PM
  • One more test with ConstantFP
spatel added inline comments.Aug 21 2018, 5:22 AM
lib/Transforms/InstCombine/InstCombineSelect.cpp
72–73 ↗(On Diff #161647)

Did you try testing ONE/UNE in alive-fp?

91 ↗(On Diff #161647)

This comment should explain, not just repeat the code:
+0.0 compares equal to -0.0, and so it does not behave as required for this transform. Bail out if we can not exclude that possibility.

92 ↗(On Diff #161647)

I don't think you need to have these checks? But I suppose checking for FPMathOperator is a good avoidance for calling CannotBeNegativeZero for the common (integer) case.

xbolva00 added inline comments.Aug 21 2018, 7:43 AM
lib/Transforms/InstCombine/InstCombineSelect.cpp
72–73 ↗(On Diff #161647)
%A = fcmp one float %x, 0.0
%B = fadd nsz float %x, %z
%C = select i1 %A, float %B, float %y
=>
%C = select i1 %A, float %B, float %z
----------                                                                      
  %A = fcmp one float %x, 0.0
  %B = fadd nsz float %x, %z
  %C = select %A, float %B, float %y
=>
  %C = select %A, float %B, float %z

ERROR: Mismatch in values for float %C                                          

Example:
float %x = NaN
   i1 %A = 0x0 (0)
float %z = 0.0640411376953125*(2**-126)
float %B = NaN
float %y = 1.00048828125*(2**1)
source: 1.00048828125*(2**1)
target: 0.0640411376953125*(2**-126)

xbolva00@xbolva00:~/alive-nj$ ./run.py                               
[Reading from terminal...]
%A = fcmp une float %x, 0.0
%B = fadd nsz float %x, %z
%C = select i1 %A, float %B, float %y
=>
%C = select i1 %A, float %B, float %z
----------                                                                      
  %A = fcmp une float %x, 0.0
  %B = fadd nsz float %x, %z
  %C = select %A, float %B, float %y
=>
  %C = select %A, float %B, float %z

ERROR: Mismatch in values for float %C                                          

Example:
float %x = = +0.0
   i1 %A = 0x0 (0)
float %z = NaN
float %B = NaN
float %y = 0.0000002384185791015625*(2**-126)
source: 0.0000002384185791015625*(2**-126)
target: NaN
spatel added inline comments.Aug 21 2018, 7:54 AM
lib/Transforms/InstCombine/InstCombineSelect.cpp
72–73 ↗(On Diff #161647)

I think you want these tests to use -0.0 because that's the identity constant for fadd.

xbolva00 updated this revision to Diff 161721.Aug 21 2018, 7:54 AM
  • Drop ONE
  • Fixed comment/code
xbolva00 added inline comments.Aug 21 2018, 7:58 AM
lib/Transforms/InstCombine/InstCombineSelect.cpp
72–73 ↗(On Diff #161647)
%A = fcmp one float %x, -0.0
%B = fadd nsz float %x, %z
%C = select i1 %A, float %B, float %y
=>
%C = select i1 %A, float %B, float %z
----------                                                                      
  %A = fcmp one float %x, -0.0
  %B = fadd nsz float %x, %z
  %C = select %A, float %B, float %y
=>
  %C = select %A, float %B, float %z

ERROR: Mismatch in values for float %C                                          

Example:
float %x = +0.0
   i1 %A = 0x0 (0)
float %z = -0.504963397979736328125*(2**-126)
float %B = -0.504963397979736328125*(2**-126)
float %y = 1.5*(2**-125)
source: 1.5*(2**-125)
target: -0.504963397979736328125*(2**-126)

%A = fcmp une float %x, -0.0
%B = fadd nsz float %x, %z
%C = select i1 %A, float %B, float %y
=>
%C = select i1 %A, float %B, float %z
----------                                                                      
  %A = fcmp une float %x, -0.0
  %B = fadd nsz float %x, %z
  %C = select %A, float %B, float %y
=>
  %C = select %A, float %B, float %z

ERROR: Mismatch in values for float %C                                          

Example:
float %x = +0.0
   i1 %A = 0x0 (0)
float %z = +oo
float %B = +oo
float %y = -0.03125*(2**-126)
source: -0.03125*(2**-126)
target: +oo
xbolva00 added inline comments.Aug 21 2018, 8:17 AM
lib/Transforms/InstCombine/InstCombineSelect.cpp
72–73 ↗(On Diff #161647)

Ok.

%A = fcmp une float %x, -0.0
%B = fadd nsz float %x, %z
%C = select i1 %A, float %y, float %B
=>
%C = select i1 %A, float %y, float %z
----------                                                                      
  %A = fcmp une float %x, -0.0
  %B = fadd nsz float %x, %z
  %C = select %A, float %y, float %B
=>
  %C = select %A, float %y, float %z

Done: 1                                                                         
Optimization is correct
xbolva00 edited the summary of this revision. (Show Details)Aug 21 2018, 8:17 AM
xbolva00 updated this revision to Diff 161729.Aug 21 2018, 8:19 AM
  • Support UNE
xbolva00 updated this revision to Diff 161733.Aug 21 2018, 8:32 AM
xbolva00 updated this revision to Diff 161735.Aug 21 2018, 8:36 AM
  • Fixed tests
xbolva00 updated this revision to Diff 161736.Aug 21 2018, 8:38 AM

One more fix

I think we're almost done! Just a few nits inline.

lib/Transforms/InstCombine/InstCombineSelect.cpp
95 ↗(On Diff #161736)

indent

test/Transforms/InstCombine/select-binop-cmp.ll
193 ↗(On Diff #161736)

This is identical to select_fadd_fcmp_2? Did you mean to check OEQ instead?

583–584 ↗(On Diff #161736)

The 'bad' tests provide good coverage for all of the ways this transform should be limited, but FP is complicated, and it's very difficult to know exactly which condition in the test causes us to bypass the transform - and that's with this patch fresh in memory...we probably won't remember any of this in a month. :)

Please add a short comment above each test to explain why we can't do the fold for that test.

xbolva00 updated this revision to Diff 162000.Aug 22 2018, 10:51 AM
  • Addressed review comments
spatel accepted this revision.Aug 22 2018, 1:30 PM

LGTM

This revision is now accepted and ready to land.Aug 22 2018, 1:30 PM
This revision was automatically updated to reflect the committed changes.

LGTM

Thanks for your review and patience!


Now I am interested in https://bugs.llvm.org/show_bug.cgi?id=38479, maybe you can give me some hints where I should start do that missed optimization?