This is an archive of the discontinued LLVM Phabricator instance.

transform fmin/fmax calls when possible (PR24314)
ClosedPublic

Authored by spatel on Aug 8 2015, 10:38 AM.

Details

Summary

If we can ignore NaNs, fmin/fmax libcalls can become compare and select (this is what we turn std::min / std::max into).

This IR should then be optimized in the backend to whatever is best for any given target. Eg, x86 can use minss/maxss instructions.

This should solve PR24314:
https://llvm.org/bugs/show_bug.cgi?id=24314

Diff Detail

Repository
rL LLVM

Event Timeline

spatel updated this revision to Diff 31583.Aug 8 2015, 10:38 AM
spatel retitled this revision from to transform fmin/fmax calls when possible (PR24314).
spatel updated this object.
spatel added reviewers: hfinkel, arsenm, jmolloy.
spatel added a subscriber: llvm-commits.
jmolloy edited edge metadata.Aug 8 2015, 1:26 PM

Hi Sanjay,

Generally this looks fine, but you should be adding the nnan and nsz attributes to the fcmp instruction via the IRBuilder for value tracking to pick up the relaxedness.

This will be important when my latest patches finally land and x86 uses the new ISD::FMINNAN nodes (which represent the semantics of minss precisely).

Cheers,

James

spatel added a comment.Aug 8 2015, 2:04 PM

Generally this looks fine, but you should be adding the nnan and nsz attributes to the fcmp instruction via the IRBuilder for value tracking to pick up the relaxedness.

Thanks, James! I wasn't paying close enough attention - didn't realize fcmp now had FMF. Certainly, I'll get this fixed up.

Is it the C definition of fmax / fmin that lets us add the nsz flag? Ie, no external relaxation flags are needed because:
"Ideally, fmax would be sensitive to the sign of zero, for example fmax(−0. 0, +0. 0) would return +0; however, implementation in software might be impractical."

spatel updated this revision to Diff 31622.Aug 9 2015, 2:41 PM
spatel edited edge metadata.

Patch updated:

  1. Add fast-math-flags to the fcmp based on function attributes.
  2. Verify the fast-math-flags on the fcmp in the test cases.
hfinkel added inline comments.Aug 12 2015, 2:57 AM
lib/Transforms/Utils/SimplifyLibCalls.cpp
1233 ↗(On Diff #31622)

You still need to check TLI->has(LibFunc::*) for the function itself. Maybe you want to make a hasBinaryFloatFn (like the existing hasUnaryFloatFn)?

spatel added inline comments.Aug 12 2015, 1:15 PM
lib/Transforms/Utils/SimplifyLibCalls.cpp
1233 ↗(On Diff #31622)

Sorry, I'm not understanding. We're replacing the call with regular instructions if we get this far, so there's no new library function to check. We check that the original code has this LibFunc before entering optimizeFMinMax(). Do we want to assert that or is there something else to check?

hfinkel added inline comments.Aug 12 2015, 2:09 PM
lib/Transforms/Utils/SimplifyLibCalls.cpp
1221 ↗(On Diff #31622)

Actually, why do we need no NaNs? We don't support FP exceptions, so we only need to do the correct thing with NaN arguments (by returning the non-NaN). This should be easy to guarantee by picking the right ordered vs. unordered fcmp predicate.

1223 ↗(On Diff #31622)

What in the definition implies this?

1233 ↗(On Diff #31622)

Ah, you're right: That is already checked in optimizeCall. I retract my comment ;)

spatel added inline comments.Aug 12 2015, 3:17 PM
lib/Transforms/Utils/SimplifyLibCalls.cpp
1221 ↗(On Diff #31622)

An unordered compare would let us know that at least one operand is NaN, but not which one. So we'd have to check each operand for NaN. We'd be rewriting an fmax() implementation in IR?

1223 ↗(On Diff #31622)

The C standard is silent about signed zeros for these, but says this in a footnote:
"Ideally, fmax would be sensitive to the sign of zero, for example fmax(−0. 0, +0. 0) would return +0; however, implementation in software might be impractical."

Should we add that here in the comment?

"rewriting an fmax() implementation in IR"

For reference, I think that would look like this:

define double @fmax(double %x, double %y) {
entry:
  %cmp0 = fcmp uno double %x, %x
  br i1 %cmp0, label %return, label %if.1

if.1:
  %cmp1 = fcmp uno double %y, %y
  br i1 %cmp1, label %return, label %if.2

if.2:
  %cmp2 = fcmp ogt double %x, %y
  %max = select i1 %cmp2, double %x, double %y
  br label %return

return:
  %retval = phi double [ %y, %entry ], [ %x, %if.1 ], [ %max, %if.2 ]
  ret double %retval
}

This is based on the C reference code from:
http://www.opensource.apple.com/source/Libm/Libm-315/Source/PowerPC/minmaxdim.c

This will be important when my latest patches finally land and x86 uses the new ISD::FMINNAN nodes (which represent the semantics of minss precisely).

It doesn't affect this patch, but unfortunately, I don't think x86 can use the new DAG nodes. The x86 min/max insts don't conform to either DAG node definition. The hardware instructions may or may not return a NaN operand. From the Intel manual:

MAX(SRC1, SRC2)
{
   IF ((SRC1 = 0.0) and (SRC2 = 0.0)) THEN DEST <- SRC2;
   ELSE IF (SRC1 = SNaN) THEN DEST <- SRC2; FI; 
   ELSE IF SRC2 = SNaN) THEN DEST <- SRC2; FI; 
   ELSE IF (SRC1 > SRC2) THEN DEST <- SRC1; 
   ELSE DEST <- SRC2;
   FI;
 }

The precise behavior of these instructions is tested quite thoroughly in:
test/CodeGen/X86/sse-minmax.ll

hfinkel accepted this revision.Aug 15 2015, 12:36 PM
hfinkel edited edge metadata.

LGTM (please add the comment about the signed zeros)

This will be important when my latest patches finally land and x86 uses the new ISD::FMINNAN nodes (which represent the semantics of minss precisely).

It doesn't affect this patch, but unfortunately, I don't think x86 can use the new DAG nodes. The x86 min/max insts don't conform to either DAG node definition. The hardware instructions may or may not return a NaN operand. From the Intel manual:

MAX(SRC1, SRC2)
{
   IF ((SRC1 = 0.0) and (SRC2 = 0.0)) THEN DEST <- SRC2;
   ELSE IF (SRC1 = SNaN) THEN DEST <- SRC2; FI; 
   ELSE IF SRC2 = SNaN) THEN DEST <- SRC2; FI; 
   ELSE IF (SRC1 > SRC2) THEN DEST <- SRC1; 
   ELSE DEST <- SRC2;
   FI;
 }

The precise behavior of these instructions is tested quite thoroughly in:
test/CodeGen/X86/sse-minmax.ll

Alright, so if either operand is NaN, then it will return the second operand. Especially considering that the current SDAG nodes are all tagged as being commutative, we certainly can't match that, at least not with a single instruction (if we're not currently operating in nnans mode).

lib/Transforms/Utils/SimplifyLibCalls.cpp
1221 ↗(On Diff #31622)

Good point. Doing so may be a good idea, but we can deal with that later. We'd obviously need to do separate benchmarking.

1223 ↗(On Diff #31622)

Yes.

This revision is now accepted and ready to land.Aug 15 2015, 12:36 PM
spatel marked an inline comment as done.Aug 16 2015, 1:15 PM
This revision was automatically updated to reflect the committed changes.