This is an archive of the discontinued LLVM Phabricator instance.

transform fmin/fmax calls when possible (PR24314)
ClosedPublic

Authored by spatel on Aug 8 2015, 10:38 AM.


Details

Reviewers

jmolloy
arsenm
hfinkel

Commits

rG57fd1dc5db88: transform fmin/fmax calls when possible (PR24314)
rL245187: transform fmin/fmax calls when possible (PR24314)

Summary

If we can ignore NaNs, fmin/fmax libcalls can become compare and select (this is what we turn std::min / std::max into).

This IR should then be optimized in the backend to whatever is best for any given target. E.g., x86 can use minss/maxss instructions.

This should solve PR24314:
https://llvm.org/bugs/show_bug.cgi?id=24314
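
As a rough illustration of the transform summarized above, the sketch below compares the libm calls with the compare-and-select form in plain C++. The helper names (`select_min`, `select_max`, `same_result`, `max_forms_agree`, `min_forms_agree`) are hypothetical and not part of the patch; they only mirror the `fcmp olt` / `fcmp ogt` plus `select` sequence. The two forms agree whenever neither input is NaN, which is why the transform needs NaNs to be ignorable.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helpers (not from the patch): the compare-and-select form.
// Mirrors: %cmp = fcmp olt %a, %b ; select %cmp, %a, %b
inline double select_min(double a, double b) { return a < b ? a : b; }
// Mirrors: %cmp = fcmp ogt %a, %b ; select %cmp, %a, %b
inline double select_max(double a, double b) { return a > b ? a : b; }

// True when both forms produced the same value.
// NaN results are matched with isnan because NaN != NaN.
inline bool same_result(double libm, double sel) {
  return (std::isnan(libm) && std::isnan(sel)) || libm == sel;
}

inline bool max_forms_agree(double a, double b) {
  return same_result(std::fmax(a, b), select_max(a, b));
}

inline bool min_forms_agree(double a, double b) {
  return same_result(std::fmin(a, b), select_min(a, b));
}
```

With a NaN as the second argument, `std::fmax(1.0, NaN)` returns 1.0 while `select_max(1.0, NaN)` returns the NaN, so the rewrite is only valid when NaN inputs can be ruled out.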

Diff Detail

Repository: rL LLVM

Event Timeline

spatel updated this revision to Diff 31583. Aug 8 2015, 10:38 AM

spatel retitled this revision to "transform fmin/fmax calls when possible (PR24314)".

spatel updated this object.

spatel added reviewers: hfinkel, arsenm, jmolloy.

spatel added a subscriber: llvm-commits.

Hi Sanjay,

Generally this looks fine, but you should be adding the nnan and nsz attributes to the fcmp instruction via the IRBuilder for value tracking to pick up the relaxedness.

This will be important when my latest patches finally land and x86 uses the new ISD::FMINNAN nodes (which represent the semantics of minss precisely).

Cheers,

James

In D11866#219873, @jmolloy wrote:

Generally this looks fine, but you should be adding the nnan and nsz attributes to the fcmp instruction via the IRBuilder for value tracking to pick up the relaxedness.

Thanks, James! I wasn't paying close enough attention - didn't realize fcmp now had FMF. Certainly, I'll get this fixed up.

Is it the C definition of fmax / fmin that lets us add the nsz flag? I.e., no external relaxation flags are needed because:
"Ideally, fmax would be sensitive to the sign of zero, for example fmax(−0.0, +0.0) would return +0; however, implementation in software might be impractical."

Patch updated:

Add fast-math-flags to the fcmp based on function attributes.
Verify the fast-math-flags on the fcmp in the test cases.

hfinkel added inline comments. Aug 12 2015, 2:57 AM

lib/Transforms/Utils/SimplifyLibCalls.cpp
1233 ↗	(On Diff #31622)	You still need to check TLI->has(LibFunc::*) for the function itself. Maybe you want to make a hasBinaryFloatFn (like the existing hasUnaryFloatFn)?

spatel added inline comments. Aug 12 2015, 1:15 PM

lib/Transforms/Utils/SimplifyLibCalls.cpp
1233 ↗	(On Diff #31622)	Sorry, I'm not understanding. We're replacing the call with regular instructions if we get this far, so there's no new library function to check. We check that the original code has this LibFunc before entering optimizeFMinMax(). Do we want to assert that or is there something else to check?

hfinkel added inline comments. Aug 12 2015, 2:09 PM

lib/Transforms/Utils/SimplifyLibCalls.cpp
1221 ↗	(On Diff #31622)	Actually, why do we need no NaNs? We don't support FP exceptions, so we only need to do the correct thing with NaN arguments (by returning the non-NaN). This should be easy to guarantee by picking the right ordered vs. unordered fcmp predicate.
1223 ↗	(On Diff #31622)	What in the definition implies this?
1233 ↗	(On Diff #31622)	Ah, you're right: That is already checked in optimizeCall. I retract my comment ;)

spatel added inline comments. Aug 12 2015, 3:17 PM

lib/Transforms/Utils/SimplifyLibCalls.cpp
1221 ↗	(On Diff #31622)	An unordered compare would let us know that at least one operand is NaN, but not which one. So we'd have to check each operand for NaN. We'd be rewriting an fmax() implementation in IR?
1223 ↗	(On Diff #31622)	The C standard is silent about signed zeros for these, but says this in a footnote: "Ideally, fmax would be sensitive to the sign of zero, for example fmax(−0.0, +0.0) would return +0; however, implementation in software might be impractical." Should we add that here in the comment?
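
To make the signed-zero point concrete, here is a small sketch in plain C++. The helper `ogt_select` is hypothetical and only mimics `fcmp ogt` followed by `select`. Because -0.0 and +0.0 compare equal, the compare is false for either operand order and the select returns its second operand, so the sign of the result depends on argument order. That is acceptable only if the sign of a zero result may be treated as insignificant, which is what the `nsz` flag expresses.

```cpp
#include <cmath>

// Hypothetical helper: compare-and-select form of fmax (fcmp ogt + select).
inline double ogt_select(double a, double b) { return a > b ? a : b; }

// Returns true when v is a zero with the sign bit set.
inline bool is_negative_zero(double v) { return v == 0.0 && std::signbit(v); }
```

For example, `ogt_select(-0.0, +0.0)` yields +0.0, while `ogt_select(+0.0, -0.0)` yields -0.0.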

"rewriting an fmax() implementation in IR"

For reference, I think that would look like this:

define double @fmax(double %x, double %y) {
entry:
  %cmp0 = fcmp uno double %x, %x
  br i1 %cmp0, label %return, label %if.1

if.1:
  %cmp1 = fcmp uno double %y, %y
  br i1 %cmp1, label %return, label %if.2

if.2:
  %cmp2 = fcmp ogt double %x, %y
  %max = select i1 %cmp2, double %x, double %y
  br label %return

return:
  %retval = phi double [ %y, %entry ], [ %x, %if.1 ], [ %max, %if.2 ]
  ret double %retval
}

This is based on the C reference code from:
http://www.opensource.apple.com/source/Libm/Libm-315/Source/PowerPC/minmaxdim.c
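
For readers more comfortable with C++ than IR, the control flow of the IR above corresponds roughly to the following sketch. The name `fmax_ref` is made up for illustration; this is not the Libm source, only a transcription of the three blocks (`entry`, `if.1`, `if.2`).

```cpp
#include <cmath>

// Illustrative transcription of the IR above (the name is hypothetical).
inline double fmax_ref(double x, double y) {
  if (x != x)             // fcmp uno %x, %x: x is NaN, so return y
    return y;
  if (y != y)             // fcmp uno %y, %y: y is NaN, so return x
    return x;
  return x > y ? x : y;   // fcmp ogt + select
}
```

The two extra self-comparisons are the cost of handling NaN inputs without a relaxed floating-point mode.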

In D11866#219873, @jmolloy wrote:

This will be important when my latest patches finally land and x86 uses the new ISD::FMINNAN nodes (which represent the semantics of minss precisely).

It doesn't affect this patch, but unfortunately, I don't think x86 can use the new DAG nodes. The x86 min/max insts don't conform to either DAG node definition. The hardware instructions may or may not return a NaN operand. From the Intel manual:

MAX(SRC1, SRC2)
{
   IF ((SRC1 = 0.0) and (SRC2 = 0.0)) THEN DEST <- SRC2;
   ELSE IF (SRC1 = SNaN) THEN DEST <- SRC2; FI; 
   ELSE IF (SRC2 = SNaN) THEN DEST <- SRC2; FI; 
   ELSE IF (SRC1 > SRC2) THEN DEST <- SRC1; 
   ELSE DEST <- SRC2;
   FI;
 }

The precise behavior of these instructions is tested quite thoroughly in:
test/CodeGen/X86/sse-minmax.ll

LGTM (please add the comment about the signed zeros)

In D11866#225065, @spatel wrote:
In D11866#219873, @jmolloy wrote:

This will be important when my latest patches finally land and x86 uses the new ISD::FMINNAN nodes (which represent the semantics of minss precisely).

It doesn't affect this patch, but unfortunately, I don't think x86 can use the new DAG nodes. The x86 min/max insts don't conform to either DAG node definition. The hardware instructions may or may not return a NaN operand. From the Intel manual:
MAX(SRC1, SRC2)
{
   IF ((SRC1 = 0.0) and (SRC2 = 0.0)) THEN DEST <- SRC2;
   ELSE IF (SRC1 = SNaN) THEN DEST <- SRC2; FI; 
   ELSE IF (SRC2 = SNaN) THEN DEST <- SRC2; FI; 
   ELSE IF (SRC1 > SRC2) THEN DEST <- SRC1; 
   ELSE DEST <- SRC2;
   FI;
 }
The precise behavior of these instructions is tested quite thoroughly in:
test/CodeGen/X86/sse-minmax.ll

Alright, so if either operand is NaN, then it will return the second operand. Especially considering that the current SDAG nodes are all tagged as being commutative, we certainly can't match that, at least not with a single instruction (if we're not currently operating in nnans mode).
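
The non-commutativity can be sketched in plain C++. `x86_max_model` below is a hypothetical model of the MAX pseudocode quoted from the Intel manual, under the assumption (following the reading in this comment) that any NaN operand, not only an SNaN, selects the second source. It is an illustration, not a statement about exact hardware behavior.

```cpp
#include <cmath>

// Hypothetical model of the quoted MAX(SRC1, SRC2) pseudocode.
// Assumption: any NaN operand (quiet or signaling) selects SRC2.
inline double x86_max_model(double src1, double src2) {
  if (src1 == 0.0 && src2 == 0.0)
    return src2;                              // both zeros: SRC2
  if (std::isnan(src1) || std::isnan(src2))
    return src2;                              // NaN involved: SRC2
  return src1 > src2 ? src1 : src2;           // ordinary case
}
```

Under this model, `x86_max_model(NaN, 1.0)` returns 1.0 but `x86_max_model(1.0, NaN)` returns NaN, so swapping the operands changes the result, which a node tagged as commutative cannot express.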

lib/Transforms/Utils/SimplifyLibCalls.cpp
1221 ↗	(On Diff #31622)	Good point. Doing so may be a good idea, but we can deal with that later. We'd obviously need to do separate benchmarking.
1223 ↗	(On Diff #31622)	Yes.

This revision is now accepted and ready to land. Aug 15 2015, 12:36 PM

Closed by commit rL245187: transform fmin/fmax calls when possible (PR24314) (authored by spatel). Aug 16 2015, 1:19 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                          Size
llvm/trunk/include/llvm/Transforms/Utils/SimplifyLibCalls.h   1 line
llvm/trunk/lib/Transforms/Utils/SimplifyLibCalls.cpp          63 lines
llvm/trunk/test/Transforms/InstCombine/fast-math.ll           107 lines

Diff 32250

llvm/trunk/include/llvm/Transforms/Utils/SimplifyLibCalls.h

@@ 125 lines collapsed @@ private:

   // Math Library Optimizations
   Value *optimizeUnaryDoubleFP(CallInst *CI, IRBuilder<> &B, bool CheckRetType);
   Value *optimizeBinaryDoubleFP(CallInst *CI, IRBuilder<> &B);
   Value *optimizeCos(CallInst *CI, IRBuilder<> &B);
   Value *optimizePow(CallInst *CI, IRBuilder<> &B);
   Value *optimizeExp2(CallInst *CI, IRBuilder<> &B);
   Value *optimizeFabs(CallInst *CI, IRBuilder<> &B);
+  Value *optimizeFMinFMax(CallInst *CI, IRBuilder<> &B);
   Value *optimizeSqrt(CallInst *CI, IRBuilder<> &B);
   Value *optimizeSinCosPi(CallInst *CI, IRBuilder<> &B);

   // Integer Library Call Optimizations
   Value *optimizeFFS(CallInst *CI, IRBuilder<> &B);
   Value *optimizeAbs(CallInst *CI, IRBuilder<> &B);
   Value *optimizeIsDigit(CallInst *CI, IRBuilder<> &B);
   Value *optimizeIsAscii(CallInst *CI, IRBuilder<> &B);
@@ 30 lines collapsed @@

llvm/trunk/lib/Transforms/Utils/SimplifyLibCalls.cpp

@@ 1,178 lines collapsed @@ if (Instruction *I = dyn_cast<Instruction>(Op)) {
     // Fold fabs(x * x) -> x * x; any squared FP value must already be positive.
     if (I->getOpcode() == Instruction::FMul)
       if (I->getOperand(0) == I->getOperand(1))
         return Op;
   }
   return Ret;
 }

+Value *LibCallSimplifier::optimizeFMinFMax(CallInst *CI, IRBuilder<> &B) {
+  // If we can shrink the call to a float function rather than a double
+  // function, do that first.
+  Function *Callee = CI->getCalledFunction();
+  if ((Callee->getName() == "fmin" && TLI->has(LibFunc::fminf)) ||
+      (Callee->getName() == "fmax" && TLI->has(LibFunc::fmaxf))) {
+    Value *Ret = optimizeBinaryDoubleFP(CI, B);
+    if (Ret)
+      return Ret;
+  }
+
+  // Make sure this has 2 arguments of FP type which match the result type.
+  FunctionType *FT = Callee->getFunctionType();
+  if (FT->getNumParams() != 2 || FT->getReturnType() != FT->getParamType(0) ||
+      FT->getParamType(0) != FT->getParamType(1) ||
+      !FT->getParamType(0)->isFloatingPointTy())
+    return nullptr;
+
+  // FIXME: For finer-grain optimization, we need intrinsics to have the same
+  // fast-math flag decorations that are applied to FP instructions. For now,
+  // we have to rely on the function-level attributes to do this optimization
+  // because there's no other way to express that the calls can be relaxed.
+  IRBuilder<true, ConstantFolder,
+            IRBuilderDefaultInserter<true> >::FastMathFlagGuard Guard(B);
+  FastMathFlags FMF;
+  Function *F = CI->getParent()->getParent();
+  Attribute Attr = F->getFnAttribute("unsafe-fp-math");
+  if (Attr.getValueAsString() == "true") {
+    // Unsafe algebra sets all fast-math-flags to true.
+    FMF.setUnsafeAlgebra();
+  } else {
+    // At a minimum, no-nans-fp-math must be true.
+    Attr = F->getFnAttribute("no-nans-fp-math");
+    if (Attr.getValueAsString() != "true")
+      return nullptr;
+    // No-signed-zeros is implied by the definitions of fmax/fmin themselves:
+    // "Ideally, fmax would be sensitive to the sign of zero, for example
+    // fmax(−0.0, +0.0) would return +0; however, implementation in software
+    // might be impractical."
+    FMF.setNoSignedZeros();
+    FMF.setNoNaNs();
+  }
+  B.SetFastMathFlags(FMF);
+
+  // We have a relaxed floating-point environment. We can ignore NaN-handling
+  // and transform to a compare and select. We do not have to consider errno or
+  // exceptions, because fmin/fmax do not have those.
+  Value *Op0 = CI->getArgOperand(0);
+  Value *Op1 = CI->getArgOperand(1);
+  Value *Cmp = Callee->getName().startswith("fmin") ?
+    B.CreateFCmpOLT(Op0, Op1) : B.CreateFCmpOGT(Op0, Op1);
+  return B.CreateSelect(Cmp, Op0, Op1);
+}
+
 Value *LibCallSimplifier::optimizeSqrt(CallInst *CI, IRBuilder<> &B) {
   Function *Callee = CI->getCalledFunction();

   Value *Ret = nullptr;
   if (TLI->has(LibFunc::sqrtf) && (Callee->getName() == "sqrt" ||
                                    Callee->getIntrinsicID() == Intrinsic::sqrt))
     Ret = optimizeUnaryDoubleFP(CI, B, true);
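
The flag-selection logic in optimizeFMinFMax can be summarized outside of LLVM with a small plain C++ sketch. Everything here is a hypothetical stand-in: `FnAttrs` models the string function attributes, `Flags` models the fast-math flags placed on the new fcmp, and `pick_fcmp_flags` is an invented name. It is a simplified model of the decision, not the LLVM API.

```cpp
#include <map>
#include <string>

// Hypothetical stand-ins for function attributes and fast-math flags.
using FnAttrs = std::map<std::string, std::string>;

struct Flags {
  bool fast = false;  // models setUnsafeAlgebra(): every flag set
  bool nnan = false;  // no NaNs
  bool nsz = false;   // no signed zeros
};

// Looks up a string attribute; a missing attribute reads as "".
inline std::string attr_value(const FnAttrs &attrs, const std::string &key) {
  auto it = attrs.find(key);
  return it == attrs.end() ? std::string() : it->second;
}

// Returns false when the call must be left alone; otherwise fills `out`
// with the flags that would be attached to the new compare.
inline bool pick_fcmp_flags(const FnAttrs &attrs, Flags &out) {
  out = Flags();
  if (attr_value(attrs, "unsafe-fp-math") == "true") {
    out.fast = out.nnan = out.nsz = true;
    return true;
  }
  // At a minimum, no-nans-fp-math must be true.
  if (attr_value(attrs, "no-nans-fp-math") != "true")
    return false;
  out.nnan = true;
  out.nsz = true;  // treated as implied by the fmin/fmax definition
  return true;
}
```

In this model, a function carrying only "no-nans-fp-math"="true" yields `nnan nsz`, and one carrying "unsafe-fp-math"="true" yields the `fast` set.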

@@ 910 lines collapsed @@ if (TLI->getLibFunc(FuncName, Func) && TLI->has(Func)) {
     case LibFunc::sin:
     case LibFunc::sinh:
     case LibFunc::tan:
     case LibFunc::tanh:
       if (UnsafeFPShrink && hasFloatVersion(FuncName))
         return optimizeUnaryDoubleFP(CI, Builder, true);
       return nullptr;
     case LibFunc::copysign:
-    case LibFunc::fmin:
-    case LibFunc::fmax:
       if (hasFloatVersion(FuncName))
         return optimizeBinaryDoubleFP(CI, Builder);
       return nullptr;
+    case LibFunc::fminf:
+    case LibFunc::fmin:
+    case LibFunc::fminl:
+    case LibFunc::fmaxf:
+    case LibFunc::fmax:
+    case LibFunc::fmaxl:
+      return optimizeFMinFMax(CI, Builder);
     default:
       return nullptr;
     }
   }
   return nullptr;
 }

 LibCallSimplifier::LibCallSimplifier(
@@ 248 lines collapsed @@

llvm/trunk/test/Transforms/InstCombine/fast-math.ll

@@ 710 lines collapsed @@ define fp128 @sqrt_call_squared_f128(fp128 %x) #0 {
   %sqrt = call fp128 @sqrtl(fp128 %mul)
   ret fp128 %sqrt

 ; CHECK-LABEL: sqrt_call_squared_f128(
 ; CHECK-NEXT: %fabs = call fp128 @llvm.fabs.f128(fp128 %x)
 ; CHECK-NEXT: ret fp128 %fabs
 }

+; =========================================================================
+;
+;   Test-cases for fmin / fmax
+;
+; =========================================================================
+
+declare double @fmax(double, double)
+declare double @fmin(double, double)
+declare float @fmaxf(float, float)
+declare float @fminf(float, float)
+declare fp128 @fmaxl(fp128, fp128)
+declare fp128 @fminl(fp128, fp128)
+
+; No NaNs is the minimum requirement to replace these calls.
+; This should always be set when unsafe-fp-math is true, but
+; alternate the attributes for additional test coverage.
+; 'nsz' is implied by the definition of fmax or fmin itself.
+attributes #1 = { "no-nans-fp-math" = "true" }
+
+; Shrink and remove the call.
+define float @max1(float %a, float %b) #0 {
+  %c = fpext float %a to double
+  %d = fpext float %b to double
+  %e = call double @fmax(double %c, double %d)
+  %f = fptrunc double %e to float
+  ret float %f
+
+; CHECK-LABEL: max1(
+; CHECK-NEXT: fcmp fast ogt float %a, %b
+; CHECK-NEXT: select {{.*}} float %a, float %b
+; CHECK-NEXT: ret
+}
+
+define float @max2(float %a, float %b) #1 {
+  %c = call float @fmaxf(float %a, float %b)
+  ret float %c
+
+; CHECK-LABEL: max2(
+; CHECK-NEXT: fcmp nnan nsz ogt float %a, %b
+; CHECK-NEXT: select {{.*}} float %a, float %b
+; CHECK-NEXT: ret
+}
+
+
+define double @max3(double %a, double %b) #0 {
+  %c = call double @fmax(double %a, double %b)
+  ret double %c
+
+; CHECK-LABEL: max3(
+; CHECK-NEXT: fcmp fast ogt double %a, %b
+; CHECK-NEXT: select {{.*}} double %a, double %b
+; CHECK-NEXT: ret
+}
+
+define fp128 @max4(fp128 %a, fp128 %b) #1 {
+  %c = call fp128 @fmaxl(fp128 %a, fp128 %b)
+  ret fp128 %c
+
+; CHECK-LABEL: max4(
+; CHECK-NEXT: fcmp nnan nsz ogt fp128 %a, %b
+; CHECK-NEXT: select {{.*}} fp128 %a, fp128 %b
+; CHECK-NEXT: ret
+}
+
+; Shrink and remove the call.
+define float @min1(float %a, float %b) #1 {
+  %c = fpext float %a to double
+  %d = fpext float %b to double
+  %e = call double @fmin(double %c, double %d)
+  %f = fptrunc double %e to float
+  ret float %f
+
+; CHECK-LABEL: min1(
+; CHECK-NEXT: fcmp nnan nsz olt float %a, %b
+; CHECK-NEXT: select {{.*}} float %a, float %b
+; CHECK-NEXT: ret
+}
+
+define float @min2(float %a, float %b) #0 {
+  %c = call float @fminf(float %a, float %b)
+  ret float %c
+
+; CHECK-LABEL: min2(
+; CHECK-NEXT: fcmp fast olt float %a, %b
+; CHECK-NEXT: select {{.*}} float %a, float %b
+; CHECK-NEXT: ret
+}
+
+define double @min3(double %a, double %b) #1 {
+  %c = call double @fmin(double %a, double %b)
+  ret double %c
+
+; CHECK-LABEL: min3(
+; CHECK-NEXT: fcmp nnan nsz olt double %a, %b
+; CHECK-NEXT: select {{.*}} double %a, double %b
+; CHECK-NEXT: ret
+}
+
+define fp128 @min4(fp128 %a, fp128 %b) #0 {
+  %c = call fp128 @fminl(fp128 %a, fp128 %b)
+  ret fp128 %c
+
+; CHECK-LABEL: min4(
+; CHECK-NEXT: fcmp fast olt fp128 %a, %b
+; CHECK-NEXT: select {{.*}} fp128 %a, fp128 %b
+; CHECK-NEXT: ret
+}
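
The "shrink and remove the call" cases (max1, min1) rely on fmax/fmin returning one of their operands unchanged: widening two floats to double, calling the double function, and truncating the result gives the same value as calling the float function directly. A small plain C++ sketch of that equivalence, with hypothetical helper names:

```cpp
#include <cmath>

// Models the original IR: fpext both operands, call double fmax, fptrunc.
inline float max_via_double(float a, float b) {
  return static_cast<float>(
      std::fmax(static_cast<double>(a), static_cast<double>(b)));
}

// Models the shrunk form: the float overload is called directly.
inline float max_via_float(float a, float b) { return std::fmax(a, b); }
```

Every float is exactly representable as a double, so the round trip through double does not change which operand is selected or its value.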
