This is an archive of the discontinued LLVM Phabricator instance.

Handle sqrt() shrinking in SimplifyLibCalls like any other call
Closed Public

Authored by spatel on Oct 22 2014, 2:06 PM.

Details

Reviewers

beanz
hfinkel

Commits

rG848309da7c18: Handle sqrt() shrinking in SimplifyLibCalls like any other call
rL220514: Handle sqrt() shrinking in SimplifyLibCalls like any other call

Summary

This patch removes a chunk of special-case logic for folding sqrt() -> sqrtf() from InstCombineCasts and handles that fold in the mainstream path of SimplifyLibCalls instead.

No functional change intended, but I loosened the restriction on the existing sqrt testcases to allow for this optimization even without unsafe-fp-math because that's the existing behavior.

I also added a missing test case that checks we do not shrink the llvm.sqrt.f64 intrinsic when the result is used as a double.
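The negative test can be motivated with a small numeric sketch. It is not part of the patch, and the helper names are hypothetical. When the sqrt result is consumed as a double, a single-precision sqrt followed by widening generally yields a different value, so the call has to stay in double precision:

```cpp
#include <cmath>

// Hypothetical helpers for illustration only; nothing here is from the patch.

// The original pattern: sqrt evaluated in double on a widened float.
double sqrt_wide(float x) {
  return std::sqrt(static_cast<double>(x));
}

// What a double consumer would see if the call were shrunk to float anyway.
double sqrt_shrunk_then_widened(float x) {
  float narrow = std::sqrt(x);  // float overload of std::sqrt
  return static_cast<double>(narrow);
}

// True when shrinking would change the value observed as a double.
bool shrinking_changes_double_result(float x) {
  return sqrt_wide(x) != sqrt_shrunk_then_widened(x);
}
```

For example, shrinking_changes_double_result(2.0f) is true because sqrt(2) needs more than 24 significand bits, while for 4.0f both paths return exactly 2.0.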

Diff Detail

Repository: rL LLVM

Event Timeline

spatel updated this revision to Diff 15278. Oct 22 2014, 2:06 PM

spatel retitled this revision to Handle sqrt() shrinking in SimplifyLibCalls like any other call.

spatel updated this object.

spatel edited the test plan for this revision.

spatel added reviewers: hfinkel, beanz.

spatel added a subscriber: Unknown Object (MLST).

> No functional change intended, but I loosened the restriction on the existing sqrt testcases to allow for this optimization even without unsafe-fp-math because that's the existing behavior.

Please don't loosen this for the function call (although leave it as-is for the intrinsic). The intrinsic is different here, especially because it is defined to have different semantics than the function call, but we should not change the function call itself without fast-math enabled. The underlying issue is that (float) sqrt(x) != sqrtf(x) in general because of rounding issues (and on some systems, the sqrtf is not exactly 1ulp accurate), and we should not alter that without fast-math enabled.

In D5919#4, @hfinkel wrote:

> Please don't loosen this for the function call (although leave it as-is for the intrinsic). The intrinsic is different here, especially because it is defined to have different semantics than the function call, but we should not change the function call itself without fast-math enabled. The underlying issue is that (float) sqrt(x) != sqrtf(x) in general because of rounding issues (and on some systems, the sqrtf is not exactly 1ulp accurate), and we should not alter that without fast-math enabled.

Let me make sure we're on the same page. The existing behavior is to transform this function call:

float x;
float y = (float) sqrt ((double) x);

or in IR:

%conv = fpext float %f to double
%call = call double @sqrt(double %conv)
%conv1 = fptrunc double %call to float
ret float %conv1

into:

float y = sqrtf (x);

There are 2 existing positive test cases in test/Transforms/InstCombine/sqrt.ll that check for this pattern and 1 negative test case to make sure we don't optimize when the result is used as a double.

Are you saying those existing test cases are invalid? I think that the sqrt function call test that I'm proposing to modify in double-float-shrink-1.ll would actually become redundant with the second test case in sqrt.ll; that was added for:
http://llvm.org/bugs/show_bug.cgi?id=8096
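A rough way to probe the equivalence under discussion is to compare both forms bit-for-bit against the target's math library. This is only a sketch with hypothetical helper names, not part of the patch, and it assumes IEEE-754 float and double. When both sqrt implementations are correctly rounded, the two forms are expected to agree for every float input; a sqrtf that is not correctly rounded would show up as mismatches.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical helpers for illustration only; nothing here is from the patch.

// The original pattern: widen, take the double sqrt, narrow the result.
float sqrt_via_double(float x) {
  double wide = std::sqrt(static_cast<double>(x));
  return static_cast<float>(wide);
}

// The shrunk form: a single-precision sqrt on the original value.
float sqrt_single(float x) {
  return std::sqrt(x);  // float overload of std::sqrt
}

// Compare raw bit patterns so that +0.0 and -0.0 count as different.
bool same_bits(float a, float b) {
  std::uint32_t ua, ub;
  std::memcpy(&ua, &a, sizeof ua);
  std::memcpy(&ub, &b, sizeof ub);
  return ua == ub;
}

// Walk `count` consecutive floats upward from `start` and count the inputs
// for which the two forms disagree.
unsigned count_mismatches(float start, unsigned count) {
  assert(start >= 0.0f && "sketch only covers non-negative inputs");
  unsigned mismatches = 0;
  float x = start;
  for (unsigned i = 0; i < count; ++i) {
    if (!same_bits(sqrt_via_double(x), sqrt_single(x)))
      ++mismatches;
    x = std::nextafter(x, std::numeric_limits<float>::infinity());
  }
  return mismatches;
}
```

Calling count_mismatches(1.0f, 1000000) checks the first million floats at and above 1.0. A nonzero count on a given system would be evidence for the concern about an inexact sqrtf.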

hfinkel accepted this revision. Oct 23 2014, 12:59 PM

hfinkel edited edge metadata.

This revision is now accepted and ready to land. Oct 23 2014, 12:59 PM

Closed by commit rL220514 (authored by @spatel).

Revision Contents

Path                                                              Size
llvm/trunk/lib/Transforms/InstCombine/InstCombineCasts.cpp        36 lines
llvm/trunk/lib/Transforms/Utils/SimplifyLibCalls.cpp              19 lines
llvm/trunk/test/Transforms/InstCombine/double-float-shrink-1.ll   25 lines

Diff 15352

llvm/trunk/lib/Transforms/InstCombine/InstCombineCasts.cpp

[... 1,311 lines not shown ...] switch (II->getIntrinsicID()) {
                                     II->getIntrinsicID(), IntrinsicType);

         Value *Args[] = { InnerTrunc };
         return CallInst::Create(Overload, Args, II->getName());
       }
     }
   }

-  // Fold (fptrunc (sqrt (fpext x))) -> (sqrtf x)
-  // Note that we restrict this transformation based on
-  // TLI->has(LibFunc::sqrtf), even for the sqrt intrinsic, because
-  // TLI->has(LibFunc::sqrtf) is sufficient to guarantee that the
-  // single-precision intrinsic can be expanded in the backend.
-  CallInst *Call = dyn_cast<CallInst>(CI.getOperand(0));
-  if (Call && Call->getCalledFunction() && TLI->has(LibFunc::sqrtf) &&
-      (Call->getCalledFunction()->getName() == TLI->getName(LibFunc::sqrt) ||
-       Call->getCalledFunction()->getIntrinsicID() == Intrinsic::sqrt) &&
-      Call->getNumArgOperands() == 1 &&
-      Call->hasOneUse()) {
-    CastInst *Arg = dyn_cast<CastInst>(Call->getArgOperand(0));
-    if (Arg && Arg->getOpcode() == Instruction::FPExt &&
-        CI.getType()->isFloatTy() &&
-        Call->getType()->isDoubleTy() &&
-        Arg->getType()->isDoubleTy() &&
-        Arg->getOperand(0)->getType()->isFloatTy()) {
-      Function *Callee = Call->getCalledFunction();
-      Module *M = CI.getParent()->getParent()->getParent();
-      Constant *SqrtfFunc = (Callee->getIntrinsicID() == Intrinsic::sqrt) ?
-        Intrinsic::getDeclaration(M, Intrinsic::sqrt, Builder->getFloatTy()) :
-        M->getOrInsertFunction("sqrtf", Callee->getAttributes(),
-                               Builder->getFloatTy(), Builder->getFloatTy(),
-                               NULL);
-      CallInst *ret = CallInst::Create(SqrtfFunc, Arg->getOperand(0),
-                                       "sqrtfcall");
-      ret->setAttributes(Callee->getAttributes());
-
-
-      // Remove the old Call. With -fmath-errno, it won't get marked readnone.
-      ReplaceInstUsesWith(*Call, UndefValue::get(Call->getType()));
-      EraseInstFromFunction(*Call);
-      return ret;
-    }
-  }
-
   return nullptr;
 }

 Instruction *InstCombiner::visitFPExt(CastInst &CI) {
   return commonCastTransforms(CI);
 }

 Instruction *InstCombiner::visitFPToUI(FPToUIInst &FI) {
[... 574 lines not shown ...]

llvm/trunk/lib/Transforms/Utils/SimplifyLibCalls.cpp

[... 1,052 lines not shown ...] Value *LibCallSimplifier::optimizeUnaryDoubleFP(CallInst *CI, IRBuilder<> &B,

   // If this is something like 'floor((double)floatval)', convert to floorf.
   FPExtInst *Cast = dyn_cast<FPExtInst>(CI->getArgOperand(0));
   if (!Cast || !Cast->getOperand(0)->getType()->isFloatTy())
     return nullptr;

   // floor((double)floatval) -> (double)floorf(floatval)
   Value *V = Cast->getOperand(0);
+  if (Callee->isIntrinsic()) {
+    Module *M = CI->getParent()->getParent()->getParent();
+    Intrinsic::ID IID = (Intrinsic::ID) Callee->getIntrinsicID();
+    Function *F = Intrinsic::getDeclaration(M, IID, B.getFloatTy());
+    V = B.CreateCall(F, V);
+  } else {
+    // The call is a library call rather than an intrinsic.
-  V = EmitUnaryFloatFnCall(V, Callee->getName(), B, Callee->getAttributes());
+    V = EmitUnaryFloatFnCall(V, Callee->getName(), B, Callee->getAttributes());
+  }

   return B.CreateFPExt(V, B.getDoubleTy());
 }

 // Double -> Float Shrinking Optimizations for Binary Functions like 'fmin/fmax'
 Value *LibCallSimplifier::optimizeBinaryDoubleFP(CallInst *CI, IRBuilder<> &B) {
   Function *Callee = CI->getCalledFunction();
   FunctionType *FT = Callee->getFunctionType();
   // Just make sure this has 2 arguments of the same FP type, which match the
[... 11 lines not shown ...] if (!Cast1 || !Cast1->getOperand(0)->getType()->isFloatTy() || !Cast2 ||
       !Cast2->getOperand(0)->getType()->isFloatTy())
     return nullptr;

   // fmin((double)floatval1, (double)floatval2)
   // -> (double)fmin(floatval1, floatval2)
   Value *V = nullptr;
   Value *V1 = Cast1->getOperand(0);
   Value *V2 = Cast2->getOperand(0);
+  // TODO: Handle intrinsics in the same way as in optimizeUnaryDoubleFP().
   V = EmitBinaryFloatFnCall(V1, V2, Callee->getName(), B,
                             Callee->getAttributes());
   return B.CreateFPExt(V, B.getDoubleTy());
 }

 Value *LibCallSimplifier::optimizeCos(CallInst *CI, IRBuilder<> &B) {
   Function *Callee = CI->getCalledFunction();
   Value *Ret = nullptr;
[... 165 lines not shown ...] Value *LibCallSimplifier::optimizeFabs(CallInst *CI, IRBuilder<> &B) {
   }
   return Ret;
 }

 Value *LibCallSimplifier::optimizeSqrt(CallInst *CI, IRBuilder<> &B) {
   Function *Callee = CI->getCalledFunction();

   Value *Ret = nullptr;
-  if (UnsafeFPShrink && Callee->getName() == "sqrt" &&
-      TLI->has(LibFunc::sqrtf)) {
+  if (TLI->has(LibFunc::sqrtf) && (Callee->getName() == "sqrt" ||
+                                   Callee->getIntrinsicID() == Intrinsic::sqrt))
     Ret = optimizeUnaryDoubleFP(CI, B, true);
-  }

   // FIXME: For finer-grain optimization, we need intrinsics to have the same
   // fast-math flag decorations that are applied to FP instructions. For now,
   // we have to rely on the function-level unsafe-fp-math attribute to do this
   // optimization because there's no other way to express that the sqrt can be
   // reassociated.
   Function *F = CI->getParent()->getParent();
   if (F->hasFnAttribute("unsafe-fp-math")) {
[... 723 lines not shown ...] else if (Callee->hasFnAttribute("unsafe-fp-math")) {
     // function attribute.

     // Check for unsafe-fp-math = true.
     Attribute Attr = Callee->getFnAttribute("unsafe-fp-math");
     if (Attr.getValueAsString() == "true")
       UnsafeFPShrink = true;
   }

-  // Next check for intrinsics.
+  // First, check for intrinsics.
   if (IntrinsicInst *II = dyn_cast<IntrinsicInst>(CI)) {
     if (!isCallingConvC)
       return nullptr;
     switch (II->getIntrinsicID()) {
     case Intrinsic::pow:
       return optimizePow(CI, Builder);
     case Intrinsic::exp2:
       return optimizeExp2(CI, Builder);
[... 242 lines not shown ...]

llvm/trunk/test/Transforms/InstCombine/double-float-shrink-1.ll

[... 273 lines not shown ...] define float @sqrt_test(float %f) {
    %conv = fpext float %f to double
    %call = call double @sqrt(double %conv)
    %conv1 = fptrunc double %call to float
    ret float %conv1
 ; CHECK-LABEL: sqrt_test
 ; CHECK: call float @sqrtf(float %f)
 }

+define double @sqrt_test2(float %f) {
+   %conv = fpext float %f to double
+   %call = call double @sqrt(double %conv)
+   ret double %call
+; CHECK-LABEL: sqrt_test2
+; CHECK: call double @sqrt(double %conv)
+}
+
 define float @sqrt_int_test(float %f) {
    %conv = fpext float %f to double
    %call = call double @llvm.sqrt.f64(double %conv)
    %conv1 = fptrunc double %call to float
    ret float %conv1
 ; CHECK-LABEL: sqrt_int_test
 ; CHECK: call float @llvm.sqrt.f32(float %f)
 }

-define double @sqrt_test2(float %f) {
+define double @sqrt_int_test2(float %f) {
    %conv = fpext float %f to double
-   %call = call double @sqrt(double %conv)
+   %call = call double @llvm.sqrt.f64(double %conv)
    ret double %call
-; CHECK-LABEL: sqrt_test2
-; CHECK: call double @sqrt(double %conv)
+; CHECK-LABEL: sqrt_int_test2
+; CHECK: call double @llvm.sqrt.f64(double %conv)
 }

 define float @tan_test(float %f) {
    %conv = fpext float %f to double
    %call = call double @tan(double %conv)
    %conv1 = fptrunc double %call to float
    ret float %conv1
 ; CHECK-LABEL: tan_test
 ; CHECK: call float @tanf(float %f)
 }
[... 19 lines not shown ...] define double @tanh_test2(float %f) {
    %call = call double @tanh(double %conv)
    ret double %call
 ; CHECK-LABEL: tanh_test2
 ; CHECK: call double @tanh(double %conv)
 }

 declare double @tanh(double) #1
 declare double @tan(double) #1
-declare double @sqrt(double) #1
+; sqrt is a special case: the shrinking optimization
+; is valid even without unsafe-fp-math.
+declare double @sqrt(double)
+declare double @llvm.sqrt.f64(double)
+
 declare double @sin(double) #1
 declare double @log2(double) #1
 declare double @log1p(double) #1
 declare double @log10(double) #1
 declare double @log(double) #1
 declare double @logb(double) #1
 declare double @exp10(double) #1
 declare double @expm1(double) #1
 declare double @exp(double) #1
 declare double @cbrt(double) #1
 declare double @atanh(double) #1
 declare double @atan(double) #1
 declare double @acos(double) #1
 declare double @acosh(double) #1
 declare double @asin(double) #1
 declare double @asinh(double) #1

-declare double @llvm.sqrt.f64(double) #1
 attributes #1 = { "unsafe-fp-math"="true" }