This is an archive of the discontinued LLVM Phabricator instance.

fold: sqrt(x * x * y) -> fabs(x) * sqrt(y)
Closed, Public

Authored by spatel on Oct 14 2014, 3:31 PM.

Details

Reviewers

beanz
hfinkel
mcrosier

Commits

rGc699a6117b0f: fold: sqrt(x * x * y) -> fabs(x) * sqrt(y)
rL219944: fold: sqrt(x * x * y) -> fabs(x) * sqrt(y)

Summary

If a square root call has an FP multiplication argument that can be reassociated,
then we can hoist a repeated factor out of the square root call and into a fabs().

In the simplest case, this:

y = sqrt(x * x);

becomes this:

y = fabs(x);
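As a rough numeric illustration of the fold (a standalone sketch, not code from the patch; the function names `sqrtOfProduct` and `foldedSqrt` are made up for this example), the two forms can be compared directly in C++:

```cpp
#include <cmath>

// Strict evaluation: the product is rounded to double (and may overflow)
// before the square root is taken. The volatile store forces that rounding
// even on targets that keep intermediates in wider registers.
double sqrtOfProduct(double x, double y) {
  volatile double product = x * x * y;
  return std::sqrt(product);
}

// Folded evaluation: the repeated factor is hoisted out as fabs(x).
double foldedSqrt(double x, double y) {
  return std::fabs(x) * std::sqrt(y);
}
```

With x = -3 and y = 4 both functions return 6. With x = 1e200 and y = 1 the strict form returns infinity because x * x overflows, while the folded form returns 1e200. That difference is one reason the fold is only valid when relaxed floating-point rules are in effect.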

This patch relies on an earlier optimization in instcombine or reassociate to put the
multiplication tree into a canonical form, so we don't have to search over
every permutation of the multiplication tree.
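The one-level search described above can be illustrated with a toy expression tree. This is only a sketch under stated assumptions: `Expr`, `Fold`, and `matchSqrtArg` are hypothetical names rather than LLVM classes, and pointer equality stands in for operand identity.

```cpp
#include <string>
#include <utility>

// Hypothetical miniature expression tree; not LLVM's Value/Instruction.
struct Expr {
  std::string name;  // set for leaves only
  const Expr *lhs;   // set for multiplies only
  const Expr *rhs;

  explicit Expr(std::string n) : name(std::move(n)), lhs(nullptr), rhs(nullptr) {}
  Expr(const Expr &l, const Expr &r) : name(), lhs(&l), rhs(&r) {}

  bool isMul() const { return lhs != nullptr && rhs != nullptr; }
};

// Result of the search: the factor to hoist into fabs(), and the factor
// (if any) that stays under the square root.
struct Fold {
  const Expr *repeated = nullptr;
  const Expr *other = nullptr;
};

// Looks only for sqrt(x * x) and sqrt((x * x) * y); any other shape is
// assumed to have been canonicalized into one of these beforehand.
Fold matchSqrtArg(const Expr &arg) {
  Fold f;
  if (!arg.isMul())
    return f;
  if (arg.lhs == arg.rhs) {
    // Simple match: both operands of the multiply are the same node.
    f.repeated = arg.lhs;
    return f;
  }
  if (arg.lhs->isMul() && arg.lhs->lhs == arg.lhs->rhs) {
    // Matched (x * x) * y.
    f.repeated = arg.lhs->lhs;
    f.other = arg.rhs;
  }
  return f;
}
```

For a tree shaped like (x * x) * y the sketch reports x as the repeated factor and y as the remainder. For (x * y) * x it finds nothing, which is why the patch depends on earlier passes to produce the canonical shape.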

Diff Detail

Repository: rL LLVM

Event Timeline

spatel updated this revision to Diff 14898. Oct 14 2014, 3:31 PM

spatel retitled this revision from to fold: sqrt(x * x * y) -> fabs(x) * sqrt(y).

spatel updated this object.

spatel edited the test plan for this revision.

spatel added reviewers: beanz, hfinkel, mcrosier.

spatel added a subscriber: Unknown Object (MLST).

hfinkel added inline comments. Oct 14 2014, 3:41 PM

lib/Transforms/Utils/SimplifyLibCalls.cpp
Line 1270 (on Diff #14898):
Hrmm... is this right, or do we need to check the function attribute here? I'm not sure that the argument having the unsafe-algebra flag means that we can change its use in a non-strict way.

spatel added inline comments. Oct 14 2014, 5:22 PM

lib/Transforms/Utils/SimplifyLibCalls.cpp
Line 1270 (on Diff #14898):
I think this goes back to the discussion of whether 'fast' is infectious in http://reviews.llvm.org/D5584. :)

In that case, we optimized away a sqrt call even though it wasn't explicitly marked 'fast'. This is a similar transform. We've just reversed the order of the fmul and sqrt.

I thought I saw some other precedent for just using the instruction-level flags, but I'm not finding it now. There's an optimization for log2() in InstCombineMulDivRem that checks whether the log2 intrinsic hasUnsafeAlgebra...I don't know how that is even specified in IR. I thought the IR fast-math flags only apply to fmul, fdiv, fadd, fsub, and frem? At least, that's what the LangRef says.

FWIW, that log2 optimization doesn't appear to ever trigger, and I don't see any test case for it (r169025).

hfinkel added inline comments. Oct 15 2014, 12:22 AM

lib/Transforms/Utils/SimplifyLibCalls.cpp
Line 1270 (on Diff #14898):
> In that case, we optimized away a sqrt call even though it wasn't explicitly marked 'fast'. This is a similar transform. We've just reversed the order of the fmul and sqrt.

Right, but in that case we were optimizing the fmul, and it was the "outer" operation, and it had fast. In this case, the sqrt is the outer operation, and we need to check its equivalent. Generally speaking, fast-math flags cannot infect users (only operands), except for some result assumptions, because of the inlining use case.

> FWIW, that log2 optimization doesn't appear to ever trigger, and I don't see any test case for it (r169025).

Seems like this is a bug (I would not object to enhancing the IR to support fast-math flags on intrinsics, but as it stands, it is a bug).

spatel added inline comments. Oct 15 2014, 10:25 AM

lib/Transforms/Utils/SimplifyLibCalls.cpp
Line 1270 (on Diff #14898):
Ah - ok, that sounds reasonable. Is the outer/inner distinction specified somewhere? In the LangRef?

And yes, I think if we're committed to instruction-level IR fast-math-flags, then intrinsics deserve those decorations too.

How about this for now: I'll redo this patch and testcases to check for a function-level attribute, add a TODO comment that we should revisit it if and when intrinsics get FMF support, and open a bug to add FMF support to intrinsics.

We had also run into the issue of FMF on intrinsics in http://reviews.llvm.org/D5222. We weren't sure if we could just treat the function-level attribute as a convenience, but I don't think we can do that in the following case: inlining/LTO where strict code gets inlined into fast code. The strict code is the "inner" logic in that case...so even though it wouldn't have IR-level FMF, it should bend to the "outer" function's attributes and IR-level FMF?

hfinkel added inline comments. Oct 15 2014, 10:48 AM

lib/Transforms/Utils/SimplifyLibCalls.cpp
Line 1270 (on Diff #14898):
> Ah - ok, that sounds reasonable. Is the outer/inner distinction specified somewhere? In the LangRef?

Looking at the LangRef, it seems very unclear on this point. We should make this more clear, because it was designed (not by me) to support a specific use case involving inlining during LTO, and we should spell that out.

> How about this for now: I'll redo this patch and testcases to check for a function-level attribute, add a TODO comment that we should revisit it if and when intrinsics get FMF support, and open a bug to add FMF support to intrinsics.

Yes, this sounds good.

> We had also run into the issue of FMF on intrinsics in http://reviews.llvm.org/D5222. We weren't sure if we could just treat the function-level attribute as a convenience, but I don't think we can do that in the following case: inlining/LTO where strict code gets inlined into fast code. The strict code is the "inner" logic in that case...so even though it wouldn't have IR-level FMF, it should bend to the "outer" function's attributes and IR-level FMF?

Sounds right.

Code and regression tests updated to use a function-level attribute to enable the optimization.

Note that the existing optimization to do (X*Y) * X => (X*X) * Y in InstCombineMulDivRem also needs to detect the function-level attribute, otherwise it won't reassociate the factors to the square root into the canonical form.

Adding for the record...

lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
Line 453 (on Diff #14960):
As we discussed elsewhere (PR21291), this is incorrect.

Removed the change to InstCombineMulDivRem and the additional test case; function-level attributes do NOT override IR-level FMF.
But since we have no FMF support for intrinsics currently, added FIXMEs where we have to use the function-level attribute to make this optimization possible. The new testcases also have 'fast' at the IR level on all fmul. I think that should be a requirement for this optimization.
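The combined requirement described here (function-level attribute plus 'fast' on the fmul) can be sketched as a small standalone predicate. `ToyFunction`, `ToyFMul`, and `mayHoistFromSqrt` are hypothetical stand-ins invented for this illustration, not LLVM APIs. The sketch also assumes that a missing attribute means "not allowed", whereas the check in the diff below only returns early when the attribute is present with a value other than "true".

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for a function carrying string attributes such as
// "unsafe-fp-math"="true".
struct ToyFunction {
  std::map<std::string, std::string> attrs;
};

// Hypothetical stand-in for an fmul instruction and its 'fast' flag.
struct ToyFMul {
  bool fast;
};

// The fold is allowed only when the enclosing function opts in AND the
// multiply feeding the sqrt is itself marked 'fast'. Assumption of this
// sketch: an absent attribute is treated the same as "false".
bool mayHoistFromSqrt(const ToyFunction &f, const ToyFMul &mul) {
  auto it = f.attrs.find("unsafe-fp-math");
  bool fnAllows = it != f.attrs.end() && it->second == "true";
  return fnAllows && mul.fast;
}
```

Requiring both conditions reflects the point made in the review: the function-level attribute stands in for the missing flags on the sqrt, but it does not override the IR-level flags on the fmul.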

The intrinsic enhancement is filed as:
http://llvm.org/bugs/show_bug.cgi?id=21290

but as discussed here:
http://llvm.org/bugs/show_bug.cgi?id=21291

the IR-level FMF is not actually doing the job it was intended to do because the backend currently overrides/ignores all IR-level FMF.

LGTM, thanks!

This revision is now accepted and ready to land. Oct 16 2014, 10:54 AM

Closed by commit rL219944 (authored by @spatel).

Thanks, Hal! Checked in with r219944.

Revision Contents

Path                                                          Size
llvm/trunk/include/llvm/Transforms/Utils/SimplifyLibCalls.h   1 line
llvm/trunk/lib/Transforms/Utils/SimplifyLibCalls.cpp          88 lines
llvm/trunk/test/Transforms/InstCombine/fast-math.ll           170 lines

Diff 15034

llvm/trunk/include/llvm/Transforms/Utils/SimplifyLibCalls.h

[87 lines not shown; context: private:]

   // Math Library Optimizations
   Value *optimizeUnaryDoubleFP(CallInst *CI, IRBuilder<> &B, bool CheckRetType);
   Value *optimizeBinaryDoubleFP(CallInst *CI, IRBuilder<> &B);
   Value *optimizeCos(CallInst *CI, IRBuilder<> &B);
   Value *optimizePow(CallInst *CI, IRBuilder<> &B);
   Value *optimizeExp2(CallInst *CI, IRBuilder<> &B);
   Value *optimizeFabs(CallInst *CI, IRBuilder<> &B);
+  Value *optimizeSqrt(CallInst *CI, IRBuilder<> &B);
   Value *optimizeSinCosPi(CallInst *CI, IRBuilder<> &B);

   // Integer Library Call Optimizations
   Value *optimizeFFS(CallInst *CI, IRBuilder<> &B);
   Value *optimizeAbs(CallInst *CI, IRBuilder<> &B);
   Value *optimizeIsDigit(CallInst *CI, IRBuilder<> &B);
   Value *optimizeIsAscii(CallInst *CI, IRBuilder<> &B);
   Value *optimizeToAscii(CallInst *CI, IRBuilder<> &B);
[29 lines not shown]

llvm/trunk/lib/Transforms/Utils/SimplifyLibCalls.cpp

[21 lines not shown]
 #include "llvm/IR/DataLayout.h"
 #include "llvm/IR/DiagnosticInfo.h"
 #include "llvm/IR/Function.h"
 #include "llvm/IR/IRBuilder.h"
 #include "llvm/IR/IntrinsicInst.h"
 #include "llvm/IR/Intrinsics.h"
 #include "llvm/IR/LLVMContext.h"
 #include "llvm/IR/Module.h"
+#include "llvm/IR/PatternMatch.h"
 #include "llvm/Support/Allocator.h"
 #include "llvm/Support/CommandLine.h"
 #include "llvm/Target/TargetLibraryInfo.h"
 #include "llvm/Transforms/Utils/BuildLibCalls.h"

 using namespace llvm;
+using namespace PatternMatch;

 static cl::opt<bool>
 ColdErrorCalls("error-reporting-is-cold", cl::init(true), cl::Hidden,
                cl::desc("Treat error-reporting calls as cold"));

 //===----------------------------------------------------------------------===//
 // Helper Functions
 //===----------------------------------------------------------------------===//
[1,205 lines not shown; context: if (Instruction *I = dyn_cast<Instruction>(Op)) {]
     // Fold fabs(x * x) -> x * x; any squared FP value must already be positive.
     if (I->getOpcode() == Instruction::FMul)
       if (I->getOperand(0) == I->getOperand(1))
         return Op;
   }
   return Ret;
 }

+Value *LibCallSimplifier::optimizeSqrt(CallInst *CI, IRBuilder<> &B) {
+  Function *Callee = CI->getCalledFunction();
+
+  Value *Ret = nullptr;
+  if (UnsafeFPShrink && Callee->getName() == "sqrt" &&
+      TLI->has(LibFunc::sqrtf)) {
+    Ret = optimizeUnaryDoubleFP(CI, B, true);
+  }
+
+  // FIXME: For finer-grain optimization, we need intrinsics to have the same
+  // fast-math flag decorations that are applied to FP instructions. For now,
+  // we have to rely on the function-level unsafe-fp-math attribute to do this
+  // optimization because there's no other way to express that the sqrt can be
+  // reassociated.
+  Function *F = CI->getParent()->getParent();
+  if (F->hasFnAttribute("unsafe-fp-math")) {
+    // Check for unsafe-fp-math = true.
+    Attribute Attr = F->getFnAttribute("unsafe-fp-math");
+    if (Attr.getValueAsString() != "true")
+      return Ret;
+  }
+  Value *Op = CI->getArgOperand(0);
+  if (Instruction *I = dyn_cast<Instruction>(Op)) {
+    if (I->getOpcode() == Instruction::FMul && I->hasUnsafeAlgebra()) {
+      // We're looking for a repeated factor in a multiplication tree,
+      // so we can do this fold: sqrt(x * x) -> fabs(x);
+      // or this fold: sqrt(x * x * y) -> fabs(x) * sqrt(y).
+      Value *Op0 = I->getOperand(0);
+      Value *Op1 = I->getOperand(1);
+      Value *RepeatOp = nullptr;
+      Value *OtherOp = nullptr;
+      if (Op0 == Op1) {
+        // Simple match: the operands of the multiply are identical.
+        RepeatOp = Op0;
+      } else {
+        // Look for a more complicated pattern: one of the operands is itself
+        // a multiply, so search for a common factor in that multiply.
+        // Note: We don't bother looking any deeper than this first level or for
+        // variations of this pattern because instcombine's visitFMUL and/or the
+        // reassociation pass should give us this form.
+        Value *OtherMul0, *OtherMul1;
+        if (match(Op0, m_FMul(m_Value(OtherMul0), m_Value(OtherMul1)))) {
+          // Pattern: sqrt((x * y) * z)
+          if (OtherMul0 == OtherMul1) {
+            // Matched: sqrt((x * x) * z)
+            RepeatOp = OtherMul0;
+            OtherOp = Op1;
+          }
+        }
+      }
+      if (RepeatOp) {
+        // Fast math flags for any created instructions should match the sqrt
+        // and multiply.
+        // FIXME: We're not checking the sqrt because it doesn't have
+        // fast-math-flags (see earlier comment).
+        IRBuilder<true, ConstantFolder,
+          IRBuilderDefaultInserter<true> >::FastMathFlagGuard Guard(B);
+        B.SetFastMathFlags(I->getFastMathFlags());
+        // If we found a repeated factor, hoist it out of the square root and
+        // replace it with the fabs of that factor.
+        Module *M = Callee->getParent();
+        Type *ArgType = Op->getType();
+        Value *Fabs = Intrinsic::getDeclaration(M, Intrinsic::fabs, ArgType);
+        Value *FabsCall = B.CreateCall(Fabs, RepeatOp, "fabs");
+        if (OtherOp) {
+          // If we found a non-repeated factor, we still need to get its square
+          // root. We then multiply that by the value that was simplified out
+          // of the square root calculation.
+          Value *Sqrt = Intrinsic::getDeclaration(M, Intrinsic::sqrt, ArgType);
+          Value *SqrtCall = B.CreateCall(Sqrt, OtherOp, "sqrt");
+          return B.CreateFMul(FabsCall, SqrtCall);
+        }
+        return FabsCall;
+      }
+    }
+  }
+  return Ret;
+}

 static bool isTrigLibCall(CallInst *CI);
 static void insertSinCosCall(IRBuilder<> &B, Function *OrigCallee, Value *Arg,
                              bool UseFloat, Value *&Sin, Value *&Cos,
                              Value *&SinCos);

 Value *LibCallSimplifier::optimizeSinCosPi(CallInst *CI, IRBuilder<> &B) {

   // Make sure the prototype is as expected, otherwise the rest of the
[649 lines not shown; context: if (!isCallingConvC)]
       return nullptr;
     switch (II->getIntrinsicID()) {
     case Intrinsic::pow:
       return optimizePow(CI, Builder);
     case Intrinsic::exp2:
       return optimizeExp2(CI, Builder);
     case Intrinsic::fabs:
       return optimizeFabs(CI, Builder);
+    case Intrinsic::sqrt:
+      return optimizeSqrt(CI, Builder);
     default:
       return nullptr;
     }
   }

   // Then check for known library functions.
   if (TLI->getLibFunc(FuncName, Func) && TLI->has(Func)) {
     // We never change the calling convention.
[60 lines not shown; context: if (TLI->getLibFunc(FuncName, Func) && TLI->has(Func)) {]
     case LibFunc::exp2l:
     case LibFunc::exp2:
     case LibFunc::exp2f:
       return optimizeExp2(CI, Builder);
     case LibFunc::fabsf:
     case LibFunc::fabs:
     case LibFunc::fabsl:
       return optimizeFabs(CI, Builder);
+    case LibFunc::sqrtf:
+    case LibFunc::sqrt:
+    case LibFunc::sqrtl:
+      return optimizeSqrt(CI, Builder);
     case LibFunc::ffs:
     case LibFunc::ffsl:
     case LibFunc::ffsll:
       return optimizeFFS(CI, Builder);
     case LibFunc::abs:
     case LibFunc::labs:
     case LibFunc::llabs:
       return optimizeAbs(CI, Builder);
[44 lines not shown; context: if (TLI->getLibFunc(FuncName, Func) && TLI->has(Func)) {]
     case LibFunc::expm1:
     case LibFunc::log:
     case LibFunc::log10:
     case LibFunc::log1p:
     case LibFunc::log2:
     case LibFunc::logb:
     case LibFunc::sin:
     case LibFunc::sinh:
-    case LibFunc::sqrt:
     case LibFunc::tan:
     case LibFunc::tanh:
       if (UnsafeFPShrink && hasFloatVersion(FuncName))
         return optimizeUnaryDoubleFP(CI, Builder, true);
       return nullptr;
     case LibFunc::fmin:
     case LibFunc::fmax:
       if (hasFloatVersion(FuncName))
[91 lines not shown]

llvm/trunk/test/Transforms/InstCombine/fast-math.ll

[524 lines not shown]
 define float @fact_div6(float %x) {
   %t1 = fdiv fast float 0x3810000000000000, %x
   %t2 = fdiv fast float 0x3800000000000000, %x
   %t3 = fsub fast float %t1, %t2
   ret float %t3
 ; CHECK: fact_div6
 ; CHECK: %t3 = fsub fast float %t1, %t2
 }

+; =========================================================================
+;
+; Test-cases for square root
+;
+; =========================================================================
+
+; A squared factor fed into a square root intrinsic should be hoisted out
+; as a fabs() value.
+; We have to rely on a function-level attribute to enable this optimization
+; because intrinsics don't currently have access to IR-level fast-math
+; flags. If that changes, we can relax the requirement on all of these
+; tests to just specify 'fast' on the sqrt.
+
+attributes #0 = { "unsafe-fp-math" = "true" }
+
+declare double @llvm.sqrt.f64(double)
+
+define double @sqrt_intrinsic_arg_squared(double %x) #0 {
+  %mul = fmul fast double %x, %x
+  %sqrt = call double @llvm.sqrt.f64(double %mul)
+  ret double %sqrt
+
+; CHECK-LABEL: sqrt_intrinsic_arg_squared(
+; CHECK-NEXT: %fabs = call double @llvm.fabs.f64(double %x)
+; CHECK-NEXT: ret double %fabs
+}

+; Check all 6 combinations of a 3-way multiplication tree where
+; one factor is repeated.
+
+define double @sqrt_intrinsic_three_args1(double %x, double %y) #0 {
+  %mul = fmul fast double %y, %x
+  %mul2 = fmul fast double %mul, %x
+  %sqrt = call double @llvm.sqrt.f64(double %mul2)
+  ret double %sqrt
+
+; CHECK-LABEL: sqrt_intrinsic_three_args1(
+; CHECK-NEXT: %fabs = call double @llvm.fabs.f64(double %x)
+; CHECK-NEXT: %sqrt1 = call double @llvm.sqrt.f64(double %y)
+; CHECK-NEXT: %1 = fmul fast double %fabs, %sqrt1
+; CHECK-NEXT: ret double %1
+}
+
+define double @sqrt_intrinsic_three_args2(double %x, double %y) #0 {
+  %mul = fmul fast double %x, %y
+  %mul2 = fmul fast double %mul, %x
+  %sqrt = call double @llvm.sqrt.f64(double %mul2)
+  ret double %sqrt
+
+; CHECK-LABEL: sqrt_intrinsic_three_args2(
+; CHECK-NEXT: %fabs = call double @llvm.fabs.f64(double %x)
+; CHECK-NEXT: %sqrt1 = call double @llvm.sqrt.f64(double %y)
+; CHECK-NEXT: %1 = fmul fast double %fabs, %sqrt1
+; CHECK-NEXT: ret double %1
+}
+
+define double @sqrt_intrinsic_three_args3(double %x, double %y) #0 {
+  %mul = fmul fast double %x, %x
+  %mul2 = fmul fast double %mul, %y
+  %sqrt = call double @llvm.sqrt.f64(double %mul2)
+  ret double %sqrt
+
+; CHECK-LABEL: sqrt_intrinsic_three_args3(
+; CHECK-NEXT: %fabs = call double @llvm.fabs.f64(double %x)
+; CHECK-NEXT: %sqrt1 = call double @llvm.sqrt.f64(double %y)
+; CHECK-NEXT: %1 = fmul fast double %fabs, %sqrt1
+; CHECK-NEXT: ret double %1
+}
+
+define double @sqrt_intrinsic_three_args4(double %x, double %y) #0 {
+  %mul = fmul fast double %y, %x
+  %mul2 = fmul fast double %x, %mul
+  %sqrt = call double @llvm.sqrt.f64(double %mul2)
+  ret double %sqrt
+
+; CHECK-LABEL: sqrt_intrinsic_three_args4(
+; CHECK-NEXT: %fabs = call double @llvm.fabs.f64(double %x)
+; CHECK-NEXT: %sqrt1 = call double @llvm.sqrt.f64(double %y)
+; CHECK-NEXT: %1 = fmul fast double %fabs, %sqrt1
+; CHECK-NEXT: ret double %1
+}
+
+define double @sqrt_intrinsic_three_args5(double %x, double %y) #0 {
+  %mul = fmul fast double %x, %y
+  %mul2 = fmul fast double %x, %mul
+  %sqrt = call double @llvm.sqrt.f64(double %mul2)
+  ret double %sqrt
+
+; CHECK-LABEL: sqrt_intrinsic_three_args5(
+; CHECK-NEXT: %fabs = call double @llvm.fabs.f64(double %x)
+; CHECK-NEXT: %sqrt1 = call double @llvm.sqrt.f64(double %y)
+; CHECK-NEXT: %1 = fmul fast double %fabs, %sqrt1
+; CHECK-NEXT: ret double %1
+}
+
+define double @sqrt_intrinsic_three_args6(double %x, double %y) #0 {
+  %mul = fmul fast double %x, %x
+  %mul2 = fmul fast double %y, %mul
+  %sqrt = call double @llvm.sqrt.f64(double %mul2)
+  ret double %sqrt
+
+; CHECK-LABEL: sqrt_intrinsic_three_args6(
+; CHECK-NEXT: %fabs = call double @llvm.fabs.f64(double %x)
+; CHECK-NEXT: %sqrt1 = call double @llvm.sqrt.f64(double %y)
+; CHECK-NEXT: %1 = fmul fast double %fabs, %sqrt1
+; CHECK-NEXT: ret double %1
+}

+define double @sqrt_intrinsic_arg_4th(double %x) #0 {
+  %mul = fmul fast double %x, %x
+  %mul2 = fmul fast double %mul, %mul
+  %sqrt = call double @llvm.sqrt.f64(double %mul2)
+  ret double %sqrt
+
+; CHECK-LABEL: sqrt_intrinsic_arg_4th(
+; CHECK-NEXT: %mul = fmul fast double %x, %x
+; CHECK-NEXT: ret double %mul
+}
+
+define double @sqrt_intrinsic_arg_5th(double %x) #0 {
+  %mul = fmul fast double %x, %x
+  %mul2 = fmul fast double %mul, %x
+  %mul3 = fmul fast double %mul2, %mul
+  %sqrt = call double @llvm.sqrt.f64(double %mul3)
+  ret double %sqrt
+
+; CHECK-LABEL: sqrt_intrinsic_arg_5th(
+; CHECK-NEXT: %mul = fmul fast double %x, %x
+; CHECK-NEXT: %sqrt1 = call double @llvm.sqrt.f64(double %x)
+; CHECK-NEXT: %1 = fmul fast double %mul, %sqrt1
+; CHECK-NEXT: ret double %1
+}
+
+; Check that square root calls have the same behavior.
+
+declare float @sqrtf(float)
+declare double @sqrt(double)
+declare fp128 @sqrtl(fp128)
+
+define float @sqrt_call_squared_f32(float %x) #0 {
+  %mul = fmul fast float %x, %x
+  %sqrt = call float @sqrtf(float %mul)
+  ret float %sqrt
+
+; CHECK-LABEL: sqrt_call_squared_f32(
+; CHECK-NEXT: %fabs = call float @llvm.fabs.f32(float %x)
+; CHECK-NEXT: ret float %fabs
+}
+
+define double @sqrt_call_squared_f64(double %x) #0 {
+  %mul = fmul fast double %x, %x
+  %sqrt = call double @sqrt(double %mul)
+  ret double %sqrt
+
+; CHECK-LABEL: sqrt_call_squared_f64(
+; CHECK-NEXT: %fabs = call double @llvm.fabs.f64(double %x)
+; CHECK-NEXT: ret double %fabs
+}
+
+define fp128 @sqrt_call_squared_f128(fp128 %x) #0 {
+  %mul = fmul fast fp128 %x, %x
+  %sqrt = call fp128 @sqrtl(fp128 %mul)
+  ret fp128 %sqrt
+
+; CHECK-LABEL: sqrt_call_squared_f128(
+; CHECK-NEXT: %fabs = call fp128 @llvm.fabs.f128(fp128 %x)
+; CHECK-NEXT: ret fp128 %fabs
+}
