This is an archive of the discontinued LLVM Phabricator instance.

Differential D129155

[InstCombine][SimplifyLibCalls] convert sqrt libcalls with "nnan" to sqrt intrinsics
Abandoned · Public

Authored by spatel on Jul 5 2022, 11:00 AM.

Details

Reviewers

fhahn
efriedma
nikic

Summary

If a sqrt call has "nnan", that implies that the input argument is never negative because sqrt of {negative number} --> NAN.
If the argument is never negative, then we can assume that errno is not written, so the call can be translated to the LLVM intrinsic.
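
The link between a NaN result and the errno write can be checked against the C library directly. The following is a minimal sketch and not part of the patch: `SqrtProbe`, `probe_sqrt` and `libm_uses_errno` are hypothetical names, and whether errno is written at all depends on the platform (`math_errhandling & MATH_ERRNO`).

```cpp
#include <cerrno>
#include <cmath>

// Hypothetical helper (not from the patch): records what a sqrt libcall did.
struct SqrtProbe {
  double Result;  // value returned by sqrt
  int ErrnoAfter; // errno observed right after the call (0 if untouched)
};

// Clears errno, calls sqrt, and reports the result together with errno.
// The volatile copy keeps the compiler from folding the call away.
SqrtProbe probe_sqrt(double X) {
  volatile double Input = X;
  errno = 0;
  double R = std::sqrt(Input);
  return {R, errno};
}

// True if this C library reports math errors through errno at all.
bool libm_uses_errno() { return (math_errhandling & MATH_ERRNO) != 0; }
```

For 4.0 and for -0.0 the result is not NaN and errno stays 0. For -1.0 the result is NaN and, where `libm_uses_errno()` is true, errno becomes EDOM. A call marked "nnan" promises that the NaN case does not occur, and that is the only case tied to the errno write.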

This affects codegen for targets like x86 that have a sqrt instruction, but we still have to conservatively assume that a libcall may be needed to set errno as shown in issue #52620.

This patch won't solve that example - we will need to extend this to use CannotBeOrderedLessThanZero or similar, enhance that analysis for new operators, and deal with llvm.assume too.

Event Timeline

spatel created this revision. Jul 5 2022, 11:00 AM

Herald added a project: Restricted Project. Jul 5 2022, 11:00 AM

Herald added subscribers: jsji, pengfei, hiraditya, mcrosier.

spatel requested review of this revision. Jul 5 2022, 11:00 AM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B173742: Diff 442353. Jul 5 2022, 11:01 AM

I hate trying to reason about this stuff.

The intrinsics for sqrt/sin/cos/pow/exp/exp2/log/log2/log10 are theoretically supposed to not write to errno, but in practice they do because we call the libc implementations. So even though this is safe in theory, in practice it could cause a miscompile (if we hoist a sqrt operation past a load from errno). Not sure what, if anything, we should do about this.
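
To make the hazard concrete, here is a small sketch of source code whose outcome depends on where the errno load sits relative to the sqrt call. It is an illustration under assumptions: `ErrnoPair` and `errno_around_sqrt` are made-up names, and the EDOM write only happens on C libraries that report math errors through errno.

```cpp
#include <cerrno>
#include <cmath>
#include <cstdlib>

// Hypothetical illustration: errno as seen before and after a sqrt call.
struct ErrnoPair {
  int Before; // errno loaded before sqrt runs
  int After;  // errno loaded after sqrt runs
};

ErrnoPair errno_around_sqrt(double X) {
  volatile double Input = X;
  errno = 0;
  // An out-of-range conversion leaves ERANGE in errno.
  volatile double Parsed = std::strtod("1e999", nullptr);
  (void)Parsed;
  int Before = errno;
  // If sqrt is modeled as not touching errno, an optimizer may treat the
  // load above and the load below as interchangeable. When the call still
  // lowers to a libcall that writes EDOM, the two loads disagree.
  volatile double Root = std::sqrt(Input);
  (void)Root;
  int After = errno;
  return {Before, After};
}
```

With a non-negative input both loads see ERANGE. With a negative input on an errno-reporting libm, `After` is EDOM, so moving an errno load across the call changes what the program observes.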

In D129155#3631251, @efriedma wrote:

I hate trying to reason about this stuff.

The intrinsics for sqrt/sin/cos/pow/exp/exp2/log/log2/log10 are theoretically supposed to not write to errno, but in practice they do because we call the libc implementations. So even though this is safe in theory, in practice it could cause a miscompile (if we hoist a sqrt operation past a load from errno). Not sure what, if anything, we should do about this.

Ah, I didn't think of the sqrt call moving relative to other calls/accesses of errno. Are we already exposed to that problem in a bigger way when the front-end creates intrinsics too?
As an alternative, we could put this kind of check into PartiallyInlineLibCalls - that's where we create the extra call (that we then expect to be lowered to an instruction) after checking TTI->haveFastSqrt(). Or since we have TTI in AggressiveInstCombine, we could do the transform there. Do you see any potential miscompiles if we use that hook to guard the transform?
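
The guard proposed here can be summarized as a small decision function. This is a standalone toy model, not LLVM code: `SqrtCallFacts` and `canConvertToIntrinsic` are hypothetical, and `TargetHasFastSqrt` merely stands in for the answer from TTI->haveFastSqrt().

```cpp
// Hypothetical facts about one sqrt call site (a toy model, not LLVM API).
struct SqrtCallFacts {
  bool IsLibcall;         // plain sqrt/sqrtf call rather than llvm.sqrt
  bool HasNoNaNs;         // call carries the "nnan" fast-math flag
  bool TargetHasFastSqrt; // stand-in for TTI->haveFastSqrt(Ty)
};

// Convert only when the result cannot be NaN (so errno is not written) and
// the target is expected to lower the intrinsic to an instruction instead of
// calling back into libm.
bool canConvertToIntrinsic(const SqrtCallFacts &F) {
  return F.IsLibcall && F.HasNoNaNs && F.TargetHasFastSqrt;
}
```

Under this model a target without a fast sqrt keeps the libcall even for "nnan" calls, so no intrinsic is created that would later lower to an errno-writing libm call.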

Are we already exposed to that problem in a bigger way when the front-end creates intrinsics too?

Sort of. clang generally won't create calls to llvm.sqrt if -fmath-errno is enabled (which is the default on targets where sqrt sets errno). clang will create intrinsics with -ffast-math, though. And other frontends probably just blindly call the intrinsic.

Do you see any potential miscompiles if we use that hook to guard the transform?

That's probably safe enough.

spatel mentioned this in D129167: [AggressiveInstCombine] convert sqrt libcalls with "nnan" to sqrt intrinsics. Jul 5 2022, 7:22 PM

Abandoning for the alternative that uses TTI via aggressive-instcombine (D129167).

spatel mentioned this in rGe3205b87655f: [AggressiveInstCombine] convert sqrt libcalls with "nnan" to sqrt intrinsics. Jul 26 2022, 12:50 PM

Revision Contents

Path                                              Size
llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp    23 lines
llvm/test/Transforms/InstCombine/pow-4.ll         18 lines
llvm/test/Transforms/InstCombine/pow-sqrt.ll      8 lines
llvm/test/Transforms/InstCombine/sqrt.ll          12 lines

Diff 442353

llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp

[2,212 lines not shown; context: Value *LibCallSimplifier::optimizeSqrt(CallInst *CI, IRBuilderBase &B) {]
   // TODO: Once we have a way (other than checking for the existince of the
   // libcall) to tell whether our target can lower @llvm.sqrt, relax the
   // condition below.
   if (isLibFuncEmittable(M, TLI, LibFunc_sqrtf) &&
       (Callee->getName() == "sqrt" ||
        Callee->getIntrinsicID() == Intrinsic::sqrt))
     Ret = optimizeUnaryDoubleFP(CI, B, TLI, true);

+  // Fast-math-flags for any created instructions should match the sqrt.
+  IRBuilderBase::FastMathFlagGuard Guard(B);
+  B.setFastMathFlags(CI->getFastMathFlags());
+
+  // If this is a sqrt libcall and we can assume that NAN is not created, then
+  // the arg must not be less than -0.0 and errno won't be set either.
+  // It is safe to convert this to an intrinsic call.
+  // TODO: Check if the arg is known non-negative.
+  Value *Arg = CI->getArgOperand(0);
+  Type *ArgType = Arg->getType();
+  if (!Callee->isIntrinsic() && CI->hasNoNaNs()) {
+    Function *Sqrt = Intrinsic::getDeclaration(M, Intrinsic::sqrt, ArgType);
+    return copyFlags(*CI, B.CreateCall(Sqrt, Arg, "sqrt"));
+  }
+
   if (!CI->isFast())
     return Ret;

-  Instruction *I = dyn_cast<Instruction>(CI->getArgOperand(0));
+  Instruction *I = dyn_cast<Instruction>(Arg);
   if (!I || I->getOpcode() != Instruction::FMul || !I->isFast())
     return Ret;

   // We're looking for a repeated factor in a multiplication tree,
   // so we can do this fold: sqrt(x * x) -> fabs(x);
   // or this fold: sqrt((x * x) * y) -> fabs(x) * sqrt(y).
   Value *Op0 = I->getOperand(0);
   Value *Op1 = I->getOperand(1);
[16 lines not shown; context: if (match(Op0, m_FMul(m_Value(OtherMul0), m_Value(OtherMul1)))) {]
         RepeatOp = OtherMul0;
         OtherOp = Op1;
       }
     }
   }
   if (!RepeatOp)
     return Ret;

-  // Fast math flags for any created instructions should match the sqrt
-  // and multiply.
-  IRBuilderBase::FastMathFlagGuard Guard(B);
-  B.setFastMathFlags(I->getFastMathFlags());
-
   // If we found a repeated factor, hoist it out of the square root and
   // replace it with the fabs of that factor.
-  Type *ArgType = I->getType();
   Function *Fabs = Intrinsic::getDeclaration(M, Intrinsic::fabs, ArgType);
   Value *FabsCall = B.CreateCall(Fabs, RepeatOp, "fabs");
   if (OtherOp) {
     // If we found a non-repeated factor, we still need to get its square
     // root. We then multiply that by the value that was simplified out
     // of the square root calculation.
     Function *Sqrt = Intrinsic::getDeclaration(M, Intrinsic::sqrt, ArgType);
     Value *SqrtCall = B.CreateCall(Sqrt, OtherOp, "sqrt");
[1,524 lines not shown]

llvm/test/Transforms/InstCombine/pow-4.ll

[168 lines not shown]
 ; SQRT-NEXT: [[TMP4:%.*]] = fmul fast double [[TMP3]], [[SQRT]]
 ; SQRT-NEXT: ret double [[TMP4]]
 ;
 ; NOSQRT-LABEL: @test_simplify_16_5_libcall(
 ; NOSQRT-NEXT: [[TMP1:%.*]] = call fast double @pow(double [[X:%.*]], double 1.650000e+01)
 ; NOSQRT-NEXT: ret double [[TMP1]]
 ;
 ; CHECKSQRT-LABEL: @test_simplify_16_5_libcall(
-; CHECKSQRT-NEXT: [[SQRT:%.*]] = call fast double @sqrt(double [[X:%.*]])
+; CHECKSQRT-NEXT: [[SQRT1:%.*]] = call fast double @llvm.sqrt.f64(double [[X:%.*]])
 ; CHECKSQRT-NEXT: [[SQUARE:%.*]] = fmul fast double [[X]], [[X]]
 ; CHECKSQRT-NEXT: [[TMP1:%.*]] = fmul fast double [[SQUARE]], [[SQUARE]]
 ; CHECKSQRT-NEXT: [[TMP2:%.*]] = fmul fast double [[TMP1]], [[TMP1]]
 ; CHECKSQRT-NEXT: [[TMP3:%.*]] = fmul fast double [[TMP2]], [[TMP2]]
-; CHECKSQRT-NEXT: [[TMP4:%.*]] = fmul fast double [[TMP3]], [[SQRT]]
+; CHECKSQRT-NEXT: [[TMP4:%.*]] = fmul fast double [[TMP3]], [[SQRT1]]
 ; CHECKSQRT-NEXT: ret double [[TMP4]]
 ;
 ; CHECKNOSQRT-LABEL: @test_simplify_16_5_libcall(
 ; CHECKNOSQRT-NEXT: [[TMP1:%.*]] = call fast double @pow(double [[X:%.*]], double 1.650000e+01)
 ; CHECKNOSQRT-NEXT: ret double [[TMP1]]
 ;
   %1 = call fast double @pow(double %x, double 1.650000e+01)
   ret double %1
[12 lines not shown]
 ; SQRT-NEXT: [[RECIPROCAL:%.*]] = fdiv fast double 1.000000e+00, [[TMP4]]
 ; SQRT-NEXT: ret double [[RECIPROCAL]]
 ;
 ; NOSQRT-LABEL: @test_simplify_neg_16_5_libcall(
 ; NOSQRT-NEXT: [[TMP1:%.*]] = call fast double @pow(double [[X:%.*]], double -1.650000e+01)
 ; NOSQRT-NEXT: ret double [[TMP1]]
 ;
 ; CHECKSQRT-LABEL: @test_simplify_neg_16_5_libcall(
-; CHECKSQRT-NEXT: [[SQRT:%.*]] = call fast double @sqrt(double [[X:%.*]])
+; CHECKSQRT-NEXT: [[SQRT1:%.*]] = call fast double @llvm.sqrt.f64(double [[X:%.*]])
 ; CHECKSQRT-NEXT: [[SQUARE:%.*]] = fmul fast double [[X]], [[X]]
 ; CHECKSQRT-NEXT: [[TMP1:%.*]] = fmul fast double [[SQUARE]], [[SQUARE]]
 ; CHECKSQRT-NEXT: [[TMP2:%.*]] = fmul fast double [[TMP1]], [[TMP1]]
 ; CHECKSQRT-NEXT: [[TMP3:%.*]] = fmul fast double [[TMP2]], [[TMP2]]
-; CHECKSQRT-NEXT: [[TMP4:%.*]] = fmul fast double [[TMP3]], [[SQRT]]
+; CHECKSQRT-NEXT: [[TMP4:%.*]] = fmul fast double [[TMP3]], [[SQRT1]]
 ; CHECKSQRT-NEXT: [[RECIPROCAL:%.*]] = fdiv fast double 1.000000e+00, [[TMP4]]
 ; CHECKSQRT-NEXT: ret double [[RECIPROCAL]]
 ;
 ; CHECKNOSQRT-LABEL: @test_simplify_neg_16_5_libcall(
 ; CHECKNOSQRT-NEXT: [[TMP1:%.*]] = call fast double @pow(double [[X:%.*]], double -1.650000e+01)
 ; CHECKNOSQRT-NEXT: ret double [[TMP1]]
 ;
   %1 = call fast double @pow(double %x, double -1.650000e+01)
[39 lines not shown]
 ; CHECK-NEXT: ret <4 x float> [[TMP2]]
 ;
   %1 = call fast <4 x float> @llvm.pow.v4f32(<4 x float> %x, <4 x float> <float 3.500000e+00, float 3.500000e+00, float 3.500000e+00, float 3.500000e+00>)
   ret <4 x float> %1
 }

 ; (float)pow((double)(float)x, 0.5)
 define float @shrink_pow_libcall_half(float %x) {
-; CHECK-LABEL: @shrink_pow_libcall_half(
-; CHECK-NEXT: [[SQRTF:%.*]] = call fast float @sqrtf(float [[X:%.*]])
-; CHECK-NEXT: ret float [[SQRTF]]
+; CHECKSQRT-LABEL: @shrink_pow_libcall_half(
+; CHECKSQRT-NEXT: [[TMP1:%.*]] = call fast float @llvm.sqrt.f32(float [[X:%.*]])
+; CHECKSQRT-NEXT: ret float [[TMP1]]
+;
+; CHECKNOSQRT-LABEL: @shrink_pow_libcall_half(
+; CHECKNOSQRT-NEXT: [[SQRT:%.*]] = call fast float @llvm.sqrt.f32(float [[X:%.*]])
+; CHECKNOSQRT-NEXT: ret float [[SQRT]]
 ;
   %dx = fpext float %x to double
   %call = call fast double @pow(double %dx, double 0.5)
   %fr = fptrunc double %call to float
   ret float %fr
 }

 ; Make sure that -0.0 exponent is always simplified.
[9 lines not shown]

llvm/test/Transforms/InstCombine/pow-sqrt.ll

[142 lines not shown]
 ;
   %pow = call ninf nsz double @llvm.pow.f64(double %x, double 5.0e-01)
   ret double %pow
 }

 ; Overspecified FMF to test propagation to the new op(s).

 define float @pow_libcall_half_fast(float %x) {
 ; CHECK-LABEL: @pow_libcall_half_fast(
-; CHECK-NEXT: [[SQRTF:%.*]] = call fast float @sqrtf(float [[X:%.*]])
-; CHECK-NEXT: ret float [[SQRTF]]
+; CHECK-NEXT: [[SQRT:%.*]] = call fast float @llvm.sqrt.f32(float [[X:%.*]])
+; CHECK-NEXT: ret float [[SQRT]]
 ;
   %pow = call fast float @powf(float %x, float 5.0e-01)
   ret float %pow
 }

 define double @pow_intrinsic_half_fast(double %x) {
 ; CHECK-LABEL: @pow_intrinsic_half_fast(
 ; CHECK-NEXT: [[SQRT:%.*]] = call fast double @llvm.sqrt.f64(double [[X:%.*]])
[154 lines not shown]
 ;
   %pow = call afn ninf nsz float @powf(float %x, float -5.0e-01)
   ret float %pow
 }

 ; Overspecified FMF to test propagation to the new op(s).

 define float @pow_libcall_neghalf_fast(float %x) {
 ; CHECK-LABEL: @pow_libcall_neghalf_fast(
-; CHECK-NEXT: [[SQRTF:%.*]] = call fast float @sqrtf(float [[X:%.*]])
-; CHECK-NEXT: [[RECIPROCAL:%.*]] = fdiv fast float 1.000000e+00, [[SQRTF]]
+; CHECK-NEXT: [[SQRT:%.*]] = call fast float @llvm.sqrt.f32(float [[X:%.*]])
+; CHECK-NEXT: [[RECIPROCAL:%.*]] = fdiv fast float 1.000000e+00, [[SQRT]]
 ; CHECK-NEXT: ret float [[RECIPROCAL]]
 ;
   %pow = call fast float @powf(float %x, float -5.0e-01)
   ret float %pow
 }

 define double @pow_intrinsic_neghalf_fast(double %x) {
 ; CHECK-LABEL: @pow_intrinsic_neghalf_fast(
[18 lines not shown]

llvm/test/Transforms/InstCombine/sqrt.ll

[53 lines not shown]
 ; CHECK-NEXT: [[SQRTF:%.*]] = call float @sqrtf(float [[F:%.*]]) #[[ATTR2:[0-9]+]]
 ; CHECK-NEXT: ret void
 ;
   %d = fpext float %f to double
   %r = call double @sqrt(double %d)
   ret void
 }

+; nnan implies no setting of errno, so transform to an intrinsic
+
 define float @sqrt_call_nnan_f32(float %x) {
 ; CHECK-LABEL: @sqrt_call_nnan_f32(
-; CHECK-NEXT: [[SQRT:%.*]] = call nnan float @sqrtf(float [[X:%.*]])
-; CHECK-NEXT: ret float [[SQRT]]
+; CHECK-NEXT: [[SQRT1:%.*]] = call nnan float @llvm.sqrt.f32(float [[X:%.*]])
+; CHECK-NEXT: ret float [[SQRT1]]
 ;
   %sqrt = call nnan float @sqrtf(float %x)
   ret float %sqrt
 }

+; verify that other function call FMF and attributes are propagated to the intrinsic call
+
 define double @sqrt_call_nnan_f64(double %x) {
 ; CHECK-LABEL: @sqrt_call_nnan_f64(
-; CHECK-NEXT: [[SQRT:%.*]] = tail call nnan ninf double @sqrt(double [[X:%.*]])
-; CHECK-NEXT: ret double [[SQRT]]
+; CHECK-NEXT: [[SQRT1:%.*]] = tail call nnan ninf double @llvm.sqrt.f64(double [[X:%.*]])
+; CHECK-NEXT: ret double [[SQRT1]]
 ;
   %sqrt = tail call nnan ninf double @sqrt(double %x)
   ret double %sqrt
 }

 define float @sqrt_call_fabs_f32(float %x) {
 ; CHECK-LABEL: @sqrt_call_fabs_f32(
 ; CHECK-NEXT: [[A:%.*]] = call float @llvm.fabs.f32(float [[X:%.*]])
[12 lines not shown]