This is an archive of the discontinued LLVM Phabricator instance.


Differential D27618

Failure to vectorize __builtin_sqrt/__builtin_sqrtf
Abandoned · Public

Authored by avt77 on Dec 9 2016, 7:04 AM.


Details

Reviewers

spatel
RKSimon
eli.friedman

Summary

This review should resolve Bug 27435 - [X86][SSE] Failure to vectorize __builtin_sqrt/__builtin_sqrtf. All the details are in PR27435.
The problem was found while compiling a C file, which is why I added a C test; maybe that's wrong?
The fix is really tiny: I simply removed the check, which should be done elsewhere if it's needed.
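
For orientation, the scalar IR in this revision's test file (two loads, two calls to sqrt, two insertelement instructions) corresponds roughly to source like the sketch below. This is an assumption reconstructed from that IR, not the original C test, which is not shown on this page. `Double2` is a hypothetical stand-in for a 128-bit vector type; only the name `sqrtd2` comes from the test.

```cpp
#include <cmath>

// Hypothetical stand-in for a two-lane double vector (e.g. <2 x double>).
struct Double2 {
  double lane[2];
};

// Two independent scalar sqrt libcalls, each feeding one lane of the result.
// The missed optimization discussed in this review is turning this pair of
// scalar calls into a single packed square root.
inline Double2 sqrtd2(const double *v) {
  return {{std::sqrt(v[0]), std::sqrt(v[1])}};
}
```

Whether a compiler may merge the two calls depends on whether it is allowed to treat the libcall as the llvm.sqrt intrinsic, which is the question the reviewers raise.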

Diff Detail

Event Timeline

avt77 updated this revision to Diff 80896.Dec 9 2016, 7:04 AM

avt77 retitled this revision from to Failure to vectorize __builtin_sqrt/__builtin_sqrtf.

avt77 updated this object.

avt77 added reviewers: RKSimon, spatel, ABataev.

It would be great to have an ll IR test case. You can use the -S -emit-llvm options to generate LLVM IR for the test case.

In D27618#618796, @fhahn wrote:

It would be great to have an ll IR test case. You can use the -S -emit-llvm options to generate LLVM IR for the test case.

In fact we should only have ll tests - depending on clang in the llvm project is a big no-no. Move the .c test over to clang/test/CodeGen, checking the generated IR in that file; then add an equivalent test in llvm that checks the codegen for exactly that IR. You can add the clang test without review.

lib/Analysis/ValueTracking.cpp
2529	Do we know the history behind why sqrt was protected by this?
test/CodeGen/X86/lit.local.cfg
7 ↗	(On Diff #80896)	c/cpp test files aren't permitted in the llvm project

hfinkel added a subscriber: hfinkel.Dec 10 2016, 6:49 AM

hfinkel added inline comments.

lib/Analysis/ValueTracking.cpp
2529	The problem, which has come up a lot, is that our sqrt intrinsic is a special case. Unlike the other intrinsics for math functions, it does not exactly mirror the behavior of the corresponding libm function. As noted in the LangRef, "Unlike sqrt in libm, however, llvm.sqrt has undefined behavior for negative numbers other than -0.0." Thus, unless we get to assume no NaNs, it is possible that the intrinsic might have UB in cases where the libm function does not, and as a result, we can't substitute the intrinsic for the libm call.
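
As a small illustration of the libm side of this contrast, the sketch below probes what `std::sqrt` returns for the inputs in question. It assumes a libm that follows IEEE 754 / C Annex F and a build without fast-math options; `SqrtEdgeCases` and `probeLibmSqrt` are hypothetical names chosen for this example. The intrinsic side cannot be shown in C++, since the point of the comment is that llvm.sqrt leaves these inputs undefined.

```cpp
#include <cmath>
#include <limits>

// What libm's sqrt returns for the edge cases discussed above, assuming an
// IEEE 754 / C Annex F conforming implementation (an assumption of this sketch).
struct SqrtEdgeCases {
  bool negativeGivesNaN;      // sqrt(-1.0) is a NaN, not undefined behavior
  bool negativeZeroPreserved; // sqrt(-0.0) is -0.0
  bool nanPropagates;         // sqrt(NaN) is a NaN
};

inline SqrtEdgeCases probeLibmSqrt() {
  // volatile keeps the compiler from folding the calls at compile time.
  volatile double minusOneSrc = -1.0;
  volatile double minusZeroSrc = -0.0;
  volatile double nanSrc = std::numeric_limits<double>::quiet_NaN();
  double minusOne = minusOneSrc;
  double minusZero = minusZeroSrc;
  double nan = nanSrc;

  double rootOfMinusZero = std::sqrt(minusZero);
  return {std::isnan(std::sqrt(minusOne)),
          rootOfMinusZero == 0.0 && std::signbit(rootOfMinusZero),
          std::isnan(std::sqrt(nan))};
}
```

On such a libm all three fields come back true, so a negative input is a well-defined case for the libcall. That is why the libcall can only be replaced by the intrinsic when NaNs are known not to matter.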

fhahn added inline comments.Dec 11 2016, 5:42 AM

lib/Analysis/ValueTracking.cpp
2529	I think the checks were added in https://reviews.llvm.org/rL265521 and I think the reasoning behind the check is still sound. LibFunc::sqrt and Intrinsic::sqrt differ in the way they treat negative elements: the intrinsic has undefined behavior for negative numbers (http://llvm.org/docs/LangRef.html#llvm-sqrt-intrinsic).

RKSimon added inline comments.Dec 11 2016, 9:38 AM

lib/Analysis/ValueTracking.cpp
2529	What do you think is the best approach going forward? Is there a way to detect when the target lib func will in fact lower to an instruction (as per SSE sqrtss/sqrtsd) and will handle NaN correctly? Failing that, it's starting to sound like we should just do a fixup vectorization during x86 lowering of a build_vector of sqrt ops, as I suggested in https://llvm.org/bugs/show_bug.cgi?id=27435#c3

hfinkel added inline comments.Dec 12 2016, 1:33 AM

lib/Analysis/ValueTracking.cpp
2529	> What do you think is the best approach going forward? Is there a way to detect when the target lib func will in fact lower to an instruction (as per SSE sqrtss/sqrtsd) and will handle nan correctly?
We can't have target-dependent semantics for a target-independent IR-level intrinsic. I'd prefer that we somehow eliminate the IR-level special case here, either by adding a new intrinsic to represent sqrt with the regular semantics or by adding some argument to the existing intrinsic to indicate that all negative numbers, etc. are supported. A second intrinsic is probably easier. The fact that we have an intrinsic corresponding to each of the libm functions we handle except for sqrt, for which this is only true in no-NaNs mode, causes all sorts of inconveniences (and, as in this case, missed opportunities).

I restored the check if (ICS->hasNoNaNs()), but for non-vector operations only, because vector instructions work correctly with invalid input values. To make that possible I changed the signature of llvm::getIntrinsicForCallSite: it now takes a flag that says whether the intrinsic is being requested for vectorization.
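
The effect of that signature change on the sqrt case can be modelled in isolation. The sketch below is a simplified stand-in, not LLVM code: `IntrinsicID` and `mapSqrtLibcall` are hypothetical names, `hasNoNaNs` plays the role of ICS->hasNoNaNs(), and `forVector` mirrors the new parameter.

```cpp
// Stand-ins for the two Intrinsic::ID values involved; not the real LLVM enum.
enum class IntrinsicID { not_intrinsic, sqrt };

// Simplified model of the sqrt case in getIntrinsicForCallSite after this
// update: the libcall maps to the intrinsic if the call is marked no-NaNs,
// or unconditionally when the caller asks on behalf of a vectorizer.
constexpr IntrinsicID mapSqrtLibcall(bool hasNoNaNs, bool forVector = false) {
  return (hasNoNaNs || forVector) ? IntrinsicID::sqrt
                                  : IntrinsicID::not_intrinsic;
}

// Scalar queries keep the original guard.
static_assert(mapSqrtLibcall(false) == IntrinsicID::not_intrinsic,
              "scalar sqrt without nnan is not mapped");
static_assert(mapSqrtLibcall(true) == IntrinsicID::sqrt,
              "scalar sqrt with nnan is mapped");
// Vector queries bypass the guard entirely.
static_assert(mapSqrtLibcall(false, true) == IntrinsicID::sqrt,
              "vector query ignores the nnan check");
```

The last assertion is the behavior change: with `forVector` set, the NaN check no longer influences the result.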

Herald added a subscriber: mzolotukhin.Dec 19 2016, 3:46 AM

RKSimon added a reviewer: eli.friedman.Jan 5 2017, 3:30 AM

RKSimon added inline comments.Jan 6 2017, 6:09 AM

lib/Analysis/ValueTracking.cpp
2529	I don't think we can use an isvector test to determine if it's safe to replace a sqrt libcall with a llvm.sqrt intrinsic.
test/CodeGen/X86/vector-sqrt.ll
3	This isn't testing your patch - surely you need tests somewhere like llvm\test\Transforms\SLPVectorizer\X86\sqrt.ll that cover sqrt/sqrtf libcalls as well as the llvm.sqrt intrinsics? Test at the IR level, not asm codegen.

avt77 added inline comments.Jan 9 2017, 1:35 AM

lib/Analysis/ValueTracking.cpp
2529	Does that mean you insist on a new intrinsic, something like this: if (ICS->hasNoNaNs()) return Intrinsic::sqrt; else return Intrinsic::sqrtWithoutNoNaNs(); This new intrinsic should check the actual argument: if it's NaN it should call stdlib, otherwise it should call Intrinsic::sqrt. Right?

RKSimon mentioned this in D28335: [WIP] [RFC] Don't lower floating point intrinsics to libcalls which modify errno.Jan 9 2017, 6:07 AM

hfinkel added inline comments.Jan 9 2017, 9:38 AM

lib/Analysis/ValueTracking.cpp
2529	Yes, I think that, if we are going to use this intrinsic canonicalization in this part of the code, then we need a separate representation for the "regular" sqrt as opposed to our special NoNaNs sqrt. Or we should just fix our current sqrt intrinsic (as Eli recently proposed in an RFC). Since we now can add nnan to calls, we don't seem to need the special semantics even to preserve the same representational capability.
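
The two directions mentioned in this comment can be contrasted with a toy model. Everything here is hypothetical: `SqrtRepr`, `Canonicalized`, and both functions are illustration-only names, and no "regular" sqrt intrinsic is shown anywhere in this revision.

```cpp
// Hypothetical tags for how a sqrt libcall could be represented in IR.
enum class SqrtRepr {
  NoNaNsIntrinsic, // the existing intrinsic with its special semantics
  RegularIntrinsic // an intrinsic that mirrors libm for all inputs
};

struct Canonicalized {
  SqrtRepr repr; // which representation the call is rewritten to
  bool nnan;     // whether the no-NaNs flag is still attached to the call
};

// Option 1: keep the special intrinsic and add a second, regular one.
// The choice between them depends on the call's nnan flag.
constexpr Canonicalized withSecondIntrinsic(bool callHasNoNaNs) {
  return {callHasNoNaNs ? SqrtRepr::NoNaNsIntrinsic
                        : SqrtRepr::RegularIntrinsic,
          callHasNoNaNs};
}

// Option 2: give the one intrinsic libm-like semantics. Every libcall maps
// to it, and the nnan flag simply travels along on the call.
constexpr Canonicalized withFixedIntrinsic(bool callHasNoNaNs) {
  return {SqrtRepr::RegularIntrinsic, callHasNoNaNs};
}

static_assert(withSecondIntrinsic(false).repr == SqrtRepr::RegularIntrinsic,
              "option 1 needs a second representation when nnan is absent");
static_assert(withFixedIntrinsic(true).nnan,
              "option 2 keeps nnan as a flag rather than as a semantic");
```

In both variants every sqrt libcall gets an intrinsic form, so a vectorizer would not need a vector-only escape hatch such as `forVector`.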

ABataev resigned from this revision.Feb 13 2017, 11:41 AM

spatel mentioned this in rL316250: [CodeGen] add tests for __builtin_sqrt*; NFC.Oct 20 2017, 4:35 PM

Can we abandon this now? PR27435 was closed with D39642 / rL317519. The steps needed to fix the general problem with errno are listed in D28335.

As spatel wrote: "PR27435 was closed with D39642 / rL317519. The steps needed to fix the general problem with errno are listed in D28335."

Revision Contents

Path	Size
include/llvm/Analysis/ValueTracking.h	3 lines
lib/Analysis/ValueTracking.cpp	5 lines
lib/Analysis/VectorUtils.cpp	2 lines
test/CodeGen/X86/vector-sqrt.ll	65 lines

Diff 81934

include/llvm/Analysis/ValueTracking.h

[154 lines not shown; context: template <typename T> class ArrayRef;]
/// if LookThroughSExt=true.		/// if LookThroughSExt=true.
bool ComputeMultiple(Value *V, unsigned Base, Value *&Multiple,		bool ComputeMultiple(Value *V, unsigned Base, Value *&Multiple,
bool LookThroughSExt = false,		bool LookThroughSExt = false,
unsigned Depth = 0);		unsigned Depth = 0);

/// Map a call instruction to an intrinsic ID. Libcalls which have equivalent		/// Map a call instruction to an intrinsic ID. Libcalls which have equivalent
/// intrinsics are treated as-if they were intrinsics.		/// intrinsics are treated as-if they were intrinsics.
Intrinsic::ID getIntrinsicForCallSite(ImmutableCallSite ICS,		Intrinsic::ID getIntrinsicForCallSite(ImmutableCallSite ICS,
const TargetLibraryInfo *TLI);		const TargetLibraryInfo *TLI,
		bool forVector = false);

/// Return true if we can prove that the specified FP value is never equal to		/// Return true if we can prove that the specified FP value is never equal to
/// -0.0.		/// -0.0.
bool CannotBeNegativeZero(const Value *V, const TargetLibraryInfo *TLI,		bool CannotBeNegativeZero(const Value *V, const TargetLibraryInfo *TLI,
unsigned Depth = 0);		unsigned Depth = 0);

/// Return true if we can prove that the specified FP value is either a NaN or		/// Return true if we can prove that the specified FP value is either a NaN or
/// never less than 0.0.		/// never less than 0.0.
[312 lines not shown]

lib/Analysis/ValueTracking.cpp

[2,416 lines not shown; context: bool llvm::ComputeMultiple(Value *V, unsigned Base, Value *&Multiple,]
}		}
}		}

// We could not determine if V is a multiple of Base.		// We could not determine if V is a multiple of Base.
return false;		return false;
}		}

Intrinsic::ID llvm::getIntrinsicForCallSite(ImmutableCallSite ICS,		Intrinsic::ID llvm::getIntrinsicForCallSite(ImmutableCallSite ICS,
const TargetLibraryInfo *TLI) {		const TargetLibraryInfo *TLI,
		bool forVector) {
const Function *F = ICS.getCalledFunction();		const Function *F = ICS.getCalledFunction();
if (!F)		if (!F)
return Intrinsic::not_intrinsic;		return Intrinsic::not_intrinsic;

if (F->isIntrinsic())		if (F->isIntrinsic())
return F->getIntrinsicID();		return F->getIntrinsicID();

if (!TLI)		if (!TLI)
[84 lines not shown; context: case LibFunc::roundl:]
return Intrinsic::round;		return Intrinsic::round;
case LibFunc::pow:		case LibFunc::pow:
case LibFunc::powf:		case LibFunc::powf:
case LibFunc::powl:		case LibFunc::powl:
return Intrinsic::pow;		return Intrinsic::pow;
case LibFunc::sqrt:		case LibFunc::sqrt:
case LibFunc::sqrtf:		case LibFunc::sqrtf:
case LibFunc::sqrtl:		case LibFunc::sqrtl:
if (ICS->hasNoNaNs())		if (ICS->hasNoNaNs() || forVector)
return Intrinsic::sqrt;		return Intrinsic::sqrt;
return Intrinsic::not_intrinsic;		return Intrinsic::not_intrinsic;
}		}

return Intrinsic::not_intrinsic;		return Intrinsic::not_intrinsic;
}		}

/// Return true if we can prove that the specified FP value is never equal to		/// Return true if we can prove that the specified FP value is never equal to
/// -0.0.		/// -0.0.
///		///
[1,821 lines not shown]

lib/Analysis/VectorUtils.cpp

[79 lines not shown; context: bool llvm::hasVectorInstrinsicScalarOpd(Intrinsic::ID ID,]
}		}
}		}

/// \brief Returns intrinsic ID for call.		/// \brief Returns intrinsic ID for call.
/// For the input call instruction it finds mapping intrinsic and returns		/// For the input call instruction it finds mapping intrinsic and returns
/// its ID, in case it does not found it return not_intrinsic.		/// its ID, in case it does not found it return not_intrinsic.
Intrinsic::ID llvm::getVectorIntrinsicIDForCall(const CallInst *CI,		Intrinsic::ID llvm::getVectorIntrinsicIDForCall(const CallInst *CI,
const TargetLibraryInfo *TLI) {		const TargetLibraryInfo *TLI) {
Intrinsic::ID ID = getIntrinsicForCallSite(CI, TLI);		Intrinsic::ID ID = getIntrinsicForCallSite(CI, TLI, /*forVector*/ true);
if (ID == Intrinsic::not_intrinsic)		if (ID == Intrinsic::not_intrinsic)
return Intrinsic::not_intrinsic;		return Intrinsic::not_intrinsic;

if (isTriviallyVectorizable(ID) || ID == Intrinsic::lifetime_start ||		if (isTriviallyVectorizable(ID) || ID == Intrinsic::lifetime_start ||
ID == Intrinsic::lifetime_end || ID == Intrinsic::assume)		ID == Intrinsic::lifetime_end || ID == Intrinsic::assume)
return ID;		return ID;
return Intrinsic::not_intrinsic;		return Intrinsic::not_intrinsic;
}		}
[394 lines not shown]

test/CodeGen/X86/vector-sqrt.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx | FileCheck %s --check-prefix=CHECK			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx | FileCheck %s --check-prefix=CHECK

	; Function Attrs: nounwind readonly uwtable
	define <2 x double> @sqrtd2(double* nocapture readonly %v) local_unnamed_addr #0 {			define <2 x double> @sqrtd2(double* nocapture readonly %v) local_unnamed_addr #0 {
	; CHECK-LABEL: sqrtd2:			; CHECK-LABEL: sqrtd2:
	; CHECK: vsqrtsd (%rdi), %xmm0, %xmm0			; CHECK: # BB#0: # %entry
	; CHECK-NEXT: vsqrtsd 8(%rdi), %xmm1, %xmm1			; CHECK-NEXT: vsqrtpd (%rdi), %xmm0
	; CHECK-NEXT: vunpcklpd {{.*#+}} xmm0 = xmm0[0],xmm1[0]
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	entry:			entry:
	%0 = load double, double* %v, align 8			%0 = bitcast double* %v to <2 x double>*
	%call = tail call double @sqrt(double %0) #2			%1 = load <2 x double>, <2 x double>* %0, align 8
	%arrayidx1 = getelementptr inbounds double, double* %v, i64 1			%2 = call <2 x double> @llvm.sqrt.v2f64(<2 x double> %1)
	%1 = load double, double* %arrayidx1, align 8			ret <2 x double> %2
	%call2 = tail call double @sqrt(double %1) #2
	%vecinit.i = insertelement <2 x double> undef, double %call, i32 0
	%vecinit1.i = insertelement <2 x double> %vecinit.i, double %call2, i32 1
	ret <2 x double> %vecinit1.i
	}			}

	; Function Attrs: nounwind readnone
	declare double @sqrt(double) local_unnamed_addr #1

	; Function Attrs: nounwind readonly uwtable
	define <4 x float> @sqrtf4(float* nocapture readonly %v) local_unnamed_addr #0 {			define <4 x float> @sqrtf4(float* nocapture readonly %v) local_unnamed_addr #0 {
	; CHECK-LABEL: sqrtf4:			; CHECK-LABEL: sqrtf4:
	; CHECK: vsqrtss (%rdi), %xmm0, %xmm0			; CHECK: # BB#0: # %entry
	; CHECK-NEXT: vsqrtss 4(%rdi), %xmm1, %xmm1			; CHECK-NEXT: vsqrtps (%rdi), %xmm0
	; CHECK-NEXT: vsqrtss 8(%rdi), %xmm2, %xmm2
	; CHECK-NEXT: vsqrtss 12(%rdi), %xmm3, %xmm3
	; CHECK-NEXT: vinsertps {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[2,3]
	; CHECK-NEXT: vinsertps {{.*#+}} xmm0 = xmm0[0,1],xmm2[0],xmm0[3]
	; CHECK-NEXT: vinsertps {{.*#+}} xmm0 = xmm0[0,1,2],xmm3[0]
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	entry:			entry:
	%0 = load float, float* %v, align 4			%0 = bitcast float* %v to <4 x float>*
	%call = tail call float @sqrtf(float %0) #2			%1 = load <4 x float>, <4 x float>* %0, align 4
	%arrayidx1 = getelementptr inbounds float, float* %v, i64 1			%2 = call <4 x float> @llvm.sqrt.v4f32(<4 x float> %1)
	%1 = load float, float* %arrayidx1, align 4			ret <4 x float> %2
	%call2 = tail call float @sqrtf(float %1) #2
	%arrayidx3 = getelementptr inbounds float, float* %v, i64 2
	%2 = load float, float* %arrayidx3, align 4
	%call4 = tail call float @sqrtf(float %2) #2
	%arrayidx5 = getelementptr inbounds float, float* %v, i64 3
	%3 = load float, float* %arrayidx5, align 4
	%call6 = tail call float @sqrtf(float %3) #2
	%vecinit.i = insertelement <4 x float> undef, float %call, i32 0
	%vecinit1.i = insertelement <4 x float> %vecinit.i, float %call2, i32 1
	%vecinit2.i = insertelement <4 x float> %vecinit1.i, float %call4, i32 2
	%vecinit3.i = insertelement <4 x float> %vecinit2.i, float %call6, i32 3
	ret <4 x float> %vecinit3.i
	}			}

	; Function Attrs: nounwind readnone			declare <2 x double> @llvm.sqrt.v2f64(<2 x double>) #1
	declare float @sqrtf(float) local_unnamed_addr #1			declare <4 x float> @llvm.sqrt.v4f32(<4 x float>) #1

				attributes #0 = { nounwind readonly uwtable "target-features"="+avx2" "unsafe-fp-math"="false" "use-soft-float"="false" }
				attributes #1 = { nounwind readnone }

	attributes #0 = { nounwind readonly uwtable "target-features"="+avx" "unsafe-fp-math"="false" "use-soft-float"="false" }
	attributes #1 = { nounwind readnone "target-features"="+avx2" "unsafe-fp-math"="false" "use-soft-float"="false" }
	attributes #2 = { nounwind readnone }