This is an archive of the discontinued LLVM Phabricator instance.

[X86][SSE] Reimplement SSE fp2si conversion intrinsics instead of using generic IR
ClosedPublic

Authored by RKSimon on Jul 7 2016, 11:50 AM.

Details

Reviewers

spatel
eli.friedman
andreadb
mkuper
craig.topper

Commits

rG59c12c57992b: Merging r276102: --------------------------------------------------------------…
rGe3b9ee0645a6: [X86][SSE] Reimplement SSE fp2si conversion intrinsics instead of using generic…
rC276102: [X86][SSE] Reimplement SSE fp2si conversion intrinsics instead of using generic…
rL276102: [X86][SSE] Reimplement SSE fp2si conversion intrinsics instead of using…

Summary

D20859 and D20860 attempted to replace the SSE (V)CVTTPS2DQ and VCVTTPD2DQ truncating conversions with generic IR.

It turns out that the behaviour of these intrinsics differs enough from generic IR that this causes problems: INF, NaN and out-of-range inputs are guaranteed to produce 0x80000000, whereas constant folding of the generic IR converts them to either zero or UNDEF. This is also an issue with the scalar implementations (which already used generic IR and were what I was trying to match).
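The difference described above can be made concrete with a small portable model. This is a minimal sketch, not code from the patch: `model_cvttss2si` is a hypothetical helper that mimics the CVTTSS2SI result for a single float, returning the 0x80000000 bit pattern (`INT32_MIN`) for NaN, infinities and out-of-range inputs. A plain C cast `(int)x` has undefined behaviour for those same inputs, which is what leaves a constant folder free to pick zero or undef for the generic conversion.

```c
#include <assert.h>
#include <math.h>    /* NAN, INFINITY macros only */
#include <stdint.h>

/* Hypothetical model of the scalar CVTTSS2SI result (not part of the patch). */
static int32_t model_cvttss2si(float x) {
    /* In range: truncate toward zero. NaN fails both comparisons. */
    if (x >= -2147483648.0f && x < 2147483648.0f)
        return (int32_t)x;
    /* NaN, +/-INF and out-of-range inputs all give 0x80000000. */
    return INT32_MIN;
}

/* Self-check of the model; call it from main() to run the assertions. */
static void check_model_cvttss2si(void) {
    assert(model_cvttss2si(1.9f) == 1);           /* truncation, not rounding */
    assert(model_cvttss2si(-1.9f) == -1);
    assert(model_cvttss2si(NAN) == INT32_MIN);
    assert(model_cvttss2si(INFINITY) == INT32_MIN);
    assert(model_cvttss2si(3.0e9f) == INT32_MIN); /* above INT32_MAX */
}
```

On x86 hardware the model could be compared against `_mm_cvttss_si32(_mm_set_ss(x))`; that comparison is left out here so the sketch stays portable.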

This patch changes both scalar and packed versions back to using x86-specific builtins.

It also deals with the other scalar conversion cases that are runtime rounding mode dependent and can have similar issues with constant folding.
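To illustrate why rounding-mode-dependent conversions are awkward to constant fold, the sketch below narrows a double that lies strictly between two adjacent floats. Which cases the summary has in mind is an assumption here, although the diff does switch `_mm_cvtsd_ss` to a builtin. The result must be one neighbour or the other, and the rounding mode in effect at run time decides which. `narrowing_direction` is a hypothetical helper, and IEEE-754 `float`/`double` are assumed.

```c
#include <assert.h>
#include <float.h>   /* FLT_EPSILON */

/* Hypothetical helper: reports where narrowing d to float landed.
   Returns -1 if the result equals lo, +1 if it equals hi, 0 otherwise. */
static int narrowing_direction(double d, float lo, float hi) {
    volatile float f = (float)d;  /* volatile forces rounding to float precision */
    if (f == lo) return -1;
    if (f == hi) return 1;
    return 0;
}

/* Self-check; call it from main() to run the assertions. */
static void check_narrowing_direction(void) {
    const float lo = 1.0f;
    const float hi = 1.0f + FLT_EPSILON;          /* next float above 1.0f */
    const double d = 1.0 + 1.0 / 1073741824.0;    /* 1 + 2^-30, between lo and hi */

    /* Whatever the rounding mode, the result is one of the two neighbours. */
    assert(narrowing_direction(d, lo, hi) != 0);
}
```

Under the default round-to-nearest mode the call returns -1, because `d` is closer to 1.0f than to the next float up; a compile-time fold that assumed this would be wrong if the program ran with a different mode.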

A companion llvm patch will be submitted shortly.

Diff Detail

Repository: rL LLVM

Event Timeline

RKSimon updated this revision to Diff 63108. Jul 7 2016, 11:50 AM

RKSimon retitled this revision to [X86][SSE] Reimplement SSE fp2si conversion intrinsics instead of using generic IR.

RKSimon updated this object.

RKSimon added reviewers: eli.friedman, mkuper, craig.topper, spatel, andreadb.

RKSimon set the repository for this revision to rL LLVM.

RKSimon added a subscriber: cfe-commits.

RKSimon mentioned this in D22106: [X86][SSE] Reimplement SSE fp2si conversion intrinsics instead of using generic IR. Jul 7 2016, 11:53 AM

RKSimon mentioned this in rL275981: [X86][SSE] Reimplement SSE fp2si conversion intrinsics instead of using…. Jul 19 2016, 8:15 AM

I don't think we need to use x86-specific operations for sitofp-like conversions; the C cast is equivalent given that a 32 or 64-bit integer is always within the range of a 32-bit float.

In D22105#488513, @eli.friedman wrote:

I don't think we need to use x86-specific operations for sitofp-like conversions; the C cast is equivalent given that a 32 or 64-bit integer is always within the range of a 32-bit float.

I think the only situation in which a lossless conversion occurs is i32->f64; every other sitofp conversion could be affected by the rounding control, no?
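The point about lossless conversion can be checked without any intrinsics. This is a minimal sketch, assuming IEEE-754 binary32 `float` and binary64 `double`; the helper names are illustrative only. A 32-bit integer always round-trips through `double`, but not every one round-trips through `float`, whose significand holds only 24 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if converting v to float and back reproduces v exactly. */
static int roundtrips_via_float(int32_t v) {
    volatile float f = (float)v;   /* volatile forces rounding to float precision */
    return (int64_t)f == (int64_t)v;
}

/* Returns 1 if converting v to double and back reproduces v exactly. */
static int roundtrips_via_double(int32_t v) {
    volatile double d = (double)v;
    return (int64_t)d == (int64_t)v;
}

/* Self-check; call it from main() to run the assertions. */
static void check_roundtrips(void) {
    assert(roundtrips_via_float(16777216));     /* 2^24 fits in 24 bits */
    assert(!roundtrips_via_float(16777217));    /* 2^24 + 1 must be rounded */
    assert(roundtrips_via_double(16777217));    /* double has 53 bits */
    assert(roundtrips_via_double(INT32_MAX));
}
```

The comparison is done in `int64_t` because `(float)INT32_MAX` rounds to 2^31, which would not fit back into an `int32_t`.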

The x86-specific operation is affected by the rounding mode... but so is a C cast. This is specified by Annex F in the C standard.

Of course, you're going to end up with undefined behavior if you actually modify the rounding mode because LLVM and clang don't support FENV_ACCESS at the moment.

In D22105#488566, @eli.friedman wrote:

The x86-specific operation is affected by the rounding mode... but so is a C cast. This is specified by Annex F in the C standard.

Of course, you're going to end up with undefined behavior if you actually modify the rounding mode because LLVM and clang don't support FENV_ACCESS at the moment.

OK, I'm going to pull the sitofp conversions from this patch. I have other concerns about them (i.e. we don't treat scalar and vector the same) that will need to be looked at as well.

Removed sitofp conversion changes

LGTM.

This revision is now accepted and ready to land. Jul 19 2016, 12:13 PM

Closed by commit rL276102: [X86][SSE] Reimplement SSE fp2si conversion intrinsics instead of using… (authored by RKSimon). Jul 20 2016, 3:25 AM

This revision was automatically updated to reflect the committed changes.

hans mentioned this in rL276990: Merging r275981 and r276740:. Jul 28 2016, 8:46 AM

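Revision Contents

Path                                             Size
cfe/trunk/include/clang/Basic/BuiltinsX86.def    8 lines
cfe/trunk/lib/Headers/avxintrin.h                4 lines
cfe/trunk/lib/Headers/emmintrin.h                9 lines
cfe/trunk/lib/Headers/xmmintrin.h                4 lines
cfe/trunk/test/CodeGen/avx-builtins.c            4 lines
cfe/trunk/test/CodeGen/builtins-x86.c            8 lines
cfe/trunk/test/CodeGen/sse-builtins.c            9 lines
cfe/trunk/test/CodeGen/sse2-builtins.c           10 lines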

Diff 64653

cfe/trunk/include/clang/Basic/BuiltinsX86.def

	Show First 20 Lines • Show All 297 Lines • ▼ Show 20 Lines
	TARGET_BUILTIN(__builtin_ia32_psignd128, "V4iV4iV4i", "", "ssse3")			TARGET_BUILTIN(__builtin_ia32_psignd128, "V4iV4iV4i", "", "ssse3")
	TARGET_BUILTIN(__builtin_ia32_pabsb128, "V16cV16c", "", "ssse3")			TARGET_BUILTIN(__builtin_ia32_pabsb128, "V16cV16c", "", "ssse3")
	TARGET_BUILTIN(__builtin_ia32_pabsw128, "V8sV8s", "", "ssse3")			TARGET_BUILTIN(__builtin_ia32_pabsw128, "V8sV8s", "", "ssse3")
	TARGET_BUILTIN(__builtin_ia32_pabsd128, "V4iV4i", "", "ssse3")			TARGET_BUILTIN(__builtin_ia32_pabsd128, "V4iV4i", "", "ssse3")

	TARGET_BUILTIN(__builtin_ia32_ldmxcsr, "vUi", "", "sse")			TARGET_BUILTIN(__builtin_ia32_ldmxcsr, "vUi", "", "sse")
	TARGET_BUILTIN(__builtin_ia32_stmxcsr, "Ui", "", "sse")			TARGET_BUILTIN(__builtin_ia32_stmxcsr, "Ui", "", "sse")
	TARGET_BUILTIN(__builtin_ia32_cvtss2si, "iV4f", "", "sse")			TARGET_BUILTIN(__builtin_ia32_cvtss2si, "iV4f", "", "sse")
				TARGET_BUILTIN(__builtin_ia32_cvttss2si, "iV4f", "", "sse")
	TARGET_BUILTIN(__builtin_ia32_cvtss2si64, "LLiV4f", "", "sse")			TARGET_BUILTIN(__builtin_ia32_cvtss2si64, "LLiV4f", "", "sse")
				TARGET_BUILTIN(__builtin_ia32_cvttss2si64, "LLiV4f", "", "sse")
	TARGET_BUILTIN(__builtin_ia32_storehps, "vV2i*V4f", "", "sse")			TARGET_BUILTIN(__builtin_ia32_storehps, "vV2i*V4f", "", "sse")
	TARGET_BUILTIN(__builtin_ia32_storelps, "vV2i*V4f", "", "sse")			TARGET_BUILTIN(__builtin_ia32_storelps, "vV2i*V4f", "", "sse")
	TARGET_BUILTIN(__builtin_ia32_movmskps, "iV4f", "", "sse")			TARGET_BUILTIN(__builtin_ia32_movmskps, "iV4f", "", "sse")
	TARGET_BUILTIN(__builtin_ia32_sfence, "v", "", "sse")			TARGET_BUILTIN(__builtin_ia32_sfence, "v", "", "sse")
	TARGET_BUILTIN(__builtin_ia32_rcpps, "V4fV4f", "", "sse")			TARGET_BUILTIN(__builtin_ia32_rcpps, "V4fV4f", "", "sse")
	TARGET_BUILTIN(__builtin_ia32_rcpss, "V4fV4f", "", "sse")			TARGET_BUILTIN(__builtin_ia32_rcpss, "V4fV4f", "", "sse")
	TARGET_BUILTIN(__builtin_ia32_rsqrtps, "V4fV4f", "", "sse")			TARGET_BUILTIN(__builtin_ia32_rsqrtps, "V4fV4f", "", "sse")
	TARGET_BUILTIN(__builtin_ia32_rsqrtss, "V4fV4f", "", "sse")			TARGET_BUILTIN(__builtin_ia32_rsqrtss, "V4fV4f", "", "sse")
	TARGET_BUILTIN(__builtin_ia32_sqrtps, "V4fV4f", "", "sse")			TARGET_BUILTIN(__builtin_ia32_sqrtps, "V4fV4f", "", "sse")
	TARGET_BUILTIN(__builtin_ia32_sqrtss, "V4fV4f", "", "sse")			TARGET_BUILTIN(__builtin_ia32_sqrtss, "V4fV4f", "", "sse")

	TARGET_BUILTIN(__builtin_ia32_maskmovdqu, "vV16cV16cc*", "", "sse2")			TARGET_BUILTIN(__builtin_ia32_maskmovdqu, "vV16cV16cc*", "", "sse2")
	TARGET_BUILTIN(__builtin_ia32_movmskpd, "iV2d", "", "sse2")			TARGET_BUILTIN(__builtin_ia32_movmskpd, "iV2d", "", "sse2")
	TARGET_BUILTIN(__builtin_ia32_pmovmskb128, "iV16c", "", "sse2")			TARGET_BUILTIN(__builtin_ia32_pmovmskb128, "iV16c", "", "sse2")
	TARGET_BUILTIN(__builtin_ia32_movnti, "vi*i", "", "sse2")			TARGET_BUILTIN(__builtin_ia32_movnti, "vi*i", "", "sse2")
	TARGET_BUILTIN(__builtin_ia32_movnti64, "vLLi*LLi", "", "sse2")			TARGET_BUILTIN(__builtin_ia32_movnti64, "vLLi*LLi", "", "sse2")
	TARGET_BUILTIN(__builtin_ia32_psadbw128, "V2LLiV16cV16c", "", "sse2")			TARGET_BUILTIN(__builtin_ia32_psadbw128, "V2LLiV16cV16c", "", "sse2")
	TARGET_BUILTIN(__builtin_ia32_sqrtpd, "V2dV2d", "", "sse2")			TARGET_BUILTIN(__builtin_ia32_sqrtpd, "V2dV2d", "", "sse2")
	TARGET_BUILTIN(__builtin_ia32_sqrtsd, "V2dV2d", "", "sse2")			TARGET_BUILTIN(__builtin_ia32_sqrtsd, "V2dV2d", "", "sse2")
	TARGET_BUILTIN(__builtin_ia32_cvtdq2ps, "V4fV4i", "", "sse2")			TARGET_BUILTIN(__builtin_ia32_cvtdq2ps, "V4fV4i", "", "sse2")
	TARGET_BUILTIN(__builtin_ia32_cvtpd2dq, "V2LLiV2d", "", "sse2")			TARGET_BUILTIN(__builtin_ia32_cvtpd2dq, "V2LLiV2d", "", "sse2")
	TARGET_BUILTIN(__builtin_ia32_cvtpd2ps, "V4fV2d", "", "sse2")			TARGET_BUILTIN(__builtin_ia32_cvtpd2ps, "V4fV2d", "", "sse2")
	TARGET_BUILTIN(__builtin_ia32_cvttpd2dq, "V4iV2d", "", "sse2")			TARGET_BUILTIN(__builtin_ia32_cvttpd2dq, "V4iV2d", "", "sse2")
	TARGET_BUILTIN(__builtin_ia32_cvtsd2si, "iV2d", "", "sse2")			TARGET_BUILTIN(__builtin_ia32_cvtsd2si, "iV2d", "", "sse2")
				TARGET_BUILTIN(__builtin_ia32_cvttsd2si, "iV2d", "", "sse2")
	TARGET_BUILTIN(__builtin_ia32_cvtsd2si64, "LLiV2d", "", "sse2")			TARGET_BUILTIN(__builtin_ia32_cvtsd2si64, "LLiV2d", "", "sse2")
				TARGET_BUILTIN(__builtin_ia32_cvttsd2si64, "LLiV2d", "", "sse2")
				TARGET_BUILTIN(__builtin_ia32_cvtsd2ss, "V4fV4fV2d", "", "sse2")
	TARGET_BUILTIN(__builtin_ia32_cvtps2dq, "V4iV4f", "", "sse2")			TARGET_BUILTIN(__builtin_ia32_cvtps2dq, "V4iV4f", "", "sse2")
				TARGET_BUILTIN(__builtin_ia32_cvttps2dq, "V4iV4f", "", "sse2")
	TARGET_BUILTIN(__builtin_ia32_clflush, "vvC*", "", "sse2")			TARGET_BUILTIN(__builtin_ia32_clflush, "vvC*", "", "sse2")
	TARGET_BUILTIN(__builtin_ia32_lfence, "v", "", "sse2")			TARGET_BUILTIN(__builtin_ia32_lfence, "v", "", "sse2")
	TARGET_BUILTIN(__builtin_ia32_mfence, "v", "", "sse2")			TARGET_BUILTIN(__builtin_ia32_mfence, "v", "", "sse2")
	TARGET_BUILTIN(__builtin_ia32_pause, "v", "", "sse2")			TARGET_BUILTIN(__builtin_ia32_pause, "v", "", "sse2")
	TARGET_BUILTIN(__builtin_ia32_pmuludq128, "V2LLiV4iV4i", "", "sse2")			TARGET_BUILTIN(__builtin_ia32_pmuludq128, "V2LLiV4iV4i", "", "sse2")
	TARGET_BUILTIN(__builtin_ia32_psraw128, "V8sV8sV8s", "", "sse2")			TARGET_BUILTIN(__builtin_ia32_psraw128, "V8sV8sV8s", "", "sse2")
	TARGET_BUILTIN(__builtin_ia32_psrad128, "V4iV4iV4i", "", "sse2")			TARGET_BUILTIN(__builtin_ia32_psrad128, "V4iV4iV4i", "", "sse2")
	TARGET_BUILTIN(__builtin_ia32_psrlw128, "V8sV8sV8s", "", "sse2")			TARGET_BUILTIN(__builtin_ia32_psrlw128, "V8sV8sV8s", "", "sse2")
	▲ Show 20 Lines • Show All 109 Lines • ▼ Show 20 Lines
	TARGET_BUILTIN(__builtin_ia32_cmppd256, "V4dV4dV4dIc", "", "avx")			TARGET_BUILTIN(__builtin_ia32_cmppd256, "V4dV4dV4dIc", "", "avx")
	TARGET_BUILTIN(__builtin_ia32_cmpps, "V4fV4fV4fIc", "", "avx")			TARGET_BUILTIN(__builtin_ia32_cmpps, "V4fV4fV4fIc", "", "avx")
	TARGET_BUILTIN(__builtin_ia32_cmpps256, "V8fV8fV8fIc", "", "avx")			TARGET_BUILTIN(__builtin_ia32_cmpps256, "V8fV8fV8fIc", "", "avx")
	TARGET_BUILTIN(__builtin_ia32_cmpsd, "V2dV2dV2dIc", "", "avx")			TARGET_BUILTIN(__builtin_ia32_cmpsd, "V2dV2dV2dIc", "", "avx")
	TARGET_BUILTIN(__builtin_ia32_cmpss, "V4fV4fV4fIc", "", "avx")			TARGET_BUILTIN(__builtin_ia32_cmpss, "V4fV4fV4fIc", "", "avx")
	TARGET_BUILTIN(__builtin_ia32_cvtdq2ps256, "V8fV8i", "", "avx")			TARGET_BUILTIN(__builtin_ia32_cvtdq2ps256, "V8fV8i", "", "avx")
	TARGET_BUILTIN(__builtin_ia32_cvtpd2ps256, "V4fV4d", "", "avx")			TARGET_BUILTIN(__builtin_ia32_cvtpd2ps256, "V4fV4d", "", "avx")
	TARGET_BUILTIN(__builtin_ia32_cvtps2dq256, "V8iV8f", "", "avx")			TARGET_BUILTIN(__builtin_ia32_cvtps2dq256, "V8iV8f", "", "avx")
				TARGET_BUILTIN(__builtin_ia32_cvttpd2dq256, "V4iV4d", "", "avx")
	TARGET_BUILTIN(__builtin_ia32_cvtpd2dq256, "V4iV4d", "", "avx")			TARGET_BUILTIN(__builtin_ia32_cvtpd2dq256, "V4iV4d", "", "avx")
				TARGET_BUILTIN(__builtin_ia32_cvttps2dq256, "V8iV8f", "", "avx")
	TARGET_BUILTIN(__builtin_ia32_vperm2f128_pd256, "V4dV4dV4dIc", "", "avx")			TARGET_BUILTIN(__builtin_ia32_vperm2f128_pd256, "V4dV4dV4dIc", "", "avx")
	TARGET_BUILTIN(__builtin_ia32_vperm2f128_ps256, "V8fV8fV8fIc", "", "avx")			TARGET_BUILTIN(__builtin_ia32_vperm2f128_ps256, "V8fV8fV8fIc", "", "avx")
	TARGET_BUILTIN(__builtin_ia32_vperm2f128_si256, "V8iV8iV8iIc", "", "avx")			TARGET_BUILTIN(__builtin_ia32_vperm2f128_si256, "V8iV8iV8iIc", "", "avx")
	TARGET_BUILTIN(__builtin_ia32_sqrtpd256, "V4dV4d", "", "avx")			TARGET_BUILTIN(__builtin_ia32_sqrtpd256, "V4dV4d", "", "avx")
	TARGET_BUILTIN(__builtin_ia32_sqrtps256, "V8fV8f", "", "avx")			TARGET_BUILTIN(__builtin_ia32_sqrtps256, "V8fV8f", "", "avx")
	TARGET_BUILTIN(__builtin_ia32_rsqrtps256, "V8fV8f", "", "avx")			TARGET_BUILTIN(__builtin_ia32_rsqrtps256, "V8fV8f", "", "avx")
	TARGET_BUILTIN(__builtin_ia32_rcpps256, "V8fV8f", "", "avx")			TARGET_BUILTIN(__builtin_ia32_rcpps256, "V8fV8f", "", "avx")
	TARGET_BUILTIN(__builtin_ia32_roundpd256, "V4dV4dIi", "", "avx")			TARGET_BUILTIN(__builtin_ia32_roundpd256, "V4dV4dIi", "", "avx")
	▲ Show 20 Lines • Show All 1,663 Lines • Show Last 20 Lines

cfe/trunk/lib/Headers/avxintrin.h

	Show First 20 Lines • Show All 2,111 Lines • ▼ Show 20 Lines
	_mm256_cvtps_pd(__m128 __a)			_mm256_cvtps_pd(__m128 __a)
	{			{
	return (__m256d)__builtin_convertvector((__v4sf)__a, __v4df);			return (__m256d)__builtin_convertvector((__v4sf)__a, __v4df);
	}			}

	static __inline __m128i __DEFAULT_FN_ATTRS			static __inline __m128i __DEFAULT_FN_ATTRS
	_mm256_cvttpd_epi32(__m256d __a)			_mm256_cvttpd_epi32(__m256d __a)
	{			{
	return (__m128i)__builtin_convertvector((__v4df) __a, __v4si);			return (__m128i)__builtin_ia32_cvttpd2dq256((__v4df) __a);
	}			}

	static __inline __m128i __DEFAULT_FN_ATTRS			static __inline __m128i __DEFAULT_FN_ATTRS
	_mm256_cvtpd_epi32(__m256d __a)			_mm256_cvtpd_epi32(__m256d __a)
	{			{
	return (__m128i)__builtin_ia32_cvtpd2dq256((__v4df) __a);			return (__m128i)__builtin_ia32_cvtpd2dq256((__v4df) __a);
	}			}

	static __inline __m256i __DEFAULT_FN_ATTRS			static __inline __m256i __DEFAULT_FN_ATTRS
	_mm256_cvttps_epi32(__m256 __a)			_mm256_cvttps_epi32(__m256 __a)
	{			{
	return (__m256i)__builtin_convertvector((__v8sf) __a, __v8si);			return (__m256i)__builtin_ia32_cvttps2dq256((__v8sf) __a);
	}			}

	static __inline double __DEFAULT_FN_ATTRS			static __inline double __DEFAULT_FN_ATTRS
	_mm256_cvtsd_f64(__m256d __a)			_mm256_cvtsd_f64(__m256d __a)
	{			{
	return __a[0];			return __a[0];
	}			}

	▲ Show 20 Lines • Show All 788 Lines • Show Last 20 Lines

cfe/trunk/lib/Headers/emmintrin.h

	Show First 20 Lines • Show All 411 Lines • ▼ Show 20 Lines
	_mm_cvtsd_si32(__m128d __a)			_mm_cvtsd_si32(__m128d __a)
	{			{
	return __builtin_ia32_cvtsd2si((__v2df)__a);			return __builtin_ia32_cvtsd2si((__v2df)__a);
	}			}

	static __inline__ __m128 __DEFAULT_FN_ATTRS			static __inline__ __m128 __DEFAULT_FN_ATTRS
	_mm_cvtsd_ss(__m128 __a, __m128d __b)			_mm_cvtsd_ss(__m128 __a, __m128d __b)
	{			{
	__a[0] = __b[0];			return (__m128)__builtin_ia32_cvtsd2ss((__v4sf)__a, (__v2df)__b);
	return __a;
	}			}

	static __inline__ __m128d __DEFAULT_FN_ATTRS			static __inline__ __m128d __DEFAULT_FN_ATTRS
	_mm_cvtsi32_sd(__m128d __a, int __b)			_mm_cvtsi32_sd(__m128d __a, int __b)
	{			{
	__a[0] = __b;			__a[0] = __b;
	return __a;			return __a;
	}			}
	Show All 9 Lines
	_mm_cvttpd_epi32(__m128d __a)			_mm_cvttpd_epi32(__m128d __a)
	{			{
	return (__m128i)__builtin_ia32_cvttpd2dq((__v2df)__a);			return (__m128i)__builtin_ia32_cvttpd2dq((__v2df)__a);
	}			}

	static __inline__ int __DEFAULT_FN_ATTRS			static __inline__ int __DEFAULT_FN_ATTRS
	_mm_cvttsd_si32(__m128d __a)			_mm_cvttsd_si32(__m128d __a)
	{			{
	return __a[0];			return __builtin_ia32_cvttsd2si((__v2df)__a);
	}			}

	static __inline__ __m64 __DEFAULT_FN_ATTRS			static __inline__ __m64 __DEFAULT_FN_ATTRS
	_mm_cvtpd_pi32(__m128d __a)			_mm_cvtpd_pi32(__m128d __a)
	{			{
	return (__m64)__builtin_ia32_cvtpd2pi((__v2df)__a);			return (__m64)__builtin_ia32_cvtpd2pi((__v2df)__a);
	}			}

	▲ Show 20 Lines • Show All 1,246 Lines • ▼ Show 20 Lines
	///			///
	/// \param __a			/// \param __a
	/// A 128-bit vector of [2 x double]. The lower 64 bits are used in the			/// A 128-bit vector of [2 x double]. The lower 64 bits are used in the
	/// conversion.			/// conversion.
	/// \returns A 64-bit signed integer containing the converted value.			/// \returns A 64-bit signed integer containing the converted value.
	static __inline__ long long __DEFAULT_FN_ATTRS			static __inline__ long long __DEFAULT_FN_ATTRS
	_mm_cvttsd_si64(__m128d __a)			_mm_cvttsd_si64(__m128d __a)
	{			{
	return __a[0];			return __builtin_ia32_cvttsd2si64((__v2df)__a);
	}			}
	#endif			#endif

	/// \brief Converts a vector of [4 x i32] into a vector of [4 x float].			/// \brief Converts a vector of [4 x i32] into a vector of [4 x float].
	///			///
	/// \headerfile <x86intrin.h>			/// \headerfile <x86intrin.h>
	///			///
	/// This intrinsic corresponds to the \c VCVTDQ2PS / CVTDQ2PS instruction.			/// This intrinsic corresponds to the \c VCVTDQ2PS / CVTDQ2PS instruction.
	Show All 31 Lines
	/// This intrinsic corresponds to the \c VCVTTPS2DQ / CVTTPS2DQ instruction.			/// This intrinsic corresponds to the \c VCVTTPS2DQ / CVTTPS2DQ instruction.
	///			///
	/// \param __a			/// \param __a
	/// A 128-bit vector of [4 x float].			/// A 128-bit vector of [4 x float].
	/// \returns A 128-bit vector of [4 x i32] containing the converted values.			/// \returns A 128-bit vector of [4 x i32] containing the converted values.
	static __inline__ __m128i __DEFAULT_FN_ATTRS			static __inline__ __m128i __DEFAULT_FN_ATTRS
	_mm_cvttps_epi32(__m128 __a)			_mm_cvttps_epi32(__m128 __a)
	{			{
	return (__m128i)__builtin_convertvector((__v4sf)__a, __v4si);			return (__m128i)__builtin_ia32_cvttps2dq((__v4sf)__a);
	}			}

	/// \brief Returns a vector of [4 x i32] where the lowest element is the input			/// \brief Returns a vector of [4 x i32] where the lowest element is the input
	/// operand and the remaining elements are zero.			/// operand and the remaining elements are zero.
	///			///
	/// \headerfile <x86intrin.h>			/// \headerfile <x86intrin.h>
	///			///
	/// This intrinsic corresponds to the \c VMOVD / MOVD instruction.			/// This intrinsic corresponds to the \c VMOVD / MOVD instruction.
	▲ Show 20 Lines • Show All 692 Lines • Show Last 20 Lines

cfe/trunk/lib/Headers/xmmintrin.h

	Show First 20 Lines • Show All 1,344 Lines • ▼ Show 20 Lines
	///			///
	/// \param __a			/// \param __a
	/// A 128-bit vector of [4 x float]. The lower 32 bits of this operand are			/// A 128-bit vector of [4 x float]. The lower 32 bits of this operand are
	/// used in the conversion.			/// used in the conversion.
	/// \returns A 32-bit integer containing the converted value.			/// \returns A 32-bit integer containing the converted value.
	static __inline__ int __DEFAULT_FN_ATTRS			static __inline__ int __DEFAULT_FN_ATTRS
	_mm_cvttss_si32(__m128 __a)			_mm_cvttss_si32(__m128 __a)
	{			{
	return __a[0];			return __builtin_ia32_cvttss2si((__v4sf)__a);
	}			}

	/// \brief Converts a float value contained in the lower 32 bits of a vector of			/// \brief Converts a float value contained in the lower 32 bits of a vector of
	/// [4 x float] into a 32-bit integer, truncating the result when it is			/// [4 x float] into a 32-bit integer, truncating the result when it is
	/// inexact.			/// inexact.
	///			///
	/// \headerfile <x86intrin.h>			/// \headerfile <x86intrin.h>
	///			///
	Show All 19 Lines
	///			///
	/// \param __a			/// \param __a
	/// A 128-bit vector of [4 x float]. The lower 32 bits of this operand are			/// A 128-bit vector of [4 x float]. The lower 32 bits of this operand are
	/// used in the conversion.			/// used in the conversion.
	/// \returns A 64-bit integer containing the converted value.			/// \returns A 64-bit integer containing the converted value.
	static __inline__ long long __DEFAULT_FN_ATTRS			static __inline__ long long __DEFAULT_FN_ATTRS
	_mm_cvttss_si64(__m128 __a)			_mm_cvttss_si64(__m128 __a)
	{			{
	return __a[0];			return __builtin_ia32_cvttss2si64((__v4sf)__a);
	}			}

	/// \brief Converts two low-order float values in a 128-bit vector of			/// \brief Converts two low-order float values in a 128-bit vector of
	/// [4 x float] into a 64-bit vector of [2 x i32], truncating the result			/// [4 x float] into a 64-bit vector of [2 x i32], truncating the result
	/// when it is inexact.			/// when it is inexact.
	///			///
	/// \headerfile <x86intrin.h>			/// \headerfile <x86intrin.h>
	///			///
	▲ Show 20 Lines • Show All 1,496 Lines • Show Last 20 Lines

cfe/trunk/test/CodeGen/avx-builtins.c

	Show First 20 Lines • Show All 280 Lines • ▼ Show 20 Lines
	__m256d test_mm256_cvtps_pd(__m128 A) {			__m256d test_mm256_cvtps_pd(__m128 A) {
	// CHECK-LABEL: test_mm256_cvtps_pd			// CHECK-LABEL: test_mm256_cvtps_pd
	// CHECK: fpext <4 x float> %{{.*}} to <4 x double>			// CHECK: fpext <4 x float> %{{.*}} to <4 x double>
	return _mm256_cvtps_pd(A);			return _mm256_cvtps_pd(A);
	}			}

	__m128i test_mm256_cvttpd_epi32(__m256d A) {			__m128i test_mm256_cvttpd_epi32(__m256d A) {
	// CHECK-LABEL: test_mm256_cvttpd_epi32			// CHECK-LABEL: test_mm256_cvttpd_epi32
	// CHECK: fptosi <4 x double> %{{.*}} to <4 x i32>			// CHECK: call <4 x i32> @llvm.x86.avx.cvtt.pd2dq.256(<4 x double> %{{.*}})
	return _mm256_cvttpd_epi32(A);			return _mm256_cvttpd_epi32(A);
	}			}

	__m256i test_mm256_cvttps_epi32(__m256 A) {			__m256i test_mm256_cvttps_epi32(__m256 A) {
	// CHECK-LABEL: test_mm256_cvttps_epi32			// CHECK-LABEL: test_mm256_cvttps_epi32
	// CHECK: fptosi <8 x float> %{{.*}} to <8 x i32>			// CHECK: call <8 x i32> @llvm.x86.avx.cvtt.ps2dq.256(<8 x float> %{{.*}})
	return _mm256_cvttps_epi32(A);			return _mm256_cvttps_epi32(A);
	}			}

	__m256d test_mm256_div_pd(__m256d A, __m256d B) {			__m256d test_mm256_div_pd(__m256d A, __m256d B) {
	// CHECK-LABEL: test_mm256_div_pd			// CHECK-LABEL: test_mm256_div_pd
	// CHECK: fdiv <4 x double>			// CHECK: fdiv <4 x double>
	return _mm256_div_pd(A, B);			return _mm256_div_pd(A, B);
	}			}
	▲ Show 20 Lines • Show All 1,105 Lines • Show Last 20 Lines

cfe/trunk/test/CodeGen/builtins-x86.c

Show First 20 Lines • Show All 281 Lines • ▼ Show 20 Lines	#endif
(void)__builtin_ia32_xsaves64(tmp_vp, tmp_ULLi);		(void)__builtin_ia32_xsaves64(tmp_vp, tmp_ULLi);

(void) __builtin_ia32_monitorx(tmp_vp, tmp_Ui, tmp_Ui);		(void) __builtin_ia32_monitorx(tmp_vp, tmp_Ui, tmp_Ui);
(void) __builtin_ia32_mwaitx(tmp_Ui, tmp_Ui, tmp_Ui);		(void) __builtin_ia32_mwaitx(tmp_Ui, tmp_Ui, tmp_Ui);

tmp_V4f = __builtin_ia32_cvtpi2ps(tmp_V4f, tmp_V2i);		tmp_V4f = __builtin_ia32_cvtpi2ps(tmp_V4f, tmp_V2i);
tmp_V2i = __builtin_ia32_cvtps2pi(tmp_V4f);		tmp_V2i = __builtin_ia32_cvtps2pi(tmp_V4f);
tmp_i = __builtin_ia32_cvtss2si(tmp_V4f);		tmp_i = __builtin_ia32_cvtss2si(tmp_V4f);
		tmp_i = __builtin_ia32_cvttss2si(tmp_V4f);

tmp_i = __builtin_ia32_rdtsc();		tmp_i = __builtin_ia32_rdtsc();
tmp_i = __builtin_ia32_rdtscp(&tmp_Ui);		tmp_i = __builtin_ia32_rdtscp(&tmp_Ui);
tmp_LLi = __builtin_ia32_rdpmc(tmp_i);		tmp_LLi = __builtin_ia32_rdpmc(tmp_i);
#ifdef USE_64		#ifdef USE_64
tmp_LLi = __builtin_ia32_cvtss2si64(tmp_V4f);		tmp_LLi = __builtin_ia32_cvtss2si64(tmp_V4f);
		tmp_LLi = __builtin_ia32_cvttss2si64(tmp_V4f);
#endif		#endif
tmp_V2i = __builtin_ia32_cvttps2pi(tmp_V4f);		tmp_V2i = __builtin_ia32_cvttps2pi(tmp_V4f);
(void) __builtin_ia32_maskmovq(tmp_V8c, tmp_V8c, tmp_cp);		(void) __builtin_ia32_maskmovq(tmp_V8c, tmp_V8c, tmp_cp);
(void) __builtin_ia32_storehps(tmp_V2ip, tmp_V4f);		(void) __builtin_ia32_storehps(tmp_V2ip, tmp_V4f);
(void) __builtin_ia32_storelps(tmp_V2ip, tmp_V4f);		(void) __builtin_ia32_storelps(tmp_V2ip, tmp_V4f);
tmp_i = __builtin_ia32_movmskps(tmp_V4f);		tmp_i = __builtin_ia32_movmskps(tmp_V4f);
tmp_i = __builtin_ia32_pmovmskb(tmp_V8c);		tmp_i = __builtin_ia32_pmovmskb(tmp_V8c);
(void) __builtin_ia32_movntq(tmp_V1LLip, tmp_V1LLi);		(void) __builtin_ia32_movntq(tmp_V1LLip, tmp_V1LLi);
Show All 19 Lines	#endif
tmp_V4f = __builtin_ia32_cvtdq2ps(tmp_V4i);		tmp_V4f = __builtin_ia32_cvtdq2ps(tmp_V4i);
tmp_V2LLi = __builtin_ia32_cvtpd2dq(tmp_V2d);		tmp_V2LLi = __builtin_ia32_cvtpd2dq(tmp_V2d);
tmp_V2i = __builtin_ia32_cvtpd2pi(tmp_V2d);		tmp_V2i = __builtin_ia32_cvtpd2pi(tmp_V2d);
tmp_V4f = __builtin_ia32_cvtpd2ps(tmp_V2d);		tmp_V4f = __builtin_ia32_cvtpd2ps(tmp_V2d);
tmp_V4i = __builtin_ia32_cvttpd2dq(tmp_V2d);		tmp_V4i = __builtin_ia32_cvttpd2dq(tmp_V2d);
tmp_V2i = __builtin_ia32_cvttpd2pi(tmp_V2d);		tmp_V2i = __builtin_ia32_cvttpd2pi(tmp_V2d);
tmp_V2d = __builtin_ia32_cvtpi2pd(tmp_V2i);		tmp_V2d = __builtin_ia32_cvtpi2pd(tmp_V2i);
tmp_i = __builtin_ia32_cvtsd2si(tmp_V2d);		tmp_i = __builtin_ia32_cvtsd2si(tmp_V2d);
		tmp_i = __builtin_ia32_cvttsd2si(tmp_V2d);
		tmp_V4f = __builtin_ia32_cvtsd2ss(tmp_V4f, tmp_V2d);
#ifdef USE_64		#ifdef USE_64
tmp_LLi = __builtin_ia32_cvtsd2si64(tmp_V2d);		tmp_LLi = __builtin_ia32_cvtsd2si64(tmp_V2d);
		tmp_LLi = __builtin_ia32_cvttsd2si64(tmp_V2d);
#endif		#endif
tmp_V4i = __builtin_ia32_cvtps2dq(tmp_V4f);		tmp_V4i = __builtin_ia32_cvtps2dq(tmp_V4f);
		tmp_V4i = __builtin_ia32_cvttps2dq(tmp_V4f);
(void) __builtin_ia32_clflush(tmp_vCp);		(void) __builtin_ia32_clflush(tmp_vCp);
(void) __builtin_ia32_lfence();		(void) __builtin_ia32_lfence();
(void) __builtin_ia32_mfence();		(void) __builtin_ia32_mfence();
tmp_V4s = __builtin_ia32_psllwi(tmp_V4s, tmp_i);		tmp_V4s = __builtin_ia32_psllwi(tmp_V4s, tmp_i);
tmp_V2i = __builtin_ia32_pslldi(tmp_V2i, tmp_i);		tmp_V2i = __builtin_ia32_pslldi(tmp_V2i, tmp_i);
tmp_V1LLi = __builtin_ia32_psllqi(tmp_V1LLi, tmp_i);		tmp_V1LLi = __builtin_ia32_psllqi(tmp_V1LLi, tmp_i);
tmp_V4s = __builtin_ia32_psrawi(tmp_V4s, tmp_i);		tmp_V4s = __builtin_ia32_psrawi(tmp_V4s, tmp_i);
tmp_V2i = __builtin_ia32_psradi(tmp_V2i, tmp_i);		tmp_V2i = __builtin_ia32_psradi(tmp_V2i, tmp_i);
▲ Show 20 Lines • Show All 62 Lines • ▼ Show 20 Lines	#endif
tmp_V4d = __builtin_ia32_blendvpd256(tmp_V4d, tmp_V4d, tmp_V4d);		tmp_V4d = __builtin_ia32_blendvpd256(tmp_V4d, tmp_V4d, tmp_V4d);
tmp_V8f = __builtin_ia32_blendvps256(tmp_V8f, tmp_V8f, tmp_V8f);		tmp_V8f = __builtin_ia32_blendvps256(tmp_V8f, tmp_V8f, tmp_V8f);
tmp_V8f = __builtin_ia32_dpps256(tmp_V8f, tmp_V8f, 0x7);		tmp_V8f = __builtin_ia32_dpps256(tmp_V8f, tmp_V8f, 0x7);
tmp_V4d = __builtin_ia32_cmppd256(tmp_V4d, tmp_V4d, 0);		tmp_V4d = __builtin_ia32_cmppd256(tmp_V4d, tmp_V4d, 0);
tmp_V8f = __builtin_ia32_cmpps256(tmp_V8f, tmp_V8f, 0);		tmp_V8f = __builtin_ia32_cmpps256(tmp_V8f, tmp_V8f, 0);
tmp_V8f = __builtin_ia32_cvtdq2ps256(tmp_V8i);		tmp_V8f = __builtin_ia32_cvtdq2ps256(tmp_V8i);
tmp_V4f = __builtin_ia32_cvtpd2ps256(tmp_V4d);		tmp_V4f = __builtin_ia32_cvtpd2ps256(tmp_V4d);
tmp_V8i = __builtin_ia32_cvtps2dq256(tmp_V8f);		tmp_V8i = __builtin_ia32_cvtps2dq256(tmp_V8f);
		tmp_V4i = __builtin_ia32_cvttpd2dq256(tmp_V4d);
tmp_V4i = __builtin_ia32_cvtpd2dq256(tmp_V4d);		tmp_V4i = __builtin_ia32_cvtpd2dq256(tmp_V4d);
		tmp_V8i = __builtin_ia32_cvttps2dq256(tmp_V8f);
tmp_V4d = __builtin_ia32_vperm2f128_pd256(tmp_V4d, tmp_V4d, 0x7);		tmp_V4d = __builtin_ia32_vperm2f128_pd256(tmp_V4d, tmp_V4d, 0x7);
tmp_V8f = __builtin_ia32_vperm2f128_ps256(tmp_V8f, tmp_V8f, 0x7);		tmp_V8f = __builtin_ia32_vperm2f128_ps256(tmp_V8f, tmp_V8f, 0x7);
tmp_V8i = __builtin_ia32_vperm2f128_si256(tmp_V8i, tmp_V8i, 0x7);		tmp_V8i = __builtin_ia32_vperm2f128_si256(tmp_V8i, tmp_V8i, 0x7);
tmp_V4d = __builtin_ia32_sqrtpd256(tmp_V4d);		tmp_V4d = __builtin_ia32_sqrtpd256(tmp_V4d);
tmp_V8f = __builtin_ia32_sqrtps256(tmp_V8f);		tmp_V8f = __builtin_ia32_sqrtps256(tmp_V8f);
tmp_V8f = __builtin_ia32_rsqrtps256(tmp_V8f);		tmp_V8f = __builtin_ia32_rsqrtps256(tmp_V8f);
tmp_V8f = __builtin_ia32_rcpps256(tmp_V8f);		tmp_V8f = __builtin_ia32_rcpps256(tmp_V8f);
tmp_V4d = __builtin_ia32_roundpd256(tmp_V4d, 0x1);		tmp_V4d = __builtin_ia32_roundpd256(tmp_V4d, 0x1);
▲ Show 20 Lines • Show All 68 Lines • Show Last 20 Lines

cfe/trunk/test/CodeGen/sse-builtins.c

	Show First 20 Lines • Show All 289 Lines • ▼ Show 20 Lines
	long long test_mm_cvtss_si64(__m128 A) {			long long test_mm_cvtss_si64(__m128 A) {
	// CHECK-LABEL: test_mm_cvtss_si64			// CHECK-LABEL: test_mm_cvtss_si64
	// CHECK: call i64 @llvm.x86.sse.cvtss2si64(<4 x float> %{{.*}})			// CHECK: call i64 @llvm.x86.sse.cvtss2si64(<4 x float> %{{.*}})
	return _mm_cvtss_si64(A);			return _mm_cvtss_si64(A);
	}			}

	int test_mm_cvtt_ss2si(__m128 A) {			int test_mm_cvtt_ss2si(__m128 A) {
	// CHECK-LABEL: test_mm_cvtt_ss2si			// CHECK-LABEL: test_mm_cvtt_ss2si
	// CHECK: extractelement <4 x float> %{{.*}}, i32 0			// CHECK: call i32 @llvm.x86.sse.cvttss2si(<4 x float> %{{.*}})
	// CHECK: fptosi float %{{.*}} to i32
	return _mm_cvtt_ss2si(A);			return _mm_cvtt_ss2si(A);
	}			}

	int test_mm_cvttss_si32(__m128 A) {			int test_mm_cvttss_si32(__m128 A) {
	// CHECK-LABEL: test_mm_cvttss_si32			// CHECK-LABEL: test_mm_cvttss_si32
	// CHECK: extractelement <4 x float> %{{.*}}, i32 0			// CHECK: call i32 @llvm.x86.sse.cvttss2si(<4 x float> %{{.*}})
	// CHECK: fptosi float %{{.*}} to i32
	return _mm_cvttss_si32(A);			return _mm_cvttss_si32(A);
	}			}

	long long test_mm_cvttss_si64(__m128 A) {			long long test_mm_cvttss_si64(__m128 A) {
	// CHECK-LABEL: test_mm_cvttss_si64			// CHECK-LABEL: test_mm_cvttss_si64
	// CHECK: extractelement <4 x float> %{{.*}}, i32 0			// CHECK: call i64 @llvm.x86.sse.cvttss2si64(<4 x float> %{{.*}})
	// CHECK: fptosi float %{{.*}} to i64
	return _mm_cvttss_si64(A);			return _mm_cvttss_si64(A);
	}			}

	__m128 test_mm_div_ps(__m128 A, __m128 B) {			__m128 test_mm_div_ps(__m128 A, __m128 B) {
	// CHECK-LABEL: test_mm_div_ps			// CHECK-LABEL: test_mm_div_ps
	// CHECK: fdiv <4 x float>			// CHECK: fdiv <4 x float>
	return _mm_div_ps(A, B);			return _mm_div_ps(A, B);
	}			}
	▲ Show 20 Lines • Show All 509 Lines • Show Last 20 Lines

cfe/trunk/test/CodeGen/sse2-builtins.c

	Show First 20 Lines • Show All 501 Lines • ▼ Show 20 Lines
	long long test_mm_cvtsd_si64(__m128d A) {			long long test_mm_cvtsd_si64(__m128d A) {
	// CHECK-LABEL: test_mm_cvtsd_si64			// CHECK-LABEL: test_mm_cvtsd_si64
	// CHECK: call i64 @llvm.x86.sse2.cvtsd2si64(<2 x double> %{{.*}})			// CHECK: call i64 @llvm.x86.sse2.cvtsd2si64(<2 x double> %{{.*}})
	return _mm_cvtsd_si64(A);			return _mm_cvtsd_si64(A);
	}			}

	__m128 test_mm_cvtsd_ss(__m128 A, __m128d B) {			__m128 test_mm_cvtsd_ss(__m128 A, __m128d B) {
	// CHECK-LABEL: test_mm_cvtsd_ss			// CHECK-LABEL: test_mm_cvtsd_ss
	// CHECK: fptrunc double %{{.*}} to float			// CHECK: call <4 x float> @llvm.x86.sse2.cvtsd2ss(<4 x float> %{{.*}}, <2 x double> %{{.*}})
	return _mm_cvtsd_ss(A, B);			return _mm_cvtsd_ss(A, B);
	}			}

	int test_mm_cvtsi128_si32(__m128i A) {			int test_mm_cvtsi128_si32(__m128i A) {
	// CHECK-LABEL: test_mm_cvtsi128_si32			// CHECK-LABEL: test_mm_cvtsi128_si32
	// CHECK: extractelement <4 x i32> %{{.*}}, i32 0			// CHECK: extractelement <4 x i32> %{{.*}}, i32 0
	return _mm_cvtsi128_si32(A);			return _mm_cvtsi128_si32(A);
	}			}
	▲ Show 20 Lines • Show All 45 Lines • ▼ Show 20 Lines
	__m128i test_mm_cvttpd_epi32(__m128d A) {			__m128i test_mm_cvttpd_epi32(__m128d A) {
	// CHECK-LABEL: test_mm_cvttpd_epi32			// CHECK-LABEL: test_mm_cvttpd_epi32
	// CHECK: call <4 x i32> @llvm.x86.sse2.cvttpd2dq(<2 x double> %{{.*}})			// CHECK: call <4 x i32> @llvm.x86.sse2.cvttpd2dq(<2 x double> %{{.*}})
	return _mm_cvttpd_epi32(A);			return _mm_cvttpd_epi32(A);
	}			}

	__m128i test_mm_cvttps_epi32(__m128 A) {			__m128i test_mm_cvttps_epi32(__m128 A) {
	// CHECK-LABEL: test_mm_cvttps_epi32			// CHECK-LABEL: test_mm_cvttps_epi32
	// CHECK: fptosi <4 x float> %{{.*}} to <4 x i32>			// CHECK: call <4 x i32> @llvm.x86.sse2.cvttps2dq(<4 x float> %{{.*}})
	return _mm_cvttps_epi32(A);			return _mm_cvttps_epi32(A);
	}			}

	int test_mm_cvttsd_si32(__m128d A) {			int test_mm_cvttsd_si32(__m128d A) {
	// CHECK-LABEL: test_mm_cvttsd_si32			// CHECK-LABEL: test_mm_cvttsd_si32
	// CHECK: extractelement <2 x double> %{{.*}}, i32 0			// CHECK: call i32 @llvm.x86.sse2.cvttsd2si(<2 x double> %{{.*}})
	// CHECK: fptosi double %{{.*}} to i32
	return _mm_cvttsd_si32(A);			return _mm_cvttsd_si32(A);
	}			}

	long long test_mm_cvttsd_si64(__m128d A) {			long long test_mm_cvttsd_si64(__m128d A) {
	// CHECK-LABEL: test_mm_cvttsd_si64			// CHECK-LABEL: test_mm_cvttsd_si64
	// CHECK: extractelement <2 x double> %{{.*}}, i32 0			// CHECK: call i64 @llvm.x86.sse2.cvttsd2si64(<2 x double> %{{.*}})
	// CHECK: fptosi double %{{.*}} to i64
	return _mm_cvttsd_si64(A);			return _mm_cvttsd_si64(A);
	}			}

	__m128d test_mm_div_pd(__m128d A, __m128d B) {			__m128d test_mm_div_pd(__m128d A, __m128d B) {
	// CHECK-LABEL: test_mm_div_pd			// CHECK-LABEL: test_mm_div_pd
	// CHECK: fdiv <2 x double>			// CHECK: fdiv <2 x double>
	return _mm_div_pd(A, B);			return _mm_div_pd(A, B);
	}			}
	▲ Show 20 Lines • Show All 948 Lines • Show Last 20 Lines
