This is an archive of the discontinued LLVM Phabricator instance.

Differential D20528

[X86][SSE] Replace lossless i32/f32 to f64 conversion intrinsics with generic IR
Closed, Public

Authored by RKSimon on May 23 2016, 9:27 AM.

Details

Reviewers

spatel
qcolombet
andreadb
mkuper
craig.topper

Commits

rG90770c7c7657: [X86][SSE] Replace lossless i32/f32 to f64 conversion intrinsics with generic IR
rC270499: [X86][SSE] Replace lossless i32/f32 to f64 conversion intrinsics with generic IR
rL270499: [X86][SSE] Replace lossless i32/f32 to f64 conversion intrinsics with generic IR

Summary

Both the (V)CVTDQ2PD(Y) (i32 to f64) and (V)CVTPS2PD(Y) (f32 to f64) conversion instructions are lossless and can be safely represented as generic __builtin_convertvector calls instead of x86 intrinsics.

This patch removes the clang builtins and their use in the sse2/avx headers - a future patch will deal with removing the llvm intrinsics, but that will require a bit more work.
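
To illustrate why these conversions can be expressed generically, here is a minimal sketch (not the header code from this patch). It applies `__builtin_convertvector` to vector types declared with the `vector_size` attribute and checks that each lane survives a round trip back to its source type. The type names `v4i32`, `v4f32`, `v4f64` and the function `widening_is_lossless` are made up for this example, and it assumes Clang or a GCC recent enough to provide the builtin.

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>

/* Hypothetical vector types for this sketch (GCC/Clang vector_size extension). */
typedef int32_t v4i32 __attribute__((vector_size(16)));
typedef float   v4f32 __attribute__((vector_size(16)));
typedef double  v4f64 __attribute__((vector_size(32)));

/* Returns 1 if widening every lane to double loses no information. */
static int widening_is_lossless(void) {
  /* 16777217 = 2^24 + 1 is not representable as a float, but is as a double. */
  v4i32 i = {INT32_MIN, -1, 16777217, INT32_MAX};
  v4f32 f = {0.1f, -2.5f, FLT_MAX, FLT_MIN};

  /* Element-wise conversions; Clang emits sitofp and fpext for these. */
  v4f64 di = __builtin_convertvector(i, v4f64);
  v4f64 df = __builtin_convertvector(f, v4f64);
  assert(sizeof(di) == 4 * sizeof(double));

  for (int k = 0; k < 4; ++k) {
    /* Lossless means the round trip back to the source type is exact. */
    if ((int32_t)di[k] != i[k]) return 0;
    if ((float)df[k] != f[k]) return 0;
  }
  return 1;
}
```

The round trip is exact because a double has a 53-bit significand, which covers every 32-bit integer and every float value. The same argument would not hold for i32 to f32, which is presumably why that conversion is not part of this change.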

Diff Detail

Repository: rL LLVM

Event Timeline

RKSimon updated this revision to Diff 58106. May 23 2016, 9:27 AM

RKSimon retitled this revision to [X86][SSE] Replace lossless i32/f32 to f64 conversion intrinsics with generic IR.

RKSimon updated this object.

RKSimon added reviewers: qcolombet, craig.topper, mkuper, andreadb, spatel.

RKSimon set the repository for this revision to rL LLVM.

RKSimon added a subscriber: cfe-commits.

Thanks, Simon!
This looks right, but we may lose some end-to-end tests, since right now we have a clang-level test that checks the builtin is lowered to the intrinsic, and (hopefully) a CG-level test that the intrinsic is lowered to the correct instruction.
Do you know if there are already CG tests that check we correctly lower these IR patterns to CVTPS2PD, etc? If not, could you add them?

lib/Headers/emmintrin.h
390 ↗	(On Diff #58106)	It looks like there's a missing paren after the first __v4sf. How does the test compile? Or am I misreading?

In D20528#436893, @mkuper wrote:

This looks right, but we may lose some end-to-end tests, since right now we have a clang-level test that checks the builtin is lowered to the intrinsic, and (hopefully) a CG-level test that the intrinsic is lowered to the correct instruction.
Do you know if there are already CG tests that check we correctly lower these IR patterns to CVTPS2PD, etc? If not, could you add them?

I do have the relevant changes for llvm\test\CodeGen\X86\sse2-intrinsics-fast-isel.ll and llvm\test\CodeGen\X86\avx-intrinsics-fast-isel.ll (I spent most of last week adding them all). Do you want me to set up a separate llvm patch for review? I'm not ready to do the rest of the llvm work (removal of the llvm intrinsics, auto-upgrade, etc.), but the fast-isel changes are very simple.

lib/Headers/emmintrin.h
390 ↗	(On Diff #58106)	Sorry, that's me 'fixing' clang-format which I stupidly forgot to run until just before submission.

In D20528#437090, @RKSimon wrote:

In D20528#436893, @mkuper wrote:

This looks right, but we may lose some end-to-end tests, since right now we have a clang-level test that checks the builtin is lowered to the intrinsic, and (hopefully) a CG-level test that the intrinsic is lowered to the correct instruction.
Do you know if there are already CG tests that check we correctly lower these IR patterns to CVTPS2PD, etc? If not, could you add them?

I do have the relevant changes for llvm\test\CodeGen\X86\sse2-intrinsics-fast-isel.ll and llvm\test\CodeGen\X86\avx-intrinsics-fast-isel.ll (I spent most of last week adding them all). Do you want me to set up a separate llvm patch for review? I'm not ready to do the rest of the llvm work (removal of the llvm intrinsics, auto-upgrade, etc.), but the fast-isel changes are very simple.

Sorry, I didn't intend to imply the rest of the llvm work is necessary for this to go in. Just that I'd be happier with this patch knowing that we have a regression test for doing the (shuffle + fpext, say) lowering correctly. I didn't even mean fast-isel, only the DAG.

RKSimon updated this revision to Diff 58146. May 23 2016, 1:27 PM

RKSimon edited edge metadata.

In D20528#437117, @mkuper wrote:

Sorry, I didn't intend to imply the rest of the llvm work is necessary for this to go in. Just that I'd be happier with this patch knowing that we have a regression test for doing the (shuffle + fpext, say) lowering correctly. I didn't even mean fast-isel, only the DAG.

The fast-isel tests are the most self-contained (and are useful to show the non-optimized codegen for every intrinsic in the headers). I can submit them now if you wish.

Presumably, the fast-isel lowering of the IR pattern is already correct, and in any case, it isn't affected by this patch.
I just want to make sure we don't regress the optimized DAG codegen - that is, it still produces the instruction we'd expect from the intrinsic (or something at least as good).

RKSimon mentioned this in rL270494: [X86][SSE] Added cvtdq2pd/cvtps2pd generic IR tests. May 23 2016, 2:51 PM

In D20528#437165, @mkuper wrote:

Presumably, the fast-isel lowering of the IR pattern is already correct, and in any case, it isn't affected by this patch.
I just want to make sure we don't regress the optimized DAG codegen - that is, it still produces the instruction we'd expect from the intrinsic (or something at least as good).

The existing llvm\test\CodeGen\X86\vec_fpext.ll and llvm\test\CodeGen\X86\vec_int_to_fp.ll already demonstrate the correct optimized DAG codegen using the same IR as output in the clang\test\CodeGen\*-builtins.c here.

Also, the aim is to keep the llvm\test\CodeGen\X86\*-intrinsics-fast-isel.ll tests in sync with the llvm\tools\clang\test\CodeGen\*-builtins.c equivalents.

The existing llvm\test\CodeGen\X86\vec_fpext.ll and llvm\test\CodeGen\X86\vec_int_to_fp.ll already demonstrate the correct optimized DAG codegen using the same IR as output in the clang\test\CodeGen\*-builtins.c here.

That's what I meant by "Do you know if there are already CG tests that check we correctly lower these IR patterns", sorry I wasn't more clear.
This LGTM.

This revision is now accepted and ready to land. May 23 2016, 3:05 PM

Closed by commit rL270499: [X86][SSE] Replace lossless i32/f32 to f64 conversion intrinsics with generic IR (authored by RKSimon). May 23 2016, 3:19 PM

This revision was automatically updated to reflect the committed changes.

RKSimon mentioned this in rL270501: [X86][SSE] Updated (V)CVTDQ2PD(Y) and (V)CVTPS2PD(Y) fast-isel codegen to…. May 23 2016, 3:24 PM

RKSimon mentioned this in D20568: [X86][SSE] Replace (V)CVTDQ2PD(Y) and (V)CVTPS2PD(Y) lossless conversion intrinsics with generic IR. May 24 2016, 6:41 AM

RKSimon mentioned this in rL270678: [X86][SSE] Replace (V)CVTDQ2PD(Y) and (V)CVTPS2PD(Y) lossless conversion…. May 25 2016, 2:06 AM

RKSimon mentioned this in rL286056: [AVX-512] Lower SSE/AVX cvtdq2ps intrinsics directly to ISD::SINT_TO_FP so they…. Nov 6 2016, 6:17 AM

RKSimon mentioned this in D26686: [X86][AVX512] Replace lossless i32/u32 to f64 conversion intrinsics with generic IR. Nov 15 2016, 11:13 AM

RKSimon mentioned this in rL287088: [X86][AVX512] Replace lossless i32/u32 to f64 conversion intrinsics with…. Nov 16 2016, 1:37 AM

Revision Contents

Path	Size
cfe/trunk/include/clang/Basic/BuiltinsX86.def	4 lines
cfe/trunk/lib/Headers/avxintrin.h	4 lines
cfe/trunk/lib/Headers/emmintrin.h	6 lines
cfe/trunk/test/CodeGen/avx-builtins.c	4 lines
cfe/trunk/test/CodeGen/builtins-x86.c	4 lines
cfe/trunk/test/CodeGen/sse2-builtins.c	6 lines
cfe/trunk/test/CodeGen/target-builtin-error-2.c	6 lines

Diff 58160

cfe/trunk/include/clang/Basic/BuiltinsX86.def

	[... 324 lines not shown ...]
	TARGET_BUILTIN(__builtin_ia32_pmovmskb128, "iV16c", "", "sse2")			TARGET_BUILTIN(__builtin_ia32_pmovmskb128, "iV16c", "", "sse2")
	TARGET_BUILTIN(__builtin_ia32_movnti, "vi*i", "", "sse2")			TARGET_BUILTIN(__builtin_ia32_movnti, "vi*i", "", "sse2")
	TARGET_BUILTIN(__builtin_ia32_movnti64, "vLLi*LLi", "", "sse2")			TARGET_BUILTIN(__builtin_ia32_movnti64, "vLLi*LLi", "", "sse2")
	TARGET_BUILTIN(__builtin_ia32_movntpd, "vd*V2d", "", "sse2")			TARGET_BUILTIN(__builtin_ia32_movntpd, "vd*V2d", "", "sse2")
	TARGET_BUILTIN(__builtin_ia32_movntdq, "vV2LLi*V2LLi", "", "sse2")			TARGET_BUILTIN(__builtin_ia32_movntdq, "vV2LLi*V2LLi", "", "sse2")
	TARGET_BUILTIN(__builtin_ia32_psadbw128, "V2LLiV16cV16c", "", "sse2")			TARGET_BUILTIN(__builtin_ia32_psadbw128, "V2LLiV16cV16c", "", "sse2")
	TARGET_BUILTIN(__builtin_ia32_sqrtpd, "V2dV2d", "", "sse2")			TARGET_BUILTIN(__builtin_ia32_sqrtpd, "V2dV2d", "", "sse2")
	TARGET_BUILTIN(__builtin_ia32_sqrtsd, "V2dV2d", "", "sse2")			TARGET_BUILTIN(__builtin_ia32_sqrtsd, "V2dV2d", "", "sse2")
	TARGET_BUILTIN(__builtin_ia32_cvtdq2pd, "V2dV4i", "", "sse2")
	TARGET_BUILTIN(__builtin_ia32_cvtdq2ps, "V4fV4i", "", "sse2")			TARGET_BUILTIN(__builtin_ia32_cvtdq2ps, "V4fV4i", "", "sse2")
	TARGET_BUILTIN(__builtin_ia32_cvtpd2dq, "V2LLiV2d", "", "sse2")			TARGET_BUILTIN(__builtin_ia32_cvtpd2dq, "V2LLiV2d", "", "sse2")
	TARGET_BUILTIN(__builtin_ia32_cvtpd2ps, "V4fV2d", "", "sse2")			TARGET_BUILTIN(__builtin_ia32_cvtpd2ps, "V4fV2d", "", "sse2")
	TARGET_BUILTIN(__builtin_ia32_cvttpd2dq, "V4iV2d", "", "sse2")			TARGET_BUILTIN(__builtin_ia32_cvttpd2dq, "V4iV2d", "", "sse2")
	TARGET_BUILTIN(__builtin_ia32_cvtsd2si, "iV2d", "", "sse2")			TARGET_BUILTIN(__builtin_ia32_cvtsd2si, "iV2d", "", "sse2")
	TARGET_BUILTIN(__builtin_ia32_cvtsd2si64, "LLiV2d", "", "sse2")			TARGET_BUILTIN(__builtin_ia32_cvtsd2si64, "LLiV2d", "", "sse2")
	TARGET_BUILTIN(__builtin_ia32_cvtps2dq, "V4iV4f", "", "sse2")			TARGET_BUILTIN(__builtin_ia32_cvtps2dq, "V4iV4f", "", "sse2")
	TARGET_BUILTIN(__builtin_ia32_cvtps2pd, "V2dV4f", "", "sse2")
	TARGET_BUILTIN(__builtin_ia32_cvttps2dq, "V4iV4f", "", "sse2")			TARGET_BUILTIN(__builtin_ia32_cvttps2dq, "V4iV4f", "", "sse2")
	TARGET_BUILTIN(__builtin_ia32_clflush, "vvC*", "", "sse2")			TARGET_BUILTIN(__builtin_ia32_clflush, "vvC*", "", "sse2")
	TARGET_BUILTIN(__builtin_ia32_lfence, "v", "", "sse2")			TARGET_BUILTIN(__builtin_ia32_lfence, "v", "", "sse2")
	TARGET_BUILTIN(__builtin_ia32_mfence, "v", "", "sse2")			TARGET_BUILTIN(__builtin_ia32_mfence, "v", "", "sse2")
	TARGET_BUILTIN(__builtin_ia32_pause, "v", "", "sse2")			TARGET_BUILTIN(__builtin_ia32_pause, "v", "", "sse2")
	TARGET_BUILTIN(__builtin_ia32_storedqu, "vc*V16c", "", "sse2")			TARGET_BUILTIN(__builtin_ia32_storedqu, "vc*V16c", "", "sse2")
	TARGET_BUILTIN(__builtin_ia32_pmuludq128, "V2LLiV4iV4i", "", "sse2")			TARGET_BUILTIN(__builtin_ia32_pmuludq128, "V2LLiV4iV4i", "", "sse2")
	TARGET_BUILTIN(__builtin_ia32_psraw128, "V8sV8sV8s", "", "sse2")			TARGET_BUILTIN(__builtin_ia32_psraw128, "V8sV8sV8s", "", "sse2")
	[... 111 lines not shown ...]
	TARGET_BUILTIN(__builtin_ia32_vpermilvarps, "V4fV4fV4i", "", "avx")			TARGET_BUILTIN(__builtin_ia32_vpermilvarps, "V4fV4fV4i", "", "avx")
	TARGET_BUILTIN(__builtin_ia32_vpermilvarpd256, "V4dV4dV4LLi", "", "avx")			TARGET_BUILTIN(__builtin_ia32_vpermilvarpd256, "V4dV4dV4LLi", "", "avx")
	TARGET_BUILTIN(__builtin_ia32_vpermilvarps256, "V8fV8fV8i", "", "avx")			TARGET_BUILTIN(__builtin_ia32_vpermilvarps256, "V8fV8fV8i", "", "avx")
	TARGET_BUILTIN(__builtin_ia32_blendvpd256, "V4dV4dV4dV4d", "", "avx")			TARGET_BUILTIN(__builtin_ia32_blendvpd256, "V4dV4dV4dV4d", "", "avx")
	TARGET_BUILTIN(__builtin_ia32_blendvps256, "V8fV8fV8fV8f", "", "avx")			TARGET_BUILTIN(__builtin_ia32_blendvps256, "V8fV8fV8fV8f", "", "avx")
	TARGET_BUILTIN(__builtin_ia32_dpps256, "V8fV8fV8fIc", "", "avx")			TARGET_BUILTIN(__builtin_ia32_dpps256, "V8fV8fV8fIc", "", "avx")
	TARGET_BUILTIN(__builtin_ia32_cmppd256, "V4dV4dV4dIc", "", "avx")			TARGET_BUILTIN(__builtin_ia32_cmppd256, "V4dV4dV4dIc", "", "avx")
	TARGET_BUILTIN(__builtin_ia32_cmpps256, "V8fV8fV8fIc", "", "avx")			TARGET_BUILTIN(__builtin_ia32_cmpps256, "V8fV8fV8fIc", "", "avx")
	TARGET_BUILTIN(__builtin_ia32_cvtdq2pd256, "V4dV4i", "", "avx")
	TARGET_BUILTIN(__builtin_ia32_cvtdq2ps256, "V8fV8i", "", "avx")			TARGET_BUILTIN(__builtin_ia32_cvtdq2ps256, "V8fV8i", "", "avx")
	TARGET_BUILTIN(__builtin_ia32_cvtpd2ps256, "V4fV4d", "", "avx")			TARGET_BUILTIN(__builtin_ia32_cvtpd2ps256, "V4fV4d", "", "avx")
	TARGET_BUILTIN(__builtin_ia32_cvtps2dq256, "V8iV8f", "", "avx")			TARGET_BUILTIN(__builtin_ia32_cvtps2dq256, "V8iV8f", "", "avx")
	TARGET_BUILTIN(__builtin_ia32_cvtps2pd256, "V4dV4f", "", "avx")
	TARGET_BUILTIN(__builtin_ia32_cvttpd2dq256, "V4iV4d", "", "avx")			TARGET_BUILTIN(__builtin_ia32_cvttpd2dq256, "V4iV4d", "", "avx")
	TARGET_BUILTIN(__builtin_ia32_cvtpd2dq256, "V4iV4d", "", "avx")			TARGET_BUILTIN(__builtin_ia32_cvtpd2dq256, "V4iV4d", "", "avx")
	TARGET_BUILTIN(__builtin_ia32_cvttps2dq256, "V8iV8f", "", "avx")			TARGET_BUILTIN(__builtin_ia32_cvttps2dq256, "V8iV8f", "", "avx")
	TARGET_BUILTIN(__builtin_ia32_vperm2f128_pd256, "V4dV4dV4dIc", "", "avx")			TARGET_BUILTIN(__builtin_ia32_vperm2f128_pd256, "V4dV4dV4dIc", "", "avx")
	TARGET_BUILTIN(__builtin_ia32_vperm2f128_ps256, "V8fV8fV8fIc", "", "avx")			TARGET_BUILTIN(__builtin_ia32_vperm2f128_ps256, "V8fV8fV8fIc", "", "avx")
	TARGET_BUILTIN(__builtin_ia32_vperm2f128_si256, "V8iV8iV8iIc", "", "avx")			TARGET_BUILTIN(__builtin_ia32_vperm2f128_si256, "V8iV8iV8iIc", "", "avx")
	TARGET_BUILTIN(__builtin_ia32_sqrtpd256, "V4dV4d", "", "avx")			TARGET_BUILTIN(__builtin_ia32_sqrtpd256, "V4dV4d", "", "avx")
	TARGET_BUILTIN(__builtin_ia32_sqrtps256, "V8fV8f", "", "avx")			TARGET_BUILTIN(__builtin_ia32_sqrtps256, "V8fV8f", "", "avx")
	[... 1,807 lines not shown ...]

cfe/trunk/lib/Headers/avxintrin.h

	[... 2,044 lines not shown ...]
	/// This intrinsic corresponds to the \c VCVTDQ2PD / CVTDQ2PD instruction.			/// This intrinsic corresponds to the \c VCVTDQ2PD / CVTDQ2PD instruction.
	///			///
	/// \param __a			/// \param __a
	/// A 128-bit integer vector of [4 x i32].			/// A 128-bit integer vector of [4 x i32].
	/// \returns A 256-bit vector of [4 x double] containing the converted values.			/// \returns A 256-bit vector of [4 x double] containing the converted values.
	static __inline __m256d __DEFAULT_FN_ATTRS			static __inline __m256d __DEFAULT_FN_ATTRS
	_mm256_cvtepi32_pd(__m128i __a)			_mm256_cvtepi32_pd(__m128i __a)
	{			{
	return (__m256d)__builtin_ia32_cvtdq2pd256((__v4si) __a);			return (__m256d)__builtin_convertvector((__v4si)__a, __v4df);
	}			}

	/// \brief Converts a vector of [8 x i32] into a vector of [8 x float].			/// \brief Converts a vector of [8 x i32] into a vector of [8 x float].
	///			///
	/// \headerfile <x86intrin.h>			/// \headerfile <x86intrin.h>
	///			///
	/// This intrinsic corresponds to the \c VCVTDQ2PS / CVTDQ2PS instruction.			/// This intrinsic corresponds to the \c VCVTDQ2PS / CVTDQ2PS instruction.
	///			///
	[... 35 lines not shown ...]
	_mm256_cvtps_epi32(__m256 __a)			_mm256_cvtps_epi32(__m256 __a)
	{			{
	return (__m256i)__builtin_ia32_cvtps2dq256((__v8sf) __a);			return (__m256i)__builtin_ia32_cvtps2dq256((__v8sf) __a);
	}			}

	static __inline __m256d __DEFAULT_FN_ATTRS			static __inline __m256d __DEFAULT_FN_ATTRS
	_mm256_cvtps_pd(__m128 __a)			_mm256_cvtps_pd(__m128 __a)
	{			{
	return (__m256d)__builtin_ia32_cvtps2pd256((__v4sf) __a);			return (__m256d)__builtin_convertvector((__v4sf)__a, __v4df);
	}			}

	static __inline __m128i __DEFAULT_FN_ATTRS			static __inline __m128i __DEFAULT_FN_ATTRS
	_mm256_cvttpd_epi32(__m256d __a)			_mm256_cvttpd_epi32(__m256d __a)
	{			{
	return (__m128i)__builtin_ia32_cvttpd2dq256((__v4df) __a);			return (__m128i)__builtin_ia32_cvttpd2dq256((__v4df) __a);
	}			}

	[... 791 lines not shown ...]

cfe/trunk/lib/Headers/emmintrin.h

	[... 380 lines not shown ...]
	_mm_cvtpd_ps(__m128d __a)			_mm_cvtpd_ps(__m128d __a)
	{			{
	return __builtin_ia32_cvtpd2ps((__v2df)__a);			return __builtin_ia32_cvtpd2ps((__v2df)__a);
	}			}

	static __inline__ __m128d __DEFAULT_FN_ATTRS			static __inline__ __m128d __DEFAULT_FN_ATTRS
	_mm_cvtps_pd(__m128 __a)			_mm_cvtps_pd(__m128 __a)
	{			{
	return __builtin_ia32_cvtps2pd((__v4sf)__a);			return (__m128d) __builtin_convertvector(
				__builtin_shufflevector((__v4sf)__a, (__v4sf)__a, 0, 1), __v2df);
	}			}

	static __inline__ __m128d __DEFAULT_FN_ATTRS			static __inline__ __m128d __DEFAULT_FN_ATTRS
	_mm_cvtepi32_pd(__m128i __a)			_mm_cvtepi32_pd(__m128i __a)
	{			{
	return __builtin_ia32_cvtdq2pd((__v4si)__a);			return (__m128d) __builtin_convertvector(
				__builtin_shufflevector((__v4si)__a, (__v4si)__a, 0, 1), __v2df);
	}			}

	static __inline__ __m128i __DEFAULT_FN_ATTRS			static __inline__ __m128i __DEFAULT_FN_ATTRS
	_mm_cvtpd_epi32(__m128d __a)			_mm_cvtpd_epi32(__m128d __a)
	{			{
	return __builtin_ia32_cvtpd2dq((__v2df)__a);			return __builtin_ia32_cvtpd2dq((__v2df)__a);
	}			}

	[... 2,026 lines not shown ...]

cfe/trunk/test/CodeGen/avx-builtins.c

	[... 244 lines not shown ...]
	__m128 test_mm_cmp_ss(__m128 A, __m128 B) {			__m128 test_mm_cmp_ss(__m128 A, __m128 B) {
	// CHECK-LABEL: test_mm_cmp_ss			// CHECK-LABEL: test_mm_cmp_ss
	// CHECK: call <4 x float> @llvm.x86.sse.cmp.ss(<4 x float> %{{.*}}, <4 x float> %{{.*}}, i8 13)			// CHECK: call <4 x float> @llvm.x86.sse.cmp.ss(<4 x float> %{{.*}}, <4 x float> %{{.*}}, i8 13)
	return _mm_cmp_ss(A, B, _CMP_GE_OS);			return _mm_cmp_ss(A, B, _CMP_GE_OS);
	}			}

	__m256d test_mm256_cvtepi32_pd(__m128i A) {			__m256d test_mm256_cvtepi32_pd(__m128i A) {
	// CHECK-LABEL: test_mm256_cvtepi32_pd			// CHECK-LABEL: test_mm256_cvtepi32_pd
	// CHECK: call <4 x double> @llvm.x86.avx.cvtdq2.pd.256(<4 x i32> %{{.*}})			// CHECK: sitofp <4 x i32> %{{.*}} to <4 x double>
	return _mm256_cvtepi32_pd(A);			return _mm256_cvtepi32_pd(A);
	}			}

	__m256 test_mm256_cvtepi32_ps(__m256i A) {			__m256 test_mm256_cvtepi32_ps(__m256i A) {
	// CHECK-LABEL: test_mm256_cvtepi32_ps			// CHECK-LABEL: test_mm256_cvtepi32_ps
	// CHECK: call <8 x float> @llvm.x86.avx.cvtdq2.ps.256(<8 x i32> %{{.*}})			// CHECK: call <8 x float> @llvm.x86.avx.cvtdq2.ps.256(<8 x i32> %{{.*}})
	return _mm256_cvtepi32_ps(A);			return _mm256_cvtepi32_ps(A);
	}			}
	[... 13 lines not shown ...]
	__m256i test_mm256_cvtps_epi32(__m256 A) {			__m256i test_mm256_cvtps_epi32(__m256 A) {
	// CHECK-LABEL: test_mm256_cvtps_epi32			// CHECK-LABEL: test_mm256_cvtps_epi32
	// CHECK: call <8 x i32> @llvm.x86.avx.cvt.ps2dq.256(<8 x float> %{{.*}})			// CHECK: call <8 x i32> @llvm.x86.avx.cvt.ps2dq.256(<8 x float> %{{.*}})
	return _mm256_cvtps_epi32(A);			return _mm256_cvtps_epi32(A);
	}			}

	__m256d test_mm256_cvtps_pd(__m128 A) {			__m256d test_mm256_cvtps_pd(__m128 A) {
	// CHECK-LABEL: test_mm256_cvtps_pd			// CHECK-LABEL: test_mm256_cvtps_pd
	// CHECK: call <4 x double> @llvm.x86.avx.cvt.ps2.pd.256(<4 x float> %{{.*}})			// CHECK: fpext <4 x float> %{{.*}} to <4 x double>
	return _mm256_cvtps_pd(A);			return _mm256_cvtps_pd(A);
	}			}

	__m128i test_mm256_cvttpd_epi32(__m256d A) {			__m128i test_mm256_cvttpd_epi32(__m256d A) {
	// CHECK-LABEL: test_mm256_cvttpd_epi32			// CHECK-LABEL: test_mm256_cvttpd_epi32
	// CHECK: call <4 x i32> @llvm.x86.avx.cvtt.pd2dq.256(<4 x double> %{{.*}})			// CHECK: call <4 x i32> @llvm.x86.avx.cvtt.pd2dq.256(<4 x double> %{{.*}})
	return _mm256_cvttpd_epi32(A);			return _mm256_cvttpd_epi32(A);
	}			}
	[... 1,093 lines not shown ...]

cfe/trunk/test/CodeGen/builtins-x86.c

[... 319 lines not shown ...]
#ifdef USE_64		#ifdef USE_64
(void) __builtin_ia32_movnti64(tmp_LLip, tmp_LLi);		(void) __builtin_ia32_movnti64(tmp_LLip, tmp_LLi);
#endif		#endif
(void) __builtin_ia32_movntpd(tmp_dp, tmp_V2d);		(void) __builtin_ia32_movntpd(tmp_dp, tmp_V2d);
(void) __builtin_ia32_movntdq(tmp_V2LLip, tmp_V2LLi);		(void) __builtin_ia32_movntdq(tmp_V2LLip, tmp_V2LLi);
tmp_V2LLi = __builtin_ia32_psadbw128(tmp_V16c, tmp_V16c);		tmp_V2LLi = __builtin_ia32_psadbw128(tmp_V16c, tmp_V16c);
tmp_V2d = __builtin_ia32_sqrtpd(tmp_V2d);		tmp_V2d = __builtin_ia32_sqrtpd(tmp_V2d);
tmp_V2d = __builtin_ia32_sqrtsd(tmp_V2d);		tmp_V2d = __builtin_ia32_sqrtsd(tmp_V2d);
tmp_V2d = __builtin_ia32_cvtdq2pd(tmp_V4i);
tmp_V4f = __builtin_ia32_cvtdq2ps(tmp_V4i);		tmp_V4f = __builtin_ia32_cvtdq2ps(tmp_V4i);
tmp_V2LLi = __builtin_ia32_cvtpd2dq(tmp_V2d);		tmp_V2LLi = __builtin_ia32_cvtpd2dq(tmp_V2d);
tmp_V2i = __builtin_ia32_cvtpd2pi(tmp_V2d);		tmp_V2i = __builtin_ia32_cvtpd2pi(tmp_V2d);
tmp_V4f = __builtin_ia32_cvtpd2ps(tmp_V2d);		tmp_V4f = __builtin_ia32_cvtpd2ps(tmp_V2d);
tmp_V4i = __builtin_ia32_cvttpd2dq(tmp_V2d);		tmp_V4i = __builtin_ia32_cvttpd2dq(tmp_V2d);
tmp_V2i = __builtin_ia32_cvttpd2pi(tmp_V2d);		tmp_V2i = __builtin_ia32_cvttpd2pi(tmp_V2d);
tmp_V2d = __builtin_ia32_cvtpi2pd(tmp_V2i);		tmp_V2d = __builtin_ia32_cvtpi2pd(tmp_V2i);
tmp_i = __builtin_ia32_cvtsd2si(tmp_V2d);		tmp_i = __builtin_ia32_cvtsd2si(tmp_V2d);
#ifdef USE_64		#ifdef USE_64
tmp_LLi = __builtin_ia32_cvtsd2si64(tmp_V2d);		tmp_LLi = __builtin_ia32_cvtsd2si64(tmp_V2d);
#endif		#endif
tmp_V4i = __builtin_ia32_cvtps2dq(tmp_V4f);		tmp_V4i = __builtin_ia32_cvtps2dq(tmp_V4f);
tmp_V2d = __builtin_ia32_cvtps2pd(tmp_V4f);
tmp_V4i = __builtin_ia32_cvttps2dq(tmp_V4f);		tmp_V4i = __builtin_ia32_cvttps2dq(tmp_V4f);
(void) __builtin_ia32_clflush(tmp_vCp);		(void) __builtin_ia32_clflush(tmp_vCp);
(void) __builtin_ia32_lfence();		(void) __builtin_ia32_lfence();
(void) __builtin_ia32_mfence();		(void) __builtin_ia32_mfence();
(void) __builtin_ia32_storedqu(tmp_cp, tmp_V16c);		(void) __builtin_ia32_storedqu(tmp_cp, tmp_V16c);
tmp_V4s = __builtin_ia32_psllwi(tmp_V4s, tmp_i);		tmp_V4s = __builtin_ia32_psllwi(tmp_V4s, tmp_i);
tmp_V2i = __builtin_ia32_pslldi(tmp_V2i, tmp_i);		tmp_V2i = __builtin_ia32_pslldi(tmp_V2i, tmp_i);
tmp_V1LLi = __builtin_ia32_psllqi(tmp_V1LLi, tmp_i);		tmp_V1LLi = __builtin_ia32_psllqi(tmp_V1LLi, tmp_i);
[... 68 lines not shown ...]
tmp_V4f = __builtin_ia32_vpermilvarps(tmp_V4f, tmp_V4i);		tmp_V4f = __builtin_ia32_vpermilvarps(tmp_V4f, tmp_V4i);
tmp_V4d = __builtin_ia32_vpermilvarpd256(tmp_V4d, tmp_V4LLi);		tmp_V4d = __builtin_ia32_vpermilvarpd256(tmp_V4d, tmp_V4LLi);
tmp_V8f = __builtin_ia32_vpermilvarps256(tmp_V8f, tmp_V8i);		tmp_V8f = __builtin_ia32_vpermilvarps256(tmp_V8f, tmp_V8i);
tmp_V4d = __builtin_ia32_blendvpd256(tmp_V4d, tmp_V4d, tmp_V4d);		tmp_V4d = __builtin_ia32_blendvpd256(tmp_V4d, tmp_V4d, tmp_V4d);
tmp_V8f = __builtin_ia32_blendvps256(tmp_V8f, tmp_V8f, tmp_V8f);		tmp_V8f = __builtin_ia32_blendvps256(tmp_V8f, tmp_V8f, tmp_V8f);
tmp_V8f = __builtin_ia32_dpps256(tmp_V8f, tmp_V8f, 0x7);		tmp_V8f = __builtin_ia32_dpps256(tmp_V8f, tmp_V8f, 0x7);
tmp_V4d = __builtin_ia32_cmppd256(tmp_V4d, tmp_V4d, 0);		tmp_V4d = __builtin_ia32_cmppd256(tmp_V4d, tmp_V4d, 0);
tmp_V8f = __builtin_ia32_cmpps256(tmp_V8f, tmp_V8f, 0);		tmp_V8f = __builtin_ia32_cmpps256(tmp_V8f, tmp_V8f, 0);
tmp_V4d = __builtin_ia32_cvtdq2pd256(tmp_V4i);
tmp_V8f = __builtin_ia32_cvtdq2ps256(tmp_V8i);		tmp_V8f = __builtin_ia32_cvtdq2ps256(tmp_V8i);
tmp_V4f = __builtin_ia32_cvtpd2ps256(tmp_V4d);		tmp_V4f = __builtin_ia32_cvtpd2ps256(tmp_V4d);
tmp_V8i = __builtin_ia32_cvtps2dq256(tmp_V8f);		tmp_V8i = __builtin_ia32_cvtps2dq256(tmp_V8f);
tmp_V4d = __builtin_ia32_cvtps2pd256(tmp_V4f);
tmp_V4i = __builtin_ia32_cvttpd2dq256(tmp_V4d);		tmp_V4i = __builtin_ia32_cvttpd2dq256(tmp_V4d);
tmp_V4i = __builtin_ia32_cvtpd2dq256(tmp_V4d);		tmp_V4i = __builtin_ia32_cvtpd2dq256(tmp_V4d);
tmp_V8i = __builtin_ia32_cvttps2dq256(tmp_V8f);		tmp_V8i = __builtin_ia32_cvttps2dq256(tmp_V8f);
tmp_V4d = __builtin_ia32_vperm2f128_pd256(tmp_V4d, tmp_V4d, 0x7);		tmp_V4d = __builtin_ia32_vperm2f128_pd256(tmp_V4d, tmp_V4d, 0x7);
tmp_V8f = __builtin_ia32_vperm2f128_ps256(tmp_V8f, tmp_V8f, 0x7);		tmp_V8f = __builtin_ia32_vperm2f128_ps256(tmp_V8f, tmp_V8f, 0x7);
tmp_V8i = __builtin_ia32_vperm2f128_si256(tmp_V8i, tmp_V8i, 0x7);		tmp_V8i = __builtin_ia32_vperm2f128_si256(tmp_V8i, tmp_V8i, 0x7);
tmp_V4d = __builtin_ia32_sqrtpd256(tmp_V4d);		tmp_V4d = __builtin_ia32_sqrtpd256(tmp_V4d);
tmp_V8f = __builtin_ia32_sqrtps256(tmp_V8f);		tmp_V8f = __builtin_ia32_sqrtps256(tmp_V8f);
[... 77 lines not shown ...]

cfe/trunk/test/CodeGen/sse2-builtins.c

	[... 409 lines not shown ...]
	int test_mm_comineq_sd(__m128d A, __m128d B) {			int test_mm_comineq_sd(__m128d A, __m128d B) {
	// CHECK-LABEL: test_mm_comineq_sd			// CHECK-LABEL: test_mm_comineq_sd
	// CHECK: call i32 @llvm.x86.sse2.comineq.sd(<2 x double> %{{.*}}, <2 x double> %{{.*}})			// CHECK: call i32 @llvm.x86.sse2.comineq.sd(<2 x double> %{{.*}}, <2 x double> %{{.*}})
	return _mm_comineq_sd(A, B);			return _mm_comineq_sd(A, B);
	}			}

	__m128d test_mm_cvtepi32_pd(__m128i A) {			__m128d test_mm_cvtepi32_pd(__m128i A) {
	// CHECK-LABEL: test_mm_cvtepi32_pd			// CHECK-LABEL: test_mm_cvtepi32_pd
	// CHECK: call <2 x double> @llvm.x86.sse2.cvtdq2pd(<4 x i32> %{{.*}})			// CHECK: shufflevector <4 x i32> %{{.*}}, <4 x i32> %{{.*}}, <2 x i32> <i32 0, i32 1>
				// CHECK: sitofp <2 x i32> %{{.*}} to <2 x double>
	return _mm_cvtepi32_pd(A);			return _mm_cvtepi32_pd(A);
	}			}

	__m128 test_mm_cvtepi32_ps(__m128i A) {			__m128 test_mm_cvtepi32_ps(__m128i A) {
	// CHECK-LABEL: test_mm_cvtepi32_ps			// CHECK-LABEL: test_mm_cvtepi32_ps
	// CHECK: call <4 x float> @llvm.x86.sse2.cvtdq2ps(<4 x i32> %{{.*}})			// CHECK: call <4 x float> @llvm.x86.sse2.cvtdq2ps(<4 x i32> %{{.*}})
	return _mm_cvtepi32_ps(A);			return _mm_cvtepi32_ps(A);
	}			}
	[... 13 lines not shown ...]
	__m128i test_mm_cvtps_epi32(__m128 A) {			__m128i test_mm_cvtps_epi32(__m128 A) {
	// CHECK-LABEL: test_mm_cvtps_epi32			// CHECK-LABEL: test_mm_cvtps_epi32
	// CHECK: call <4 x i32> @llvm.x86.sse2.cvtps2dq(<4 x float> %{{.*}})			// CHECK: call <4 x i32> @llvm.x86.sse2.cvtps2dq(<4 x float> %{{.*}})
	return _mm_cvtps_epi32(A);			return _mm_cvtps_epi32(A);
	}			}

	__m128d test_mm_cvtps_pd(__m128 A) {			__m128d test_mm_cvtps_pd(__m128 A) {
	// CHECK-LABEL: test_mm_cvtps_pd			// CHECK-LABEL: test_mm_cvtps_pd
	// CHECK: call <2 x double> @llvm.x86.sse2.cvtps2pd(<4 x float> %{{.*}})			// CHECK: shufflevector <4 x float> %{{.*}}, <4 x float> %{{.*}}, <2 x i32> <i32 0, i32 1>
				// CHECK: fpext <2 x float> %{{.*}} to <2 x double>
	return _mm_cvtps_pd(A);			return _mm_cvtps_pd(A);
	}			}

	double test_mm_cvtsd_f64(__m128d A) {			double test_mm_cvtsd_f64(__m128d A) {
	// CHECK-LABEL: test_mm_cvtsd_f64			// CHECK-LABEL: test_mm_cvtsd_f64
	// CHECK: extractelement <2 x double> %{{.*}}, i32 0			// CHECK: extractelement <2 x double> %{{.*}}, i32 0
	return _mm_cvtsd_f64(A);			return _mm_cvtsd_f64(A);
	}			}
	[... 1,016 lines not shown ...]

cfe/trunk/test/CodeGen/target-builtin-error-2.c

	// RUN: %clang_cc1 %s -triple=x86_64-linux-gnu -S -verify -o -			// RUN: %clang_cc1 %s -triple=x86_64-linux-gnu -S -verify -o -
	#define __MM_MALLOC_H			#define __MM_MALLOC_H

	#include <x86intrin.h>			#include <x86intrin.h>

	// Since we do code generation on a function level this needs to error out since			// Since we do code generation on a function level this needs to error out since
	// the subtarget feature won't be available.			// the subtarget feature won't be available.
	__m256d wombat(__m128i a) {			__m128 wombat(__m128i a) {
	if (__builtin_cpu_supports("avx"))			if (__builtin_cpu_supports("avx"))
	return __builtin_ia32_cvtdq2pd256((__v4si)a); // expected-error {{'__builtin_ia32_cvtdq2pd256' needs target feature avx}}			return __builtin_ia32_vpermilvarps((__v4sf) {0.0f, 1.0f, 2.0f, 3.0f}, (__v4si)a); // expected-error {{'__builtin_ia32_vpermilvarps' needs target feature avx}}
	else			else
	return (__m256d){0, 0, 0, 0};			return (__m128){0, 0};
	}			}
