This is an archive of the discontinued LLVM Phabricator instance.

Differential D143287

[Clang][X86] Change X86 cast intrinsics to use __builtin_nondeterministic_value
Closed · Public

Authored by ManuelJBrito on Feb 3 2023, 10:25 AM.


Details

Reviewers

craig.topper
RKSimon
nlopes

Commits

rG5184dc2d7cce: [Clang][X86] Change X86 cast intrinsics to use __builtin_nondeterministic_value

Summary

The following intrinsics are currently implemented using a shufflevector with an undefined mask. This is incorrect according to Intel's semantics for undefined values, which expect an unknown but consistent value.

With __builtin_nondeterministic_value we can now match Intel's undefined value.

Related patch for more context: https://reviews.llvm.org/D103874
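
To make the mask change in the summary concrete, here is a minimal plain-C model of a two-operand shuffle. It is only a sketch: `shuffle2`, `widen_old`, and `widen_new` are hypothetical helpers written for this illustration, not Clang builtins, and the returned count of unspecified lanes merely stands in for the `-1` mask elements used by the old implementation.

```c
#include <stddef.h>

/* Model of a two-operand shuffle over n-lane inputs (illustrative only).
 * mask[i] in [0, n)   selects a[mask[i]];
 * mask[i] in [n, 2n)  selects b[mask[i] - n];
 * mask[i] == -1       means "don't care": out[i] is left untouched.
 * Returns how many output lanes had no defined source. */
static size_t shuffle2(const double *a, const double *b, size_t n,
                       const int *mask, size_t m, double *out)
{
    size_t unspecified = 0;
    for (size_t i = 0; i < m; ++i) {
        if (mask[i] < 0)
            ++unspecified;
        else if ((size_t)mask[i] < n)
            out[i] = a[mask[i]];
        else
            out[i] = b[(size_t)mask[i] - n];
    }
    return unspecified;
}

/* Old scheme: both operands are the source, upper lanes use -1. */
static size_t widen_old(const double a[2], double out[4])
{
    static const int mask[4] = {0, 1, -1, -1};
    return shuffle2(a, a, 2, mask, 4, out);
}

/* New scheme: upper lanes come from a separate filler vector. In the patch
 * the filler is __builtin_nondeterministic_value(a); here it is an ordinary
 * array supplied by the caller (an assumption of this model). */
static size_t widen_new(const double a[2], const double filler[2],
                        double out[4])
{
    static const int mask[4] = {0, 1, 2, 3};
    return shuffle2(a, filler, 2, mask, 4, out);
}
```

In this model the old mask leaves two lanes without any source, while the new mask gives every lane a concrete source whose value is arbitrary but fixed once chosen.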

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

ManuelJBrito created this revision. Feb 3 2023, 10:25 AM

Herald added a project: Restricted Project. Feb 3 2023, 10:25 AM

Herald added subscribers: pengfei, mgrang.

ManuelJBrito requested review of this revision. Feb 3 2023, 10:25 AM

Herald added a subscriber: cfe-commits.

Harbormaster completed remote builds in B211764: Diff 494679. Feb 3 2023, 11:39 AM

We have a couple bugs that show (freeze (poison)) doesn't work past SelectionDAG. Is that a concern here? The most recent https://github.com/llvm/llvm-project/issues/60429

What do we gain from using __builtin_nondeterministic_value instead of just setzero? https://godbolt.org/z/zrb6858Mr

In D143287#4103597, @craig.topper wrote:

We have a couple bugs that show (freeze (poison)) doesn't work past SelectionDAG. Is that a concern here? The most recent https://github.com/llvm/llvm-project/issues/60429

I don't think it's a concern. Here https://godbolt.org/z/1ecM8roYh the lowering seems OK.

In D143287#4104056, @RKSimon wrote:

What do we gain from using __builtin_nondeterministic_value instead of just setzero? https://godbolt.org/z/zrb6858Mr

__builtin_nondeterministic_value is lowered to freeze(poison)

These intrinsics are meant to be no-ops, and we can't do that with zeroinitializer. As of right now, though, InstCombine seems to transform the freeze(poison) into zeroinitializer (as shown by your example), so I'll make another patch to fix that.

In D143287#4107439, @ManuelJBrito wrote:

In D143287#4103597, @craig.topper wrote:

We have a couple bugs that show (freeze (poison)) doesn't work past SelectionDAG. Is that a concern here? The most recent https://github.com/llvm/llvm-project/issues/60429

I don't think it's a concern. Here https://godbolt.org/z/1ecM8roYh the lowering seems OK.

The bugs start occurring if you use the same nondeterministic value multiple times and expect the value to be the same for all uses.

In D143287#4108165, @craig.topper wrote:

In D143287#4107439, @ManuelJBrito wrote:

In D143287#4103597, @craig.topper wrote:

We have a couple bugs that show (freeze (poison)) doesn't work past SelectionDAG. Is that a concern here? The most recent https://github.com/llvm/llvm-project/issues/60429

I don't think it's a concern. Here https://godbolt.org/z/1ecM8roYh the lowering seems OK.

The bugs start occurring if you use the same nondeterministic value multiple times and expect the value to be the same for all uses.

I understand now ... so if the bug is present it works as if it were an undef. For these intrinsics we would just regress to the current behavior.
So is it okay if we land this patch? Because it will still be an improvement.
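
The multiple-use problem discussed above can be sketched with a small plain-C model. This is not LLVM code: `undef_source`, `read_undef`, and `freeze_value` are hypothetical names, and the counter is just one way to model "each read of undef may observe a different value".

```c
/* Illustrative model: an "undef" source may produce a different value
 * each time it is read. */
typedef struct { unsigned next; } undef_source;

static unsigned read_undef(undef_source *u)
{
    return u->next++;            /* every read observes a new value */
}

/* freeze: read the source once; the caller reuses the returned value. */
static unsigned freeze_value(undef_source *u)
{
    return read_undef(u);
}

/* Two uses of an unfrozen value: the uses may disagree. */
static unsigned xor_unfrozen(undef_source *u)
{
    unsigned first = read_undef(u);
    unsigned second = read_undef(u);
    return first ^ second;
}

/* Two uses of one frozen value: the uses always agree. */
static unsigned xor_frozen(undef_source *u)
{
    unsigned v = freeze_value(u);
    return v ^ v;
}
```

If a backend bug makes a frozen value behave like the unfrozen case, code that relies on both uses agreeing breaks; code that uses each lane of the value only once is unaffected.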

Add end to end tests

Currently these expect a mov that will be removed when the InstCombine bug is fixed.

Harbormaster completed remote builds in B212447: Diff 495611. Feb 7 2023, 2:49 PM

RKSimon added inline comments. Feb 8 2023, 7:10 AM

clang/test/CodeGen/X86/avx-builtins.c
146–147	Would it be useful to add a check for a "freeze <2 x double> poison" (or similar) to each cast test?

ManuelJBrito mentioned this in D143593: [InstCombine] Don't fold freeze poison when it's used in shufflevector. Feb 8 2023, 10:18 AM

Match freeze(poison) operand in tests

Harbormaster completed remote builds in B212814: Diff 496129. Feb 9 2023, 8:36 AM

Update tests after D143593.

There are some performance regressions with casts from 128 to 512. The backend inserts vinsertf instructions. So that has to be fixed.
In D130339 @RKSimon mentioned something about custom vector widening patterns that need to be adjusted to handle freeze(undef). Any pointers on what a patch for that would look like?

Harbormaster completed remote builds in B215722: Diff 500149. Feb 24 2023, 7:45 AM

In D143287#4150186, @ManuelJBrito wrote:

There are some performance regressions with casts from 128 to 512. The backend inserts vinsertf instructions. So that has to be fixed.
In D130339 @RKSimon mentioned something about custom vector widening patterns that need to be adjusted to handle freeze(undef). Any pointers on what a patch for that would look like?

IIRC there are various places in X86ISelLowering where we try to match vector widening/concat patterns - such as collectConcatOps/getTargetShuffleAndZeroables/resolveTargetShuffleInputsAndMask (and many more) - it should be fine to address these on a case by case basis though, as many should just disappear via SimplifyDemandedVectorElts etc.

Hi, is D104790 superseded by this patch? I wonder what is the status of this patch as well.

In D143287#4179020, @aqjune wrote:

Hi, is D104790 superseded by this patch?

I don't think so; we still need to fix the undefined intrinsics, right? Maybe I'm not understanding the question.

I wonder what is the status of this patch as well.

We need to land D144903 first to have the correct assembly, but it's currently held up by a crash in one of the tests due to an unrelated issue.

In D143287#4179344, @ManuelJBrito wrote:

In D143287#4179020, @aqjune wrote:

Hi, is D104790 superseded by this patch?

I don't think so; we still need to fix the undefined intrinsics, right? Maybe I'm not understanding the question.

Oh right, D104790 and this deal with different intrinsics, thanks.
I was wondering whether I could clean up some of my open patches.

I wonder what is the status of this patch as well.

We need to land D144903 first to have the correct assembly, but it's currently held up by a crash in one of the tests due to an unrelated issue.

I see, hope that the crash is fixed soon..!

Rebase

avx-cast-builtins.c was moved to D144903

Harbormaster completed remote builds in B218429: Diff 503819. Mar 9 2023, 11:11 AM

Implementing the 128 to 512 casts by filling the rest of the vector with the same definition of a nondeterministic_value is not correct because:

a = freeze poison
v = <a, a>

is not the same as

v = freeze poison
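
A small enumeration makes the difference between the two forms visible. This is a sketch under a simplifying assumption: each lane is pretended to hold only 2 bits (`LANE_VALUES` is a made-up constant), and the function names are hypothetical. It counts how many distinct two-lane results each form can produce.

```c
#include <stddef.h>
#include <string.h>

enum { LANE_VALUES = 4 };   /* assumption: a lane holds only 2 bits */

static size_t count_seen(unsigned char seen[LANE_VALUES][LANE_VALUES])
{
    size_t n = 0;
    for (unsigned i = 0; i < LANE_VALUES; ++i)
        for (unsigned j = 0; j < LANE_VALUES; ++j)
            n += seen[i][j];
    return n;
}

/* a = freeze poison; v = <a, a>: one arbitrary choice, used twice,
 * so both lanes are always equal. */
static size_t outcomes_shared_freeze(void)
{
    unsigned char seen[LANE_VALUES][LANE_VALUES];
    memset(seen, 0, sizeof seen);
    for (unsigned a = 0; a < LANE_VALUES; ++a)
        seen[a][a] = 1;
    return count_seen(seen);
}

/* v = freeze poison on the whole vector: each lane is chosen
 * independently of the other. */
static size_t outcomes_vector_freeze(void)
{
    unsigned char seen[LANE_VALUES][LANE_VALUES];
    memset(seen, 0, sizeof seen);
    for (unsigned lane0 = 0; lane0 < LANE_VALUES; ++lane0)
        for (unsigned lane1 = 0; lane1 < LANE_VALUES; ++lane1)
            seen[lane0][lane1] = 1;
    return count_seen(seen);
}
```

With these assumptions the shared form can only reach results whose lanes are equal, so it covers a strict subset of what freezing the whole vector allows. That correlation between lanes is why the two forms are not interchangeable.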

The only solution I see using shufflevector is to do the conversion in two steps:

build a 256-bit vector whose upper half is undefined (freeze poison)
build a 512-bit vector whose lower half is the previous 256-bit vector and whose upper half is undefined

I think this would require two shuffles, which is unfortunate.

This would ensure no miscompilations due to multiple uses of the same freeze undef/poison, but would probably require some backend work to ensure the pattern is recognized and efficient assembly is emitted.

Would this work? @RKSimon @craig.topper
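
One way to check the two-step proposal against the single-shuffle masks is to count how often any lane of the filler operand is selected. The sketch below is plain C with hypothetical helper names. The single-step mask is the `<2 x double>` 128 to 512 mask from the diff, and the two-step masks are an assumed rendering of the proposal above, not the landed code.

```c
#include <stddef.h>

/* For a two-operand shuffle with n-lane inputs, return the largest number
 * of times any single lane of the second operand (indices n..2n-1) is
 * selected. A result above 1 means one frozen lane feeds several output
 * lanes. Assumes n <= 32, which covers every mask in this review. */
static unsigned max_filler_reuse(const int *mask, size_t m, size_t n)
{
    unsigned uses[32] = {0};
    unsigned worst = 0;
    for (size_t i = 0; i < m; ++i) {
        if (mask[i] >= 0 && (size_t)mask[i] >= n && (size_t)mask[i] < 2 * n) {
            unsigned u = ++uses[(size_t)mask[i] - n];
            if (u > worst)
                worst = u;
        }
    }
    return worst;
}

/* Single-shuffle 128 -> 512 cast of <2 x double>, mask taken from the diff. */
static unsigned reuse_single_step(void)
{
    static const int mask[8] = {0, 1, 2, 3, 2, 3, 2, 3};
    return max_filler_reuse(mask, 8, 2);
}

/* Two-step proposal: 128 -> 256 with one filler, then 256 -> 512 with a
 * second, separately frozen filler. */
static unsigned reuse_two_step(void)
{
    static const int step1[4] = {0, 1, 2, 3};
    static const int step2[8] = {0, 1, 2, 3, 4, 5, 6, 7};
    unsigned r1 = max_filler_reuse(step1, 4, 2);
    unsigned r2 = max_filler_reuse(step2, 8, 4);
    return r1 > r2 ? r1 : r2;
}
```

Under this model the single-shuffle mask reads each filler lane several times, while the two-step form reads every filler lane exactly once, at the cost of a second shuffle.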

In D143287#4202174, @ManuelJBrito wrote:

Implementing the 128 to 512 casts by filling the rest of the vector with the same definition of a nondeterministic_value is not correct because:

a = freeze poison
v = <a, a>

is not the same as

v = freeze poison

The only solution I see using shufflevector is to do the conversion in two steps:

build a 256-bit vector whose upper half is undefined (freeze poison)

build a 512-bit vector whose lower half is the previous 256-bit vector and whose upper half is undefined

I think this would require two shuffles, which is unfortunate.

This would ensure no miscompilations due to multiple uses of the same freeze undef/poison, but would probably require some backend work to ensure the pattern is recognized and efficient assembly is emitted.

Semantically that is correct.
But the backend may require some tweaks to recognize this pattern.

Update to remove multiple uses of freeze poison.
I am unsure about the code style used in the 128 to 512 casts. Any comments are appreciated.

LGTM, but please wait for another reviewer.

This revision is now accepted and ready to land. Apr 7 2023, 4:53 AM

Harbormaster completed remote builds in B224201: Diff 511663. Apr 7 2023, 5:29 AM

Any further comments @RKSimon ?

LGTM - just simplify the shuffle masks (even if it breaks 80-col).

Please keep an eye out for any regressions, I'm not certain we've shaken out every possible issue.

clang/lib/Headers/avx512fintrin.h
401	We don't always keep to clang formatting in the headers if it confuses things. Better to keep the entire shuffle mask on a single line if possible (same for the others): return __builtin_shufflevector(__a, __builtin_nondeterministic_value(__a), 0, 1, 2, 3, 4, 5, 6, 7);
408	Maybe: return __builtin_shufflevector(__a, __builtin_nondeterministic_value(__a), 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15);
clang/test/CodeGen/X86/avx-cast-builtins.c
101	fixme

Thanks for the review! I'll simplify the masks.

This revision was landed with ongoing or failed builds. Apr 17 2023, 4:59 AM

Closed by commit rG5184dc2d7cce: [Clang][X86] Change X86 cast intrinsics to use __builtin_nondeterministic_value (authored by ManuelJBrito).

This revision was automatically updated to reflect the committed changes.

ManuelJBrito added a commit: rG5184dc2d7cce: [Clang][X86] Change X86 cast intrinsics to use __builtin_nondeterministic_value.

Revision Contents

Path	Size
clang/lib/Headers/avx512fintrin.h	15 lines
clang/lib/Headers/avx512fp16intrin.h	19 lines
clang/lib/Headers/avxintrin.h	6 lines
clang/test/CodeGen/X86/avx-builtins.c	9 lines
clang/test/CodeGen/X86/avx-cast-builtins.c	100 lines
clang/test/CodeGen/X86/avx512f-builtins.c	15 lines
clang/test/CodeGen/X86/avx512fp16-builtins.c	9 lines

Diff 496129

clang/lib/Headers/avx512fintrin.h


return (__m512d)__builtin_shufflevector((__v2df) __A, (__v2df) __A,
0, 0, 0, 0, 0, 0, 0, 0);		0, 0, 0, 0, 0, 0, 0, 0);
}		}

/* Cast between vector types */		/* Cast between vector types */

static __inline __m512d __DEFAULT_FN_ATTRS512		static __inline __m512d __DEFAULT_FN_ATTRS512
_mm512_castpd256_pd512(__m256d __a)		_mm512_castpd256_pd512(__m256d __a)
{		{
return __builtin_shufflevector(__a, __a, 0, 1, 2, 3, -1, -1, -1, -1);		return __builtin_shufflevector(__a, __builtin_nondeterministic_value(__a), 0, 1, 2, 3, 4, 5, 6, 7);
}		}

static __inline __m512 __DEFAULT_FN_ATTRS512		static __inline __m512 __DEFAULT_FN_ATTRS512
_mm512_castps256_ps512(__m256 __a)		_mm512_castps256_ps512(__m256 __a)
{		{
return __builtin_shufflevector(__a, __a, 0, 1, 2, 3, 4, 5, 6, 7,		return __builtin_shufflevector(__a, __builtin_nondeterministic_value(__a),
-1, -1, -1, -1, -1, -1, -1, -1);		0, 1, 2, 3, 4, 5, 6, 7,
		8, 9, 10, 11, 12, 13, 14, 15);
}		}

static __inline __m128d __DEFAULT_FN_ATTRS512		static __inline __m128d __DEFAULT_FN_ATTRS512
_mm512_castpd512_pd128(__m512d __a)		_mm512_castpd512_pd128(__m512d __a)
{		{
return __builtin_shufflevector(__a, __a, 0, 1);		return __builtin_shufflevector(__a, __a, 0, 1);
}		}

Show All 25 Lines
_mm512_castpd_si512 (__m512d __A)		_mm512_castpd_si512 (__m512d __A)
{		{
return (__m512i) (__A);		return (__m512i) (__A);
}		}

static __inline__ __m512d __DEFAULT_FN_ATTRS512		static __inline__ __m512d __DEFAULT_FN_ATTRS512
_mm512_castpd128_pd512 (__m128d __A)		_mm512_castpd128_pd512 (__m128d __A)
{		{
return __builtin_shufflevector( __A, __A, 0, 1, -1, -1, -1, -1, -1, -1);		return __builtin_shufflevector( __A, __builtin_nondeterministic_value(__A), 0, 1, 2, 3, 2, 3, 2, 3);
}		}

static __inline __m512d __DEFAULT_FN_ATTRS512		static __inline __m512d __DEFAULT_FN_ATTRS512
_mm512_castps_pd (__m512 __A)		_mm512_castps_pd (__m512 __A)
{		{
return (__m512d) (__A);		return (__m512d) (__A);
}		}

static __inline __m512i __DEFAULT_FN_ATTRS512		static __inline __m512i __DEFAULT_FN_ATTRS512
_mm512_castps_si512 (__m512 __A)		_mm512_castps_si512 (__m512 __A)
{		{
return (__m512i) (__A);		return (__m512i) (__A);
}		}

static __inline__ __m512 __DEFAULT_FN_ATTRS512		static __inline__ __m512 __DEFAULT_FN_ATTRS512
_mm512_castps128_ps512 (__m128 __A)		_mm512_castps128_ps512 (__m128 __A)
{		{
return __builtin_shufflevector( __A, __A, 0, 1, 2, 3, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1);		return __builtin_shufflevector( __A, __builtin_nondeterministic_value(__A), 0, 1, 2, 3, 4, 5, 6, 7, 4, 5, 6, 7, 4, 5, 6, 7);
}		}

static __inline__ __m512i __DEFAULT_FN_ATTRS512		static __inline__ __m512i __DEFAULT_FN_ATTRS512
_mm512_castsi128_si512 (__m128i __A)		_mm512_castsi128_si512 (__m128i __A)
{		{
return __builtin_shufflevector( __A, __A, 0, 1, -1, -1, -1, -1, -1, -1);		return __builtin_shufflevector( __A, __builtin_nondeterministic_value(__A), 0, 1, 2, 3, 2, 3, 2, 3);
}		}

static __inline__ __m512i __DEFAULT_FN_ATTRS512		static __inline__ __m512i __DEFAULT_FN_ATTRS512
_mm512_castsi256_si512 (__m256i __A)		_mm512_castsi256_si512 (__m256i __A)
{		{
return __builtin_shufflevector( __A, __A, 0, 1, 2, 3, -1, -1, -1, -1);		return __builtin_shufflevector( __A, __builtin_nondeterministic_value(__A), 0, 1, 2, 3, 4, 5, 6, 7);
}		}

static __inline __m512 __DEFAULT_FN_ATTRS512		static __inline __m512 __DEFAULT_FN_ATTRS512
_mm512_castsi512_ps (__m512i __A)		_mm512_castsi512_ps (__m512i __A)
{		{
return (__m512) (__A);		return (__m512) (__A);
}		}


clang/lib/Headers/avx512fp16intrin.h

	static __inline__ __m256h __DEFAULT_FN_ATTRS512			static __inline__ __m256h __DEFAULT_FN_ATTRS512
	_mm512_castph512_ph256(__m512h __a) {			_mm512_castph512_ph256(__m512h __a) {
	return __builtin_shufflevector(__a, __a, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,			return __builtin_shufflevector(__a, __a, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,
	12, 13, 14, 15);			12, 13, 14, 15);
	}			}

	static __inline__ __m256h __DEFAULT_FN_ATTRS256			static __inline__ __m256h __DEFAULT_FN_ATTRS256
	_mm256_castph128_ph256(__m128h __a) {			_mm256_castph128_ph256(__m128h __a) {
	return __builtin_shufflevector(__a, __a, 0, 1, 2, 3, 4, 5, 6, 7, -1, -1, -1,			return __builtin_shufflevector(__a, __builtin_nondeterministic_value(__a),
	-1, -1, -1, -1, -1);			0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,
				14, 15);
	}			}

	static __inline__ __m512h __DEFAULT_FN_ATTRS512			static __inline__ __m512h __DEFAULT_FN_ATTRS512
	_mm512_castph128_ph512(__m128h __a) {			_mm512_castph128_ph512(__m128h __a) {
	return __builtin_shufflevector(__a, __a, 0, 1, 2, 3, 4, 5, 6, 7, -1, -1, -1,			return __builtin_shufflevector(__a, __builtin_nondeterministic_value(__a),
	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,			0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,
	-1, -1, -1, -1, -1, -1, -1, -1, -1);			14, 15, 8, 9, 10, 11, 12, 13, 14, 15, 8, 9,
				10, 11, 12, 13, 14, 15);
	}			}

	static __inline__ __m512h __DEFAULT_FN_ATTRS512			static __inline__ __m512h __DEFAULT_FN_ATTRS512
	_mm512_castph256_ph512(__m256h __a) {			_mm512_castph256_ph512(__m256h __a) {
	return __builtin_shufflevector(__a, __a, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,			return __builtin_shufflevector(__a, __builtin_nondeterministic_value(__a),
	12, 13, 14, 15, -1, -1, -1, -1, -1, -1, -1, -1,			0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,
	-1, -1, -1, -1, -1, -1, -1, -1);			12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22,
				23, 24, 25, 26, 27, 28, 29, 30, 31);
	}			}

	/// Constructs a 256-bit floating-point vector of [16 x half] from a			/// Constructs a 256-bit floating-point vector of [16 x half] from a
	/// 128-bit floating-point vector of [8 x half]. The lower 128 bits			/// 128-bit floating-point vector of [8 x half]. The lower 128 bits
	/// contain the value of the source vector. The upper 384 bits are set			/// contain the value of the source vector. The upper 384 bits are set
	/// to zero.			/// to zero.
	///			///
	/// \headerfile <x86intrin.h>			/// \headerfile <x86intrin.h>

clang/lib/Headers/avxintrin.h

	/// \param __a			/// \param __a
	/// A 128-bit vector of [2 x double].			/// A 128-bit vector of [2 x double].
	/// \returns A 256-bit floating-point vector of [4 x double]. The lower 128 bits			/// \returns A 256-bit floating-point vector of [4 x double]. The lower 128 bits
	/// contain the value of the parameter. The contents of the upper 128 bits			/// contain the value of the parameter. The contents of the upper 128 bits
	/// are undefined.			/// are undefined.
	static __inline __m256d __DEFAULT_FN_ATTRS			static __inline __m256d __DEFAULT_FN_ATTRS
	_mm256_castpd128_pd256(__m128d __a)			_mm256_castpd128_pd256(__m128d __a)
	{			{
	return __builtin_shufflevector((__v2df)__a, (__v2df)__a, 0, 1, -1, -1);			return __builtin_shufflevector((__v2df)__a, (__v2df)__builtin_nondeterministic_value(__a), 0, 1, 2, 3);
	}			}

	/// Constructs a 256-bit floating-point vector of [8 x float] from a			/// Constructs a 256-bit floating-point vector of [8 x float] from a
	/// 128-bit floating-point vector of [4 x float].			/// 128-bit floating-point vector of [4 x float].
	///			///
	/// The lower 128 bits contain the value of the source vector. The contents			/// The lower 128 bits contain the value of the source vector. The contents
	/// of the upper 128 bits are undefined.			/// of the upper 128 bits are undefined.
	///			///
	/// \headerfile <x86intrin.h>			/// \headerfile <x86intrin.h>
	///			///
	/// This intrinsic has no corresponding instruction.			/// This intrinsic has no corresponding instruction.
	///			///
	/// \param __a			/// \param __a
	/// A 128-bit vector of [4 x float].			/// A 128-bit vector of [4 x float].
	/// \returns A 256-bit floating-point vector of [8 x float]. The lower 128 bits			/// \returns A 256-bit floating-point vector of [8 x float]. The lower 128 bits
	/// contain the value of the parameter. The contents of the upper 128 bits			/// contain the value of the parameter. The contents of the upper 128 bits
	/// are undefined.			/// are undefined.
	static __inline __m256 __DEFAULT_FN_ATTRS			static __inline __m256 __DEFAULT_FN_ATTRS
	_mm256_castps128_ps256(__m128 __a)			_mm256_castps128_ps256(__m128 __a)
	{			{
	return __builtin_shufflevector((__v4sf)__a, (__v4sf)__a, 0, 1, 2, 3, -1, -1, -1, -1);			return __builtin_shufflevector((__v4sf)__a, (__v4sf)__builtin_nondeterministic_value(__a), 0, 1, 2, 3, 4, 5, 6, 7);
	}			}

	/// Constructs a 256-bit integer vector from a 128-bit integer vector.			/// Constructs a 256-bit integer vector from a 128-bit integer vector.
	///			///
	/// The lower 128 bits contain the value of the source vector. The contents			/// The lower 128 bits contain the value of the source vector. The contents
	/// of the upper 128 bits are undefined.			/// of the upper 128 bits are undefined.
	///			///
	/// \headerfile <x86intrin.h>			/// \headerfile <x86intrin.h>
	///			///
	/// This intrinsic has no corresponding instruction.			/// This intrinsic has no corresponding instruction.
	///			///
	/// \param __a			/// \param __a
	/// A 128-bit integer vector.			/// A 128-bit integer vector.
	/// \returns A 256-bit integer vector. The lower 128 bits contain the value of			/// \returns A 256-bit integer vector. The lower 128 bits contain the value of
	/// the parameter. The contents of the upper 128 bits are undefined.			/// the parameter. The contents of the upper 128 bits are undefined.
	static __inline __m256i __DEFAULT_FN_ATTRS			static __inline __m256i __DEFAULT_FN_ATTRS
	_mm256_castsi128_si256(__m128i __a)			_mm256_castsi128_si256(__m128i __a)
	{			{
	return __builtin_shufflevector((__v2di)__a, (__v2di)__a, 0, 1, -1, -1);			return __builtin_shufflevector((__v2di)__a, (__v2di)__builtin_nondeterministic_value(__a), 0, 1, 2, 3);
	}			}

	/// Constructs a 256-bit floating-point vector of [4 x double] from a			/// Constructs a 256-bit floating-point vector of [4 x double] from a
	/// 128-bit floating-point vector of [2 x double]. The lower 128 bits			/// 128-bit floating-point vector of [2 x double]. The lower 128 bits
	/// contain the value of the source vector. The upper 128 bits are set			/// contain the value of the source vector. The upper 128 bits are set
	/// to zero.			/// to zero.
	///			///
	/// \headerfile <x86intrin.h>			/// \headerfile <x86intrin.h>

clang/test/CodeGen/X86/avx-builtins.c


	__m256i test_mm256_castpd_si256(__m256d A) {			__m256i test_mm256_castpd_si256(__m256d A) {
	// CHECK-LABEL: test_mm256_castpd_si256			// CHECK-LABEL: test_mm256_castpd_si256
	return _mm256_castpd_si256(A);			return _mm256_castpd_si256(A);
	}			}

	__m256d test_mm256_castpd128_pd256(__m128d A) {			__m256d test_mm256_castpd128_pd256(__m128d A) {
	// CHECK-LABEL: test_mm256_castpd128_pd256			// CHECK-LABEL: test_mm256_castpd128_pd256
	// CHECK: shufflevector <2 x double> %{{.*}}, <2 x double> %{{.*}}, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>			// CHECK: [[A:%.*]] = freeze <2 x double> poison
				// CHECK: shufflevector <2 x double> %{{.*}}, <2 x double> [[A]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
	return _mm256_castpd128_pd256(A);			return _mm256_castpd128_pd256(A);
	}			}

	__m128d test_mm256_castpd256_pd128(__m256d A) {			__m128d test_mm256_castpd256_pd128(__m256d A) {
	// CHECK-LABEL: test_mm256_castpd256_pd128			// CHECK-LABEL: test_mm256_castpd256_pd128
	// CHECK: shufflevector <4 x double> %{{.*}}, <4 x double> %{{.*}}, <2 x i32> <i32 0, i32 1>			// CHECK: shufflevector <4 x double> %{{.*}}, <4 x double> %{{.*}}, <2 x i32> <i32 0, i32 1>
	return _mm256_castpd256_pd128(A);			return _mm256_castpd256_pd128(A);
	}			}

	__m256d test_mm256_castps_pd(__m256 A) {			__m256d test_mm256_castps_pd(__m256 A) {
	// CHECK-LABEL: test_mm256_castps_pd			// CHECK-LABEL: test_mm256_castps_pd
	return _mm256_castps_pd(A);			return _mm256_castps_pd(A);
	}			}

	__m256i test_mm256_castps_si256(__m256 A) {			__m256i test_mm256_castps_si256(__m256 A) {
	// CHECK-LABEL: test_mm256_castps_si256			// CHECK-LABEL: test_mm256_castps_si256
	return _mm256_castps_si256(A);			return _mm256_castps_si256(A);
	}			}

	__m256 test_mm256_castps128_ps256(__m128 A) {			__m256 test_mm256_castps128_ps256(__m128 A) {
	// CHECK-LABEL: test_mm256_castps128_ps256			// CHECK-LABEL: test_mm256_castps128_ps256
	// CHECK: shufflevector <4 x float> %{{.*}}, <4 x float> %{{.*}}, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>			// CHECK: [[A:%.*]] = freeze <4 x float> poison
				// CHECK: shufflevector <4 x float> %{{.*}}, <4 x float> [[A]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
	return _mm256_castps128_ps256(A);			return _mm256_castps128_ps256(A);
	}			}

	__m128 test_mm256_castps256_ps128(__m256 A) {			__m128 test_mm256_castps256_ps128(__m256 A) {
	// CHECK-LABEL: test_mm256_castps256_ps128			// CHECK-LABEL: test_mm256_castps256_ps128
	// CHECK: shufflevector <8 x float> %{{.*}}, <8 x float> %{{.*}}, <4 x i32> <i32 0, i32 1, i32 2, i32 3>			// CHECK: shufflevector <8 x float> %{{.*}}, <8 x float> %{{.*}}, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
	return _mm256_castps256_ps128(A);			return _mm256_castps256_ps128(A);
	}			}

	__m256i test_mm256_castsi128_si256(__m128i A) {			__m256i test_mm256_castsi128_si256(__m128i A) {
	// CHECK-LABEL: test_mm256_castsi128_si256			// CHECK-LABEL: test_mm256_castsi128_si256
	// CHECK: shufflevector <2 x i64> %{{.*}}, <2 x i64> %{{.*}}, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>			// CHECK: [[A:%.*]] = freeze <2 x i64> poison
				// CHECK: shufflevector <2 x i64> %{{.*}}, <2 x i64> [[A]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
	return _mm256_castsi128_si256(A);			return _mm256_castsi128_si256(A);
	}			}

	__m256d test_mm256_castsi256_pd(__m256i A) {			__m256d test_mm256_castsi256_pd(__m256i A) {
	// CHECK-LABEL: test_mm256_castsi256_pd			// CHECK-LABEL: test_mm256_castsi256_pd
	return _mm256_castsi256_pd(A);			return _mm256_castsi256_pd(A);
	}			}


clang/test/CodeGen/X86/avx-cast-builtins.c

This file was added.

				// RUN: %clang_cc1 %s -O3 -flax-vector-conversions=none -ffreestanding %s -triple=x86_64-unknown-unknown -target-feature +avx -target-feature +avx512f -target-feature +avx512fp16 -S -o - | FileCheck %s


				#include <immintrin.h>

				__m256d test_mm256_castpd128_pd256(__m128d A) {
				// CHECK-LABEL: test_mm256_castpd128_pd256
				// CHECK: # %bb.0:
				// CHECK-NEXT: vmovaps %xmm0, %xmm0
				// CHECK-NEXT: ret{{[l|q]}}
				return _mm256_castpd128_pd256(A);
				}

				__m256 test_mm256_castps128_ps256(__m128 A) {
				// CHECK-LABEL: test_mm256_castps128_ps256
				// CHECK: # %bb.0:
				// CHECK-NEXT: vmovaps %xmm0, %xmm0
				// CHECK-NEXT: ret{{[l|q]}}
				return _mm256_castps128_ps256(A);
				}

				__m256i test_mm256_castsi128_si256(__m128i A) {
				// CHECK-LABEL: test_mm256_castsi128_si256
				// CHECK: # %bb.0:
				// CHECK-NEXT: vmovaps %xmm0, %xmm0
				// CHECK-NEXT: ret{{[l|q]}}
				return _mm256_castsi128_si256(A);
				}

				__m256h test_mm256_castph128_ph256(__m128h A) {
				// CHECK-LABEL: test_mm256_castph128_ph256
				// CHECK: # %bb.0:
				// CHECK-NEXT: vmovaps %xmm0, %xmm0
				// CHECK-NEXT: ret{{[l|q]}}
				return _mm256_castph128_ph256(A);
				}

				__m512h test_mm512_castph128_ph512(__m128h A) {
				// CHECK-LABEL: test_mm512_castph128_ph512
				// CHECK: # %bb.0:
				// CHECK-NEXT: vmovaps %xmm0, %xmm0
				// CHECK-NEXT: ret{{[l|q]}}
				return _mm512_castph128_ph512(A);
				}

				__m512h test_mm512_castph256_ph512(__m256h A) {
				// CHECK-LABEL: test_mm512_castph256_ph512
				// CHECK: # %bb.0:
				// CHECK-NEXT: vmovaps %ymm0, %ymm0
				// CHECK-NEXT: ret{{[l|q]}}
				return _mm512_castph256_ph512(A);
				}

				__m512d test_mm512_castpd256_pd512(__m256d A){
				// CHECK-LABEL: test_mm512_castpd256_pd512
				// CHECK: # %bb.0:
				// CHECK-NEXT: vmovaps %ymm0, %ymm0
				// CHECK-NEXT: ret{{[l|q]}}
				return _mm512_castpd256_pd512(A);
				}

				__m512 test_mm512_castps256_ps512(__m256 A){
				// CHECK-LABEL: test_mm512_castps256_ps512
				// CHECK: # %bb.0:
				// CHECK-NEXT: vmovaps %ymm0, %ymm0
				// CHECK-NEXT: ret{{[l|q]}}
				return _mm512_castps256_ps512(A);
				}

				__m512d test_mm512_castpd128_pd512(__m128d A){
				// CHECK-LABEL: test_mm512_castpd128_pd512
				// CHECK: # %bb.0:
				// CHECK-NEXT: vmovaps %xmm0, %xmm0
				// CHECK-NEXT: ret{{[l|q]}}
				return _mm512_castpd128_pd512(A);
				}

				__m512 test_mm512_castps128_ps512(__m128 A){
				// CHECK-LABEL: test_mm512_castps128_ps512
				// CHECK: # %bb.0:
				// CHECK-NEXT: vmovaps %xmm0, %xmm0
				// CHECK-NEXT: ret{{[l|q]}}
				return _mm512_castps128_ps512(A);
				}

				__m512i test_mm512_castsi128_si512(__m128i A){
				// CHECK-LABEL: test_mm512_castsi128_si512
				// CHECK: # %bb.0:
				// CHECK-NEXT: vmovaps %xmm0, %xmm0
				// CHECK-NEXT: ret{{[l|q]}}
				return _mm512_castsi128_si512(A);
				}

				__m512i test_mm512_castsi256_si512(__m256i A){
				// CHECK-LABEL: test_mm512_castsi256_si512
				// CHECK: # %bb.0:
				// CHECK-NEXT: vmovaps %ymm0, %ymm0
				// CHECK-NEXT: ret{{[l|q]}}
				return _mm512_castsi256_si512(A);
				}

clang/test/CodeGen/X86/avx512f-builtins.c

{		{
// CHECK-LABEL: @test_mm512_castpd_si512		// CHECK-LABEL: @test_mm512_castpd_si512
// CHECK: bitcast <8 x double> %{{.*}} to <8 x i64>		// CHECK: bitcast <8 x double> %{{.*}} to <8 x i64>
return _mm512_castpd_si512 (__A);		return _mm512_castpd_si512 (__A);
}		}

__m512 test_mm512_castps128_ps512(__m128 __A) {		__m512 test_mm512_castps128_ps512(__m128 __A) {
// CHECK-LABEL: @test_mm512_castps128_ps512		// CHECK-LABEL: @test_mm512_castps128_ps512
// CHECK: shufflevector <4 x float> %{{.*}}, <4 x float> %{{.*}}, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		// CHECK: [[A:%.*]] = freeze <4 x float> poison
		// CHECK: shufflevector <4 x float> %{{.*}}, <4 x float> [[A]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 4, i32 5, i32 6, i32 7, i32 4, i32 5, i32 6, i32 7>
return _mm512_castps128_ps512(__A);		return _mm512_castps128_ps512(__A);
}		}

__m512d test_mm512_castpd128_pd512(__m128d __A) {		__m512d test_mm512_castpd128_pd512(__m128d __A) {
// CHECK-LABEL: @test_mm512_castpd128_pd512		// CHECK-LABEL: @test_mm512_castpd128_pd512
// CHECK: shufflevector <2 x double> %{{.*}}, <2 x double> %{{.*}}, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		// CHECK: [[A:%.*]] = freeze <2 x double> poison
		// CHECK: shufflevector <2 x double> %{{.*}}, <2 x double> [[A]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 2, i32 3, i32 2, i32 3>
return _mm512_castpd128_pd512(__A);		return _mm512_castpd128_pd512(__A);
}		}

__m512i test_mm512_set1_epi8(char d)		__m512i test_mm512_set1_epi8(char d)
{		{
// CHECK-LABEL: @test_mm512_set1_epi8		// CHECK-LABEL: @test_mm512_set1_epi8
// CHECK: insertelement <64 x i8> {{.*}}, i32 0		// CHECK: insertelement <64 x i8> {{.*}}, i32 0
// CHECK: insertelement <64 x i8> {{.*}}, i32 1		// CHECK: insertelement <64 x i8> {{.*}}, i32 1
__m512 test_mm512_setr4_ps(float e0, float e1, float e2, float e3)
// CHECK-LABEL: @test_mm512_setr4_ps		// CHECK-LABEL: @test_mm512_setr4_ps
// CHECK: insertelement <16 x float> {{.*}}, i32 15		// CHECK: insertelement <16 x float> {{.*}}, i32 15
return _mm512_setr4_ps(e0,e1,e2,e3);		return _mm512_setr4_ps(e0,e1,e2,e3);
}		}

__m512d test_mm512_castpd256_pd512(__m256d a)		__m512d test_mm512_castpd256_pd512(__m256d a)
{		{
// CHECK-LABEL: @test_mm512_castpd256_pd512		// CHECK-LABEL: @test_mm512_castpd256_pd512
// CHECK: shufflevector <4 x double> {{.*}} <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>		// CHECK: [[A:%.*]] = freeze <4 x double> poison
		// CHECK: shufflevector <4 x double> {{.*}}, <4 x double> [[A]], <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
return _mm512_castpd256_pd512(a);		return _mm512_castpd256_pd512(a);
}		}

__m256d test_mm512_castpd512_pd256 (__m512d __A)		__m256d test_mm512_castpd512_pd256 (__m512d __A)
{		{
// CHECK-LABEL: @test_mm512_castpd512_pd256		// CHECK-LABEL: @test_mm512_castpd512_pd256
// CHECK: shufflevector <8 x double> %{{.*}}, <8 x double> %{{.*}}, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		// CHECK: shufflevector <8 x double> %{{.*}}, <8 x double> %{{.*}}, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
return _mm512_castpd512_pd256 (__A);		return _mm512_castpd512_pd256 (__A);
__m512i test_mm512_castps_si512 (__m512 __A)		__m512i test_mm512_castps_si512 (__m512 __A)
{		{
// CHECK-LABEL: @test_mm512_castps_si512		// CHECK-LABEL: @test_mm512_castps_si512
// CHECK: bitcast <16 x float> %{{.*}} to <8 x i64>		// CHECK: bitcast <16 x float> %{{.*}} to <8 x i64>
return _mm512_castps_si512 (__A);		return _mm512_castps_si512 (__A);
}		}
__m512i test_mm512_castsi128_si512(__m128i __A) {		__m512i test_mm512_castsi128_si512(__m128i __A) {
// CHECK-LABEL: @test_mm512_castsi128_si512		// CHECK-LABEL: @test_mm512_castsi128_si512
// CHECK: shufflevector <2 x i64> %{{.*}}, <2 x i64> %{{.*}}, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		// CHECK: [[A:%.*]] = freeze <2 x i64> poison
		// CHECK: shufflevector <2 x i64> %{{.*}}, <2 x i64> [[A]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 2, i32 3, i32 2, i32 3>
return _mm512_castsi128_si512(__A);		return _mm512_castsi128_si512(__A);
}		}

__m512i test_mm512_castsi256_si512(__m256i __A) {		__m512i test_mm512_castsi256_si512(__m256i __A) {
// CHECK-LABEL: @test_mm512_castsi256_si512		// CHECK-LABEL: @test_mm512_castsi256_si512
// CHECK: shufflevector <4 x i64> %{{.*}}, <4 x i64> %{{.*}}, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>		// CHECK: [[A:%.*]] = freeze <4 x i64> poison
		// CHECK: shufflevector <4 x i64> %{{.*}}, <4 x i64> [[A]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
return _mm512_castsi256_si512(__A);		return _mm512_castsi256_si512(__A);
}		}

__m512 test_mm512_castsi512_ps (__m512i __A)		__m512 test_mm512_castsi512_ps (__m512i __A)
{		{
// CHECK-LABEL: @test_mm512_castsi512_ps		// CHECK-LABEL: @test_mm512_castsi512_ps
// CHECK: bitcast <8 x i64> %{{.*}} to <16 x float>		// CHECK: bitcast <8 x i64> %{{.*}} to <16 x float>
return _mm512_castsi512_ps (__A);		return _mm512_castsi512_ps (__A);

clang/test/CodeGen/X86/avx512fp16-builtins.c

	__m256h test_mm512_castph512_ph256(__m512h __a) {			__m256h test_mm512_castph512_ph256(__m512h __a) {
	// CHECK-LABEL: test_mm512_castph512_ph256			// CHECK-LABEL: test_mm512_castph512_ph256
	// CHECK: shufflevector <32 x half> %{{.*}}, <32 x half> %{{.*}}, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>			// CHECK: shufflevector <32 x half> %{{.*}}, <32 x half> %{{.*}}, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
	return _mm512_castph512_ph256(__a);			return _mm512_castph512_ph256(__a);
	}			}

	__m256h test_mm256_castph128_ph256(__m128h __a) {			__m256h test_mm256_castph128_ph256(__m128h __a) {
	// CHECK-LABEL: test_mm256_castph128_ph256			// CHECK-LABEL: test_mm256_castph128_ph256
	// CHECK: shufflevector <8 x half> %{{.*}}, <8 x half> %{{.*}}, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			// CHECK: [[A:%.*]] = freeze <8 x half> poison
				// CHECK: shufflevector <8 x half> %{{.*}}, <8 x half> [[A]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
	return _mm256_castph128_ph256(__a);			return _mm256_castph128_ph256(__a);
	}			}

	__m512h test_mm512_castph128_ph512(__m128h __a) {			__m512h test_mm512_castph128_ph512(__m128h __a) {
	// CHECK-LABEL: test_mm512_castph128_ph512			// CHECK-LABEL: test_mm512_castph128_ph512
	// CHECK: shufflevector <8 x half> %{{.}}, <8 x half> %{{.}}, <32 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			// CHECK: [[A:%.*]] = freeze <8 x half> poison
				// CHECK: shufflevector <8 x half> %{{.*}}, <8 x half> [[A]], <32 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
	return _mm512_castph128_ph512(__a);			return _mm512_castph128_ph512(__a);
	}			}

	__m512h test_mm512_castph256_ph512(__m256h __a) {			__m512h test_mm512_castph256_ph512(__m256h __a) {
	// CHECK-LABEL: test_mm512_castph256_ph512			// CHECK-LABEL: test_mm512_castph256_ph512
	// CHECK: shufflevector <16 x half> %{{.}}, <16 x half> %{{.}}, <32 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			// CHECK: [[A:%.*]] = freeze <16 x half> poison
				// CHECK: shufflevector <16 x half> %{{.*}}, <16 x half> [[A]], <32 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>
	return _mm512_castph256_ph512(__a);			return _mm512_castph256_ph512(__a);
	}			}

	__m256h test_mm256_zextph128_ph256(__m128h __a) {			__m256h test_mm256_zextph128_ph256(__m128h __a) {
	// CHECK-LABEL: test_mm256_zextph128_ph256			// CHECK-LABEL: test_mm256_zextph128_ph256
	// CHECK: shufflevector <8 x half> %{{.*}}, <8 x half> {{.*}}, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>			// CHECK: shufflevector <8 x half> %{{.*}}, <8 x half> {{.*}}, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
	return _mm256_zextph128_ph256(__a);			return _mm256_zextph128_ph256(__a);
	}			}