This is an archive of the discontinued LLVM Phabricator instance.

The reason for this change is that there are some conditions under which these vec_splat functions get inlined into other code and the information about the calls to the vperm intrinsics actually being splats is somehow lost. The calls to __builtin_shufflevector will ensure that the information about the shuffle is retained. The benchmark that identified this issue sees significant improvement with this patch.

Thanks for using target-independent builtin and IR.

lib/Headers/altivec.h
7807–7808	Why do we need a switch case here (and in the ones below) instead of just returning builtin_shufflevector(a, a, elem, elem, elem, __elem)?

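amehsan added inline comments. Apr 1 2016, 8:53 AM

lib/Headers/altivec.h
7807–7808	Apparently Phabricator has a problem with underscores: the call in my comment above should read `__builtin_shufflevector(__a, __a, __elem, __elem, __elem, __elem)`.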

nemanjai added inline comments. Apr 1 2016, 9:57 AM

lib/Headers/altivec.h
7807–7808	The arguments to __builtin_shufflevector must be compile-time constants. These vec_splat functions will typically be called with compile-time constants anyway so at anything other than -O0, the unnecessary branches will be eliminated. And if they're called with non-constant arguments for the element index, the sequence is probably still better than building a mask vector and using vperm. Basically, at -O2 this form when the argument is variable will have a shift, compare, branch and xxspltw. The previous form has 16 stores (of each byte), a vector load (which includes a swap on LE) and a vperm.
7807–7808	Yeah, when entering code in Phabricator comments, you can do one of two things:
1. Put them in a code block: __builtin_shufflevector(__a, __a, 3, 3, 3, 3);
2. Use the monospaced tag around the text: `__builtin_shufflevector(__a, __a, 3, 3, 3, 3);`
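
The constraint described in this comment can be sketched in portable C. This is only an illustration, not the altivec.h code: `v4i`, `SPLAT_CONST`, `splat_any`, and `splat_any_demo` are hypothetical names, the struct stands in for a four-element int vector, and the macro stands in for an operation (such as __builtin_shufflevector) whose lane index must be a literal.

```c
#include <assert.h>

/* Hypothetical stand-in for a vector of four 32-bit ints. */
typedef struct { int e[4]; } v4i;

/* Stand-in for an operation whose lane index N must be a literal constant. */
#define SPLAT_CONST(v, N) ((v4i){{(v).e[N], (v).e[N], (v).e[N], (v).e[N]}})

/* Dispatch a possibly-runtime index to one of four constant-index forms. */
static inline v4i splat_any(v4i v, unsigned idx) {
  switch (idx & 3u) {
  case 0:
    return SPLAT_CONST(v, 0);
  case 1:
    return SPLAT_CONST(v, 1);
  case 2:
    return SPLAT_CONST(v, 2);
  default: /* idx & 3 can only be 3 here */
    return SPLAT_CONST(v, 3);
  }
}

/* Usage: the index is a variable, yet every splat uses a literal lane. */
static void splat_any_demo(void) {
  v4i v = {{10, 20, 30, 40}};
  unsigned runtime_idx = 2;
  v4i r = splat_any(v, runtime_idx);
  assert(r.e[0] == 30 && r.e[1] == 30 && r.e[2] == 30 && r.e[3] == 30);
}
```

With a literal index an optimizer can usually fold the switch down to a single case; with a runtime index the branches remain, which is the trade-off debated in the rest of this review.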

FWIW LGTM ;-)

One question: xxspltd is an extended mnemonic for xxpermdi. In your current patch for BE I do not see a pattern being added for xxpermdi. Did you add some code to that patch that is not yet posted here? Or is something else going on here?

I have added a minor nit as well.

test/CodeGen/ppc-vsx-splat.c
67	b is unused
75	same here

amehsan accepted this revision. Apr 1 2016, 10:48 AM

amehsan edited edge metadata.

This revision is now accepted and ready to land. Apr 1 2016, 10:48 AM

never mind the question

On Friday I looked at this in a bit of a rush. I have now looked at it again more carefully (including checking what IR we generate for the added test case after the change). I think the implementation could be improved by using a builtin that accepts variable parameters (either an existing builtin or a new one). We do need to generate a shufflevector instruction, which requires constant parameters, but that can be done in CGBuiltin.cpp by calling the Builder.CreateShuffleVector API. (There are multiple uses of it in CGBuiltin.cpp.)

It is true that current implementation is a significant improvement over existing code, but that should not prevent us from generating better code when that is possible. (Unless the change is too complex or requires significant amount of extra work. But that is not the case here).

Also I grepped for existing uses of __builtin_shufflevector in the header files. I didn't check every single use, but in the fairly large number that I checked I did not find any use within a conditional statement such as switch case.

nemanjai marked 2 inline comments as done. Apr 3 2016, 12:36 PM

nemanjai added inline comments.

test/CodeGen/ppc-vsx-splat.c
67	Oops, thank you. I'll remove both.

Ignore my last comment. That approach is not going to work.

nemanjai added a reviewer: echristo. Apr 4 2016, 8:30 AM

It appears @amehsan and I have come to an agreement here. Does anyone else have any objections to this approach, @hfinkel, @echristo, @kbarton?
Sorry for the spam from the pings; I just want to make sure that we all agree on this approach for vector shuffles emitted from altivec.h, as we might want to do something like this for other functions. This particular fix provided a significant improvement in Eigen, and it is likely that there are other places where the compiler is tricked into not seeing through vperms to recognize that they are vector shuffles.

@nemanjai I was just thinking of something else :)

If we have a runtime value for the second parameter, do we know that this code is an improvement over the previous implementation? (Especially if the values of the runtime parameter occur with more or less equal frequency, which could result in a lot of mispredicted branches.)

Something else to consider (I have not yet checked it): if we specialize the vec_perm implementation for vec_splat with runtime values, can we improve it? If yes, how would that compare to the current implementation?

In D18593#391192, @amehsan wrote:

@nemanjai I was just thinking of something else :)

If we have a runtime value for the second parameter, do we know that this code is an improvement over the previous implementation? (Especially if the values of the runtime parameter occur with more or less equal frequency, which could result in a lot of mispredicted branches.)

Something else to consider (I have not yet checked it): if we specialize the vec_perm implementation for vec_splat with runtime values, can we improve it? If yes, how would that compare to the current implementation?

I don't see any cases in which the old implementation leads to better code than what this patch suggests. I'm focusing on LE here.
Old implementation (noopt - both const and non-const index): 16 stb's, a mess of both vector and scalar ops, vector loads and stores, etc.
Old implementation (-O1 - const index): vspltw (plus we lose the information about the shuffle in some cases - which prompted this patch)
Old implementation (-O1 - non-const index): vspltisb, some rlwimi's, 16 stb's, lxvd2x along with the requisite swap, xxlor and vperm (the bulk of the work is for building the mask)

This patch (noopt - both const and non-const index): cmplwi, bgt, mtctr, bctr, xxspltw (along with the necessary swaps of the vectors and storing the arguments)
This patch (-O1 - const index): xxspltw (and we always retain the information about this being a shufflevector since we are using that builtin)
This patch (-O1 - non-const index): rlwinm, worst case - 3 cmplwi and 3 beq/bne

Although the noopt code emitted even with this patch has some SPR operations that are likely expensive, I don't think it is ever worse than 16 stores and a load (along with other instructions necessary to create a mask vector). And I am not sure how much merit there is in discussing performance characteristics of code generated at noopt.

kbarton requested changes to this revision. Apr 4 2016, 10:51 AM

kbarton edited edge metadata.

kbarton added inline comments.

lib/Headers/altivec.h
7807–7808	Could we add a test case (or extend an existing test case) to ensure that these extra branches are cleaned up at -O2 and above? The branches will hinder instruction scheduling (among other things) so we should ensure they are cleaned up. I agree the extra branches at -O0 are not a concern.

This revision now requires changes to proceed. Apr 4 2016, 10:51 AM

I'll get to the rest of it, but as for Kit's question about -O2: it's preferred to test such things in the back end.


Nemanja

Thanks for the comparison. This is what I have been thinking about. But if others prefer switch-case implementation, that is fine with me as well. I am not sure this approach is better.

If we have a builtin that accepts variable operands, we can check the operand in CGBuiltin.cpp. If we see a constant value, we can emit shufflevector. Otherwise, there are only 4 different masks needed for vec_splat (of vector int or vector unsigned int). We can add a global array that includes all 4 masks and then load the proper mask (using the parameter passed to vec_splat to index the array). Once we have the mask, we probably need to generate an intrinsic (is there an IR instruction that we can use?). So this approach has its own disadvantages too....
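
The mask-table idea for the runtime case can be modeled in plain C. This is a sketch under stated assumptions, not a proposed implementation: `kSplatWordMask`, `permute_bytes`, `splat_word_via_table`, and `splat_table_demo` are hypothetical names, element numbering is big-endian, and the byte loop is only a scalar model of a permute whose two inputs are the same vector.

```c
#include <assert.h>
#include <stdint.h>

/* One 16-byte permute-control mask per 32-bit word. With big-endian
   numbering, word w occupies bytes 4*w .. 4*w+3 of the vector. */
static const uint8_t kSplatWordMask[4][16] = {
    {0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3},
    {4, 5, 6, 7, 4, 5, 6, 7, 4, 5, 6, 7, 4, 5, 6, 7},
    {8, 9, 10, 11, 8, 9, 10, 11, 8, 9, 10, 11, 8, 9, 10, 11},
    {12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15},
};

/* Scalar model of a byte permute with both inputs equal:
   each output byte is the input byte selected by the mask. */
static void permute_bytes(const uint8_t a[16], const uint8_t mask[16],
                          uint8_t out[16]) {
  for (int i = 0; i < 16; ++i)
    out[i] = a[mask[i] & 0x0F];
}

/* Runtime-index splat: one table lookup replaces building the mask
   byte by byte. */
static void splat_word_via_table(const uint8_t a[16], unsigned idx,
                                 uint8_t out[16]) {
  permute_bytes(a, kSplatWordMask[idx & 3u], out);
}

/* Usage: splatting word 2 repeats source bytes 8..11 four times. */
static void splat_table_demo(void) {
  uint8_t a[16], out[16];
  for (int i = 0; i < 16; ++i)
    a[i] = (uint8_t)i;
  splat_word_via_table(a, 2, out);
  for (int i = 0; i < 16; ++i)
    assert(out[i] == 8 + (i % 4));
}
```

The lookup avoids both the branches of the switch form and the per-byte mask construction of the old form, but it still ends in a general permute, which is the drawback raised in the replies.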

nemanjai added inline comments. Apr 4 2016, 11:21 AM

test/CodeGen/ppc-vsx-splat.c
39	@kbarton @echristo These tests check that we definitely have the right instruction, but they do not test that we do not have the branches. I can add some CHECK-NOT's or a check for the blr at the end of the function. Or I can add a test case to the back end portion (i.e. D18592).

In D18593#391361, @amehsan wrote:

Nemanja

Thanks for the comparison. This is what I have been thinking about. But if others prefer switch-case implementation, that is fine with me as well. I am not sure this approach is better.

If we have a builtin that accepts variable operands, we can check the operand in CGBuiltin.cpp. If we see a constant value, we can emit shufflevector. Otherwise, there are only 4 different masks needed for vec_splat (of vector int or vector unsigned int). We can add a global array that includes all 4 masks and then load the proper mask (using the parameter passed to vec_splat to index the array). Once we have the mask, we probably need to generate an intrinsic (is there an IR instruction that we can use?). So this approach has its own disadvantages too....

I suppose we can index into PerfectShuffleTable or something similar (presumably it has entries for splats). But then we're stuck with VMX instructions rather than VSX. So I'm not sure that this is a win even in that case.

echristo added inline comments. Apr 4 2016, 2:48 PM

lib/Headers/altivec.h
7807–7808	Another common workaround for these is to make the definitions macros and then you can use the mask as part of the macro arguments.

hfinkel added inline comments. Apr 4 2016, 4:19 PM

lib/Headers/altivec.h
7807–7808	I agree; a macro can be used here. Another option is to use __builtin_constant_p to choose between the switch statement (in the case where we know it gets optimized away), and a target-specific intrinsic for the cases we can't generically represent.

amehsan added inline comments. Apr 4 2016, 9:25 PM

lib/Headers/altivec.h
7807–7808	__builtin_constant_p is really interesting. Much simpler implementation, no need to touch CGBuiltin.cpp. Thanks Hal for mentioning it.
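
The macro and __builtin_constant_p suggestions can be combined in a small portable sketch. Everything named here (`v4i`, `splat_switch`, `splat_runtime`, `VEC_SPLAT_W`, `vec_splat_w_demo`) is hypothetical: `splat_switch` stands in for the switch over constant-index shuffles and `splat_runtime` stands in for a target-specific intrinsic. The only real API used is `__builtin_constant_p`, a GCC/Clang extension.

```c
#include <assert.h>

/* Hypothetical stand-in for a vector of four 32-bit ints. */
typedef struct { int e[4]; } v4i;

/* Path for constant indices: a switch that is expected to fold away. */
static inline v4i splat_switch(v4i v, unsigned idx) {
  switch (idx & 3u) {
  case 0:
    return (v4i){{v.e[0], v.e[0], v.e[0], v.e[0]}};
  case 1:
    return (v4i){{v.e[1], v.e[1], v.e[1], v.e[1]}};
  case 2:
    return (v4i){{v.e[2], v.e[2], v.e[2], v.e[2]}};
  default:
    return (v4i){{v.e[3], v.e[3], v.e[3], v.e[3]}};
  }
}

/* Path for runtime indices: stands in for a target-specific intrinsic. */
static inline v4i splat_runtime(v4i v, unsigned idx) {
  int x = v.e[idx & 3u];
  return (v4i){{x, x, x, x}};
}

/* Written as a macro so that a literal index is still visible as a
   constant expression where __builtin_constant_p is evaluated. */
#define VEC_SPLAT_W(v, idx)                                                    \
  (__builtin_constant_p(idx) ? splat_switch((v), (idx))                        \
                             : splat_runtime((v), (idx)))

/* Usage: both paths must agree on the result; only the code shape differs. */
static void vec_splat_w_demo(void) {
  v4i v = {{1, 2, 3, 4}};
  unsigned n = 3;
  v4i lit = VEC_SPLAT_W(v, 1);
  v4i run = VEC_SPLAT_W(v, n);
  assert(lit.e[0] == 2 && lit.e[3] == 2);
  assert(run.e[0] == 4 && run.e[3] == 4);
}
```

Which path a variable index takes can depend on the compiler and optimization level, so the two paths have to be semantically identical; the choice should affect only code quality.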

The underlying cause for the poor code we were getting is better resolved with
http://reviews.llvm.org/D20443

Revision Contents

Path	Size
include/clang/Basic/BuiltinsPPC.def	3 lines
lib/Headers/altivec.h	134 lines
test/CodeGen/builtins-ppc-vsx.c	36 lines
test/CodeGen/ppc-vsx-splat.c	100 lines

Diff 52039

include/clang/Basic/BuiltinsPPC.def

 BUILTIN(__builtin_vsx_xxleqv, "V4UiV4UiV4Ui", "")

 BUILTIN(__builtin_vsx_xvcpsgndp, "V2dV2dV2d", "")
 BUILTIN(__builtin_vsx_xvcpsgnsp, "V4fV4fV4f", "")

 BUILTIN(__builtin_vsx_xvabssp, "V4fV4f", "")
 BUILTIN(__builtin_vsx_xvabsdp, "V2dV2d", "")

+BUILTIN(__builtin_vsx_xxspltw, "V4iV4iUIi", "")
+BUILTIN(__builtin_vsx_xxpermdi, "V2ULLiV2ULLiV2ULLiUIi", "")
+
 // HTM builtins
 BUILTIN(__builtin_tbegin, "UiUIi", "")
 BUILTIN(__builtin_tend, "UiUIi", "")

 BUILTIN(__builtin_tabort, "UiUi", "")
 BUILTIN(__builtin_tabortdc, "UiUiUiUi", "")
 BUILTIN(__builtin_tabortdci, "UiUiUii", "")
 BUILTIN(__builtin_tabortwc, "UiUiUiUi", "")

lib/Headers/altivec.h

 static __inline__ vector pixel __ATTRS_o_ai vec_splat(vector pixel __a,
                                                        unsigned const int __b) {
   unsigned char b0 = (__b & 0x07) * 2;
   unsigned char b1 = b0 + 1;
   return vec_perm(__a, __a,
                   (vector unsigned char)(b0, b1, b0, b1, b0, b1, b0, b1, b0, b1,
                                          b0, b1, b0, b1, b0, b1));
 }

+#ifdef __VSX__
+static __inline__ vector signed int __ATTRS_o_ai
+vec_splat(vector signed int __a, unsigned const int __b) {
+#ifdef __LITTLE_ENDIAN__
+  unsigned __elem = 3 - (__b & 0x3);
+#else
+  unsigned __elem = __b & 0x3;
+#endif
+  switch(__elem) {
+  case 0:
+    return __builtin_vsx_xxspltw(__a, 0);
+  case 1:
+    return __builtin_vsx_xxspltw(__a, 1);
+  case 2:
+    return __builtin_vsx_xxspltw(__a, 2);
+  case 3:
+    return __builtin_vsx_xxspltw(__a, 3);
+  }
+}
+
+static __inline__ vector unsigned int __ATTRS_o_ai
+vec_splat(vector unsigned int __a, unsigned const int __b) {
+#ifdef __LITTLE_ENDIAN__
+  unsigned __elem = 3 - (__b & 0x3);
+#else
+  unsigned __elem = __b & 0x3;
+#endif
+  switch(__elem) {
+  case 0:
+    return __builtin_vsx_xxspltw(__a, 0);
+  case 1:
+    return __builtin_vsx_xxspltw(__a, 1);
+  case 2:
+    return __builtin_vsx_xxspltw(__a, 2);
+  case 3:
+    return __builtin_vsx_xxspltw(__a, 3);
+  }
+}
+
+static __inline__ vector bool int __ATTRS_o_ai
+vec_splat(vector bool int __a, unsigned const int __b) {
+#ifdef __LITTLE_ENDIAN__
+  unsigned __elem = 3 - (__b & 0x3);
+#else
+  unsigned __elem = __b & 0x3;
+#endif
+  switch(__elem) {
+  case 0:
+    return __builtin_vsx_xxspltw(__a, 0);
+  case 1:
+    return __builtin_vsx_xxspltw(__a, 1);
+  case 2:
+    return __builtin_vsx_xxspltw(__a, 2);
+  case 3:
+    return __builtin_vsx_xxspltw(__a, 3);
+  }
+}
+
+static __inline__ vector float __ATTRS_o_ai vec_splat(vector float __a,
+                                                       unsigned const int __b) {
+#ifdef __LITTLE_ENDIAN__
+  unsigned __elem = 3 - (__b & 0x3);
+#else
+  unsigned __elem = __b & 0x3;
+#endif
+  switch(__elem) {
+  case 0:
+    return __builtin_vsx_xxspltw(__a, 0);
+  case 1:
+    return __builtin_vsx_xxspltw(__a, 1);
+  case 2:
+    return __builtin_vsx_xxspltw(__a, 2);
+  case 3:
+    return __builtin_vsx_xxspltw(__a, 3);
+  }
+}
+#else
 static __inline__ vector signed int __ATTRS_o_ai
 vec_splat(vector signed int __a, unsigned const int __b) {
   unsigned char b0 = (__b & 0x03) * 4;
   unsigned char b1 = b0 + 1, b2 = b0 + 2, b3 = b0 + 3;
   return vec_perm(__a, __a,
                   (vector unsigned char)(b0, b1, b2, b3, b0, b1, b2, b3, b0, b1,
                                          b2, b3, b0, b1, b2, b3));
 }

 static __inline__ vector unsigned int __ATTRS_o_ai
 vec_splat(vector unsigned int __a, unsigned const int __b) {
   unsigned char b0 = (__b & 0x03) * 4;
   unsigned char b1 = b0 + 1, b2 = b0 + 2, b3 = b0 + 3;
   return vec_perm(__a, __a,
                   (vector unsigned char)(b0, b1, b2, b3, b0, b1, b2, b3, b0, b1,
                                          b2, b3, b0, b1, b2, b3));
 (11 unchanged lines not shown)
 static __inline__ vector float __ATTRS_o_ai vec_splat(vector float __a,
                                                        unsigned const int __b) {
   unsigned char b0 = (__b & 0x03) * 4;
   unsigned char b1 = b0 + 1, b2 = b0 + 2, b3 = b0 + 3;
   return vec_perm(__a, __a,
                   (vector unsigned char)(b0, b1, b2, b3, b0, b1, b2, b3, b0, b1,
                                          b2, b3, b0, b1, b2, b3));
 }
+#endif

 #ifdef __VSX__
 static __inline__ vector double __ATTRS_o_ai vec_splat(vector double __a,
                                                         unsigned const int __b) {
-  unsigned char b0 = (__b & 0x01) * 8;
-  unsigned char b1 = b0 + 1, b2 = b0 + 2, b3 = b0 + 3, b4 = b0 + 4, b5 = b0 + 5,
-                b6 = b0 + 6, b7 = b0 + 7;
-  return vec_perm(__a, __a,
-                  (vector unsigned char)(b0, b1, b2, b3, b4, b5, b6, b7, b0, b1,
-                                         b2, b3, b4, b5, b6, b7));
+  unsigned __elem = __b & 1;
+#ifdef __LITTLE_ENDIAN__
+  return __elem ? __builtin_vsx_xxpermdi(__a, __a, 0) :
+                  __builtin_vsx_xxpermdi(__a, __a, 3);
+#else
+  return __elem ? __builtin_vsx_xxpermdi(__a, __a, 3) :
+                  __builtin_vsx_xxpermdi(__a, __a, 0);
+#endif
 }
 static __inline__ vector bool long long __ATTRS_o_ai
 vec_splat(vector bool long long __a, unsigned const int __b) {
-  unsigned char b0 = (__b & 0x01) * 8;
-  unsigned char b1 = b0 + 1, b2 = b0 + 2, b3 = b0 + 3, b4 = b0 + 4, b5 = b0 + 5,
-                b6 = b0 + 6, b7 = b0 + 7;
-  return vec_perm(__a, __a,
-                  (vector unsigned char)(b0, b1, b2, b3, b4, b5, b6, b7, b0, b1,
-                                         b2, b3, b4, b5, b6, b7));
+  unsigned __elem = __b & 1;
+#ifdef __LITTLE_ENDIAN__
+  return __elem ? __builtin_vsx_xxpermdi(__a, __a, 0) :
+                  __builtin_vsx_xxpermdi(__a, __a, 3);
+#else
+  return __elem ? __builtin_vsx_xxpermdi(__a, __a, 3) :
+                  __builtin_vsx_xxpermdi(__a, __a, 0);
+#endif
 }
 static __inline__ vector signed long long __ATTRS_o_ai
 vec_splat(vector signed long long __a, unsigned const int __b) {
-  unsigned char b0 = (__b & 0x01) * 8;
-  unsigned char b1 = b0 + 1, b2 = b0 + 2, b3 = b0 + 3, b4 = b0 + 4, b5 = b0 + 5,
-                b6 = b0 + 6, b7 = b0 + 7;
-  return vec_perm(__a, __a,
-                  (vector unsigned char)(b0, b1, b2, b3, b4, b5, b6, b7, b0, b1,
-                                         b2, b3, b4, b5, b6, b7));
+  unsigned __elem = __b & 1;
+#ifdef __LITTLE_ENDIAN__
+  return __elem ? __builtin_vsx_xxpermdi(__a, __a, 0) :
+                  __builtin_vsx_xxpermdi(__a, __a, 3);
+#else
+  return __elem ? __builtin_vsx_xxpermdi(__a, __a, 3) :
+                  __builtin_vsx_xxpermdi(__a, __a, 0);
+#endif
 }
 static __inline__ vector unsigned long long __ATTRS_o_ai
 vec_splat(vector unsigned long long __a, unsigned const int __b) {
-  unsigned char b0 = (__b & 0x01) * 8;
-  unsigned char b1 = b0 + 1, b2 = b0 + 2, b3 = b0 + 3, b4 = b0 + 4, b5 = b0 + 5,
-                b6 = b0 + 6, b7 = b0 + 7;
-  return vec_perm(__a, __a,
-                  (vector unsigned char)(b0, b1, b2, b3, b4, b5, b6, b7, b0, b1,
-                                         b2, b3, b4, b5, b6, b7));
+  unsigned __elem = __b & 1;
+#ifdef __LITTLE_ENDIAN__
+  return __elem ? __builtin_vsx_xxpermdi(__a, __a, 0) :
+                  __builtin_vsx_xxpermdi(__a, __a, 3);
+#else
+  return __elem ? __builtin_vsx_xxpermdi(__a, __a, 3) :
+                  __builtin_vsx_xxpermdi(__a, __a, 0);
+#endif
 }
 #endif

 /* vec_vspltb */

 #define __builtin_altivec_vspltb vec_vspltb

 static __inline__ vector signed char __ATTRS_o_ai

test/CodeGen/builtins-ppc-vsx.c

 // CHECK: call <2 x double> @llvm.round.v2f64(<2 x double>
 // CHECK-LE: call <2 x double> @llvm.round.v2f64(<2 x double>

   res_vd = vec_perm(vd, vd, vuc);
 // CHECK: @llvm.ppc.altivec.vperm
 // CHECK-LE: @llvm.ppc.altivec.vperm

   res_vd = vec_splat(vd, 1);
-// CHECK: [[T1:%.+]] = bitcast <2 x double> {{.+}} to <4 x i32>
-// CHECK: [[T2:%.+]] = bitcast <2 x double> {{.+}} to <4 x i32>
-// CHECK: call <4 x i32> @llvm.ppc.altivec.vperm(<4 x i32> [[T1]], <4 x i32> [[T2]], <16 x i8>
-// CHECK-LE: xor <16 x i8>
-// CHECK-LE: [[T1:%.+]] = bitcast <2 x double> {{.+}} to <4 x i32>
-// CHECK-LE: [[T2:%.+]] = bitcast <2 x double> {{.+}} to <4 x i32>
-// CHECK-LE: call <4 x i32> @llvm.ppc.altivec.vperm(<4 x i32> [[T1]], <4 x i32> [[T2]], <16 x i8>
+// CHECK: @llvm.ppc.vsx.xxpermdi(<2 x i64> {{%.+}}, <2 x i64> {{%.+}}, i32 3)
+// CHECK-LE: @llvm.ppc.vsx.xxpermdi(<2 x i64> {{%.+}}, <2 x i64> {{%.+}}, i32 0)

   res_vbll = vec_splat(vbll, 1);
-// CHECK: [[T1:%.+]] = bitcast <2 x i64> {{.+}} to <4 x i32>
-// CHECK: [[T2:%.+]] = bitcast <2 x i64> {{.+}} to <4 x i32>
-// CHECK: call <4 x i32> @llvm.ppc.altivec.vperm(<4 x i32> [[T1]], <4 x i32> [[T2]], <16 x i8>
-// CHECK-LE: xor <16 x i8>
-// CHECK-LE: [[T1:%.+]] = bitcast <2 x i64> {{.+}} to <4 x i32>
-// CHECK-LE: [[T2:%.+]] = bitcast <2 x i64> {{.+}} to <4 x i32>
-// CHECK-LE: call <4 x i32> @llvm.ppc.altivec.vperm(<4 x i32> [[T1]], <4 x i32> [[T2]], <16 x i8>
+// CHECK: @llvm.ppc.vsx.xxpermdi(<2 x i64> {{%.+}}, <2 x i64> {{%.+}}, i32 3)
+// CHECK-LE: @llvm.ppc.vsx.xxpermdi(<2 x i64> {{%.+}}, <2 x i64> {{%.+}}, i32 0)

   res_vsll = vec_splat(vsll, 1);
-// CHECK: [[T1:%.+]] = bitcast <2 x i64> {{.+}} to <4 x i32>
-// CHECK: [[T2:%.+]] = bitcast <2 x i64> {{.+}} to <4 x i32>
-// CHECK: call <4 x i32> @llvm.ppc.altivec.vperm(<4 x i32> [[T1]], <4 x i32> [[T2]], <16 x i8>
-// CHECK-LE: xor <16 x i8>
-// CHECK-LE: [[T1:%.+]] = bitcast <2 x i64> {{.+}} to <4 x i32>
-// CHECK-LE: [[T2:%.+]] = bitcast <2 x i64> {{.+}} to <4 x i32>
-// CHECK-LE: call <4 x i32> @llvm.ppc.altivec.vperm(<4 x i32> [[T1]], <4 x i32> [[T2]], <16 x i8>
+// CHECK: @llvm.ppc.vsx.xxpermdi(<2 x i64> {{%.+}}, <2 x i64> {{%.+}}, i32 3)
+// CHECK-LE: @llvm.ppc.vsx.xxpermdi(<2 x i64> {{%.+}}, <2 x i64> {{%.+}}, i32 0)

   res_vull = vec_splat(vull, 1);
-// CHECK: [[T1:%.+]] = bitcast <2 x i64> {{.+}} to <4 x i32>
-// CHECK: [[T2:%.+]] = bitcast <2 x i64> {{.+}} to <4 x i32>
-// CHECK: call <4 x i32> @llvm.ppc.altivec.vperm(<4 x i32> [[T1]], <4 x i32> [[T2]], <16 x i8>
-// CHECK-LE: xor <16 x i8>
-// CHECK-LE: [[T1:%.+]] = bitcast <2 x i64> {{.+}} to <4 x i32>
-// CHECK-LE: [[T2:%.+]] = bitcast <2 x i64> {{.+}} to <4 x i32>
-// CHECK-LE: call <4 x i32> @llvm.ppc.altivec.vperm(<4 x i32> [[T1]], <4 x i32> [[T2]], <16 x i8>
+// CHECK: @llvm.ppc.vsx.xxpermdi(<2 x i64> {{%.+}}, <2 x i64> {{%.+}}, i32 3)
+// CHECK-LE: @llvm.ppc.vsx.xxpermdi(<2 x i64> {{%.+}}, <2 x i64> {{%.+}}, i32 0)

   res_vsi = vec_pack(vsll, vsll);
 // CHECK: @llvm.ppc.altivec.vperm
 // CHECK-LE: @llvm.ppc.altivec.vperm

   res_vui = vec_pack(vull, vull);
 // CHECK: @llvm.ppc.altivec.vperm
 // CHECK-LE: @llvm.ppc.altivec.vperm

test/CodeGen/ppc-vsx-splat.c (new file)

				// REQUIRES: powerpc-registered-target
				// RUN: %clang_cc1 -target-feature +vsx -target-feature +altivec -triple \
				// RUN: powerpc64-unknown-unknown -S -faltivec %s -o - | FileCheck %s \
				// RUN: --check-prefix=CHECK-NOOPT

				// RUN: %clang_cc1 -target-feature +vsx -target-feature +altivec -triple \
				// RUN: powerpc64le-unknown-unknown -S -faltivec %s -o - | FileCheck %s \
				// RUN: -check-prefix=CHECK-NOOPT

				// RUN: %clang_cc1 -target-feature +vsx -target-feature +altivec -triple \
				// RUN: powerpc64-unknown-unknown -O2 -S -faltivec %s -o - | FileCheck %s

				// RUN: %clang_cc1 -target-feature +vsx -target-feature +altivec -triple \
				// RUN: powerpc64le-unknown-unknown -O2 -S -faltivec %s -o - | FileCheck %s \
				// RUN: -check-prefix=CHECK-LE

				#include <altivec.h>
				vector signed int spltwv(vector signed int a, unsigned b) {
				return vec_splat(a, b);
				// CHECK-LABEL: spltwv
				// CHECK-LE-LABEL: spltwv
				// CHECK-NOOPT-LABEL: spltwv
				// CHECK-DAG: xxspltw {{[0-9]+}}, {{[0-9]+}}, 0
				// CHECK-DAG: xxspltw {{[0-9]+}}, {{[0-9]+}}, 1
				// CHECK-DAG: xxspltw {{[0-9]+}}, {{[0-9]+}}, 2
				// CHECK-DAG: xxspltw {{[0-9]+}}, {{[0-9]+}}, 3
				// CHECK-LE-DAG: xxspltw {{[0-9]+}}, {{[0-9]+}}, 0
				// CHECK-LE-DAG: xxspltw {{[0-9]+}}, {{[0-9]+}}, 1
				// CHECK-LE-DAG: xxspltw {{[0-9]+}}, {{[0-9]+}}, 2
				// CHECK-LE-DAG: xxspltw {{[0-9]+}}, {{[0-9]+}}, 3
				// CHECK-NOOPT-DAG: xxspltw {{[0-9]+}}, {{[0-9]+}}, 0
				// CHECK-NOOPT-DAG: xxspltw {{[0-9]+}}, {{[0-9]+}}, 1
				// CHECK-NOOPT-DAG: xxspltw {{[0-9]+}}, {{[0-9]+}}, 2
				// CHECK-NOOPT-DAG: xxspltw {{[0-9]+}}, {{[0-9]+}}, 3
				}

				vector signed long long spltdv(vector signed long long a, unsigned b) {
				return vec_splat(a, b);
				// CHECK-LABEL: spltdv
				// CHECK-LE-LABEL: spltdv
				// CHECK-NOOPT-LABEL: spltdv
				// CHECK-DAG: xxspltd {{[0-9]+}}, {{[0-9]+}}, 0
				// CHECK-DAG: xxspltd {{[0-9]+}}, {{[0-9]+}}, 1
				// CHECK-LE-DAG: xxspltd {{[0-9]+}}, {{[0-9]+}}, 0
				// CHECK-LE-DAG: xxspltd {{[0-9]+}}, {{[0-9]+}}, 1
				}

				vector signed int spltw0(vector signed int a) {
				return vec_splat(a, 0);
				// CHECK-LABEL: spltw0
				// CHECK-LE-LABEL: spltw0
				// CHECK-NOOPT-LABEL: spltw0
				// CHECK: xxspltw {{[0-9]+}}, {{[0-9]+}}, 0
				// CHECK-LE: xxspltw {{[0-9]+}}, {{[0-9]+}}, 3
				}

				vector signed int spltw1(vector signed int a) {
				return vec_splat(a, 1);
				// CHECK-LABEL: spltw1
				// CHECK-LE-LABEL: spltw1
				// CHECK-NOOPT-LABEL: spltw1
				// CHECK: xxspltw {{[0-9]+}}, {{[0-9]+}}, 1
				// CHECK-LE: xxspltw {{[0-9]+}}, {{[0-9]+}}, 2
				}

				vector signed int spltw2(vector signed int a) {
				return vec_splat(a, 2);
				// CHECK-LABEL: spltw2
				// CHECK-LE-LABEL: spltw2
				// CHECK-NOOPT-LABEL: spltw2
				// CHECK: xxspltw {{[0-9]+}}, {{[0-9]+}}, 2
				// CHECK-LE: xxspltw {{[0-9]+}}, {{[0-9]+}}, 1
				}

				vector signed int spltw3(vector signed int a) {
				return vec_splat(a, 3);
				// CHECK-LABEL: spltw3
				// CHECK-LE-LABEL: spltw3
				// CHECK-NOOPT-LABEL: spltw3
				// CHECK: xxspltw {{[0-9]+}}, {{[0-9]+}}, 3
				// CHECK-LE: xxspltw {{[0-9]+}}, {{[0-9]+}}, 0
				}

				vector signed long long spltd0(vector signed long long a, unsigned b) {
				return vec_splat(a, 0);
				// CHECK-LABEL: spltd0
				// CHECK-LE-LABEL: spltd0
				// CHECK-NOOPT-LABEL: spltd0
				// CHECK: xxspltd {{[0-9]+}}, {{[0-9]+}}, 0
				// CHECK-LE: xxspltd {{[0-9]+}}, {{[0-9]+}}, 1
				}

				vector signed long long spltd1(vector signed long long a, unsigned b) {
				return vec_splat(a, 1);
				// CHECK-LABEL: spltd1
				// CHECK-LE-LABEL: spltd1
				// CHECK-NOOPT-LABEL: spltd1
				// CHECK: xxspltd {{[0-9]+}}, {{[0-9]+}}, 1
				// CHECK-LE: xxspltd {{[0-9]+}}, {{[0-9]+}}, 0
				}
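
The CHECK and CHECK-LE expectations in these tests follow a simple index mapping, which can be written down as a portable sketch. The helper names (`xxspltw_imm`, `xxspltd_imm`, `xxpermdi_imm`, `splat_imm_demo`) are hypothetical; the arithmetic mirrors the `__elem` computations in the altivec.h diff and the immediates that the tests expect.

```c
#include <assert.h>

/* Immediate expected on xxspltw for vec_splat(v, b) on a 4 x 32-bit
   vector: little-endian targets reverse the element numbering. */
static unsigned xxspltw_imm(unsigned b, int little_endian) {
  unsigned elem = b & 3u;
  return little_endian ? 3u - elem : elem;
}

/* Immediate expected on xxspltd for vec_splat(v, b) on a 2 x 64-bit
   vector. */
static unsigned xxspltd_imm(unsigned b, int little_endian) {
  unsigned elem = b & 1u;
  return little_endian ? 1u - elem : elem;
}

/* xxpermdi immediate used by the header: 0 splats doubleword 0 and
   3 splats doubleword 1 (xxspltd is an extended mnemonic of xxpermdi). */
static unsigned xxpermdi_imm(unsigned b, int little_endian) {
  return xxspltd_imm(b, little_endian) ? 3u : 0u;
}

/* Usage: reproduce a few of the expectations from the tests. */
static void splat_imm_demo(void) {
  assert(xxspltw_imm(1, 0) == 1 && xxspltw_imm(1, 1) == 2); /* spltw1 */
  assert(xxspltd_imm(0, 0) == 0 && xxspltd_imm(0, 1) == 1); /* spltd0 */
  assert(xxpermdi_imm(1, 0) == 3 && xxpermdi_imm(1, 1) == 0);
}
```

The last assertion corresponds to the `i32 3` (big-endian) and `i32 0` (little-endian) operands checked for `vec_splat(vd, 1)` in builtins-ppc-vsx.c.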


[PowerPC] Front end improvements for vec_splat (Abandoned, Public)
