This is an archive of the discontinued LLVM Phabricator instance.

[X86] Replace (v)palignr intrinsics with generic shuffles (Clang)
AbandonedPublic

Authored by RKSimon on Mar 12 2015, 10:48 AM.


Details

Reviewers

spatel
chandlerc
andreadb
craig.topper

Summary

The (v)palignr instructions are currently described using builtin intrinsics although the x86 shuffle lowering code now correctly identifies them.

This patch replaces the builtins with generic __builtin_shufflevector calls. I'll be posting an equivalent LLVM patch shortly.

Diff Detail

Repository: rL LLVM

Event Timeline

RKSimon updated this revision to Diff 21851. Mar 12 2015, 10:48 AM

RKSimon retitled this revision to [X86] Replace (v)palignr intrinsics with generic shuffles (Clang).

RKSimon updated this object.

RKSimon edited the test plan for this revision.

RKSimon added reviewers: craig.topper, andreadb, spatel, chandlerc.

RKSimon set the repository for this revision to rL LLVM.

RKSimon added a subscriber: Unknown Object (MLST).

RKSimon mentioned this in D8302: [X86] Replace (v)palignr intrinsics with generic shuffles (LLVM). Mar 12 2015, 10:50 AM

We've always been sending shuffles to the backend. We just generated the shuffles in CGBuiltin instead of the header.

I'm not sure I like completely losing the type system on the immediate. Theoretically with the code in CGBuiltin we could at least get a truncation warning if the immediate was larger than a byte. Though I'm not sure that warning is on by default. Really I wish we could check the immediates for illegal values on all of these macros and deliver nice messages to the user. I think gcc does check a lot of them.

In D8301#140042, @craig.topper wrote:

We've always been sending shuffles to the backend. We just generated the shuffles in CGBuiltin instead of the header.

Hi Craig, yes, I was hoping that this patch would get us to the point where we could get rid of even that. Or do you think the CGBuiltin stage is good enough?

I'm not sure I like completely losing the type system on the immediate. Theoretically with the code in CGBuiltin we could at least get a truncation warning if the immediate was larger than a byte. Though I'm not sure that warning is on by default. Really I wish we could check the immediates for illegal values on all of these macros and deliver nice messages to the user. I think gcc does check a lot of them.

Short of adding a static_assert, I'm not sure of the best way to do this. Some of the intrinsics have already been converted to pure __builtin_shufflevector implementations despite having a similar problem: the slldq/srldq byte shifts come to mind, which are pretty similar to alignr.
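
One possible shape for the static_assert idea, sketched purely as an illustration: because the header macros already use GNU statement expressions, a C11 `_Static_assert` can sit inside one and reject an out-of-range immediate at compile time. The macro name `CHECKED_IMM8` and the helper `example_shift` are hypothetical; nothing like this is in the patch.

```c
/* Hypothetical macro: yields imm unchanged, but fails to compile
   unless imm is an integer constant expression in the range 0..255.
   Relies on GNU statement expressions and C11 _Static_assert. */
#define CHECKED_IMM8(imm) __extension__ ({ \
  _Static_assert((imm) >= 0 && (imm) <= 255, \
                 "immediate must fit in 8 bits"); \
  (imm); })

/* Example use: compiles because 17 is in range. CHECKED_IMM8(256)
   would stop compilation with the message above. */
static int example_shift(void) {
  return CHECKED_IMM8(17);
}
```

A check like this only works when the argument is a constant expression, which is already a requirement for the immediates discussed here.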

Abandoning this ticket: as Craig said, we're already creating shuffles internally, which is good enough.

Revision Contents

Path (revision 232067)                  Size

include/clang/Basic/BuiltinsX86.def     2 lines
lib/CodeGen/CGBuiltin.cpp               36 lines
lib/Headers/avx2intrin.h                40 lines
lib/Headers/tmmintrin.h                 24 lines
test/CodeGen/avx2-builtins.c            2 lines
test/CodeGen/builtins-x86.c             5 lines
test/CodeGen/palignr.c                  31 lines
test/CodeGen/sse-builtins.c             2 lines

Diff 21851

include/clang/Basic/BuiltinsX86.def

	BUILTIN(__builtin_ia32_psrldi128, "V4iV4ii", "")			BUILTIN(__builtin_ia32_psrldi128, "V4iV4ii", "")
	BUILTIN(__builtin_ia32_psrlqi128, "V2LLiV2LLii", "")			BUILTIN(__builtin_ia32_psrlqi128, "V2LLiV2LLii", "")
	BUILTIN(__builtin_ia32_psrawi128, "V8sV8si", "")			BUILTIN(__builtin_ia32_psrawi128, "V8sV8si", "")
	BUILTIN(__builtin_ia32_psradi128, "V4iV4ii", "")			BUILTIN(__builtin_ia32_psradi128, "V4iV4ii", "")
	BUILTIN(__builtin_ia32_pmaddwd128, "V4iV8sV8s", "")			BUILTIN(__builtin_ia32_pmaddwd128, "V4iV8sV8s", "")
	BUILTIN(__builtin_ia32_monitor, "vv*UiUi", "")			BUILTIN(__builtin_ia32_monitor, "vv*UiUi", "")
	BUILTIN(__builtin_ia32_mwait, "vUiUi", "")			BUILTIN(__builtin_ia32_mwait, "vUiUi", "")
	BUILTIN(__builtin_ia32_lddqu, "V16ccC*", "")			BUILTIN(__builtin_ia32_lddqu, "V16ccC*", "")
	BUILTIN(__builtin_ia32_palignr128, "V16cV16cV16cIc", "")
	BUILTIN(__builtin_ia32_insertps128, "V4fV4fV4fIc", "")			BUILTIN(__builtin_ia32_insertps128, "V4fV4fV4fIc", "")

	BUILTIN(__builtin_ia32_pblendvb128, "V16cV16cV16cV16c", "")			BUILTIN(__builtin_ia32_pblendvb128, "V16cV16cV16cV16c", "")
	BUILTIN(__builtin_ia32_blendvpd, "V2dV2dV2dV2d", "")			BUILTIN(__builtin_ia32_blendvpd, "V2dV2dV2dV2d", "")
	BUILTIN(__builtin_ia32_blendvps, "V4fV4fV4fV4f", "")			BUILTIN(__builtin_ia32_blendvps, "V4fV4fV4fV4f", "")

	BUILTIN(__builtin_ia32_packusdw128, "V8sV4iV4i", "")			BUILTIN(__builtin_ia32_packusdw128, "V8sV4iV4i", "")
	BUILTIN(__builtin_ia32_pmaxsb128, "V16cV16cV16c", "")			BUILTIN(__builtin_ia32_pmaxsb128, "V16cV16cV16c", "")
	▲ Show 20 Lines • Show All 158 Lines • ▼ Show 20 Lines
	BUILTIN(__builtin_ia32_paddsb256, "V32cV32cV32c", "")			BUILTIN(__builtin_ia32_paddsb256, "V32cV32cV32c", "")
	BUILTIN(__builtin_ia32_paddsw256, "V16sV16sV16s", "")			BUILTIN(__builtin_ia32_paddsw256, "V16sV16sV16s", "")
	BUILTIN(__builtin_ia32_psubsb256, "V32cV32cV32c", "")			BUILTIN(__builtin_ia32_psubsb256, "V32cV32cV32c", "")
	BUILTIN(__builtin_ia32_psubsw256, "V16sV16sV16s", "")			BUILTIN(__builtin_ia32_psubsw256, "V16sV16sV16s", "")
	BUILTIN(__builtin_ia32_paddusb256, "V32cV32cV32c", "")			BUILTIN(__builtin_ia32_paddusb256, "V32cV32cV32c", "")
	BUILTIN(__builtin_ia32_paddusw256, "V16sV16sV16s", "")			BUILTIN(__builtin_ia32_paddusw256, "V16sV16sV16s", "")
	BUILTIN(__builtin_ia32_psubusb256, "V32cV32cV32c", "")			BUILTIN(__builtin_ia32_psubusb256, "V32cV32cV32c", "")
	BUILTIN(__builtin_ia32_psubusw256, "V16sV16sV16s", "")			BUILTIN(__builtin_ia32_psubusw256, "V16sV16sV16s", "")
	BUILTIN(__builtin_ia32_palignr256, "V32cV32cV32cIc", "")
	BUILTIN(__builtin_ia32_pavgb256, "V32cV32cV32c", "")			BUILTIN(__builtin_ia32_pavgb256, "V32cV32cV32c", "")
	BUILTIN(__builtin_ia32_pavgw256, "V16sV16sV16s", "")			BUILTIN(__builtin_ia32_pavgw256, "V16sV16sV16s", "")
	BUILTIN(__builtin_ia32_pblendvb256, "V32cV32cV32cV32c", "")			BUILTIN(__builtin_ia32_pblendvb256, "V32cV32cV32cV32c", "")
	BUILTIN(__builtin_ia32_phaddw256, "V16sV16sV16s", "")			BUILTIN(__builtin_ia32_phaddw256, "V16sV16sV16s", "")
	BUILTIN(__builtin_ia32_phaddd256, "V8iV8iV8i", "")			BUILTIN(__builtin_ia32_phaddd256, "V8iV8iV8i", "")
	BUILTIN(__builtin_ia32_phaddsw256, "V16sV16sV16s", "")			BUILTIN(__builtin_ia32_phaddsw256, "V16sV16sV16s", "")
	BUILTIN(__builtin_ia32_phsubw256, "V16sV16sV16s", "")			BUILTIN(__builtin_ia32_phsubw256, "V16sV16sV16s", "")
	BUILTIN(__builtin_ia32_phsubd256, "V8iV8iV8i", "")			BUILTIN(__builtin_ia32_phsubd256, "V8iV8iV8i", "")

lib/CodeGen/CGBuiltin.cpp


Show First 20 Lines • Show All 5,920 Lines • ▼ Show 20 Lines	case X86::BI__builtin_ia32_storelps: {
unsigned Index = BuiltinID == X86::BI__builtin_ia32_storelps ? 0 : 1;		unsigned Index = BuiltinID == X86::BI__builtin_ia32_storelps ? 0 : 1;
llvm::Value *Idx = llvm::ConstantInt::get(SizeTy, Index);		llvm::Value *Idx = llvm::ConstantInt::get(SizeTy, Index);
Ops[1] = Builder.CreateExtractElement(Ops[1], Idx, "extract");		Ops[1] = Builder.CreateExtractElement(Ops[1], Idx, "extract");

// cast pointer to i64 & store		// cast pointer to i64 & store
Ops[0] = Builder.CreateBitCast(Ops[0], PtrTy);		Ops[0] = Builder.CreateBitCast(Ops[0], PtrTy);
return Builder.CreateStore(Ops[1], Ops[0]);		return Builder.CreateStore(Ops[1], Ops[0]);
}		}
case X86::BI__builtin_ia32_palignr128:
case X86::BI__builtin_ia32_palignr256: {
unsigned ShiftVal = cast<llvm::ConstantInt>(Ops[2])->getZExtValue();

unsigned NumElts =
cast<llvm::VectorType>(Ops[0]->getType())->getNumElements();
assert(NumElts % 16 == 0);
unsigned NumLanes = NumElts / 16;
unsigned NumLaneElts = NumElts / NumLanes;

// If palignr is shifting the pair of vectors more than the size of two
// lanes, emit zero.
if (ShiftVal >= (2 * NumLaneElts))
return llvm::Constant::getNullValue(ConvertType(E->getType()));

// If palignr is shifting the pair of input vectors more than one lane,
// but less than two lanes, convert to shifting in zeroes.
if (ShiftVal > NumLaneElts) {
ShiftVal -= NumLaneElts;
Ops[0] = llvm::Constant::getNullValue(Ops[0]->getType());
}

SmallVector<llvm::Constant*, 32> Indices;
// 256-bit palignr operates on 128-bit lanes so we need to handle that
for (unsigned l = 0; l != NumElts; l += NumLaneElts) {
for (unsigned i = 0; i != NumLaneElts; ++i) {
unsigned Idx = ShiftVal + i;
if (Idx >= NumLaneElts)
Idx += NumElts - NumLaneElts; // End of lane, switch operand.
Indices.push_back(llvm::ConstantInt::get(Int32Ty, Idx + l));
}
}

Value* SV = llvm::ConstantVector::get(Indices);
return Builder.CreateShuffleVector(Ops[1], Ops[0], SV, "palignr");
}
case X86::BI__builtin_ia32_pslldqi256: {		case X86::BI__builtin_ia32_pslldqi256: {
// Shift value is in bits so divide by 8.		// Shift value is in bits so divide by 8.
unsigned shiftVal = cast<llvm::ConstantInt>(Ops[1])->getZExtValue() >> 3;		unsigned shiftVal = cast<llvm::ConstantInt>(Ops[1])->getZExtValue() >> 3;

// If pslldq is shifting the vector more than 15 bytes, emit zero.		// If pslldq is shifting the vector more than 15 bytes, emit zero.
if (shiftVal >= 16)		if (shiftVal >= 16)
return llvm::Constant::getNullValue(ConvertType(E->getType()));		return llvm::Constant::getNullValue(ConvertType(E->getType()));

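
The index loop in the removed palignr case above can be mirrored in plain C to see which mask it builds. This is a standalone sketch, not Clang code: `palignr_mask` and its parameters are hypothetical names, and the shift is assumed to be already reduced to at most 16, as the two checks before the loop ensure.

```c
/* Standalone sketch of the mask construction; lanes are 16 bytes wide.
   num_elts is 16 (128-bit) or 32 (256-bit); out must hold num_elts
   entries. Indices below num_elts pick from the first shuffle operand,
   the rest from the second. */
static void palignr_mask(unsigned shift, unsigned num_elts, unsigned *out) {
  const unsigned lane_elts = 16;
  for (unsigned l = 0; l != num_elts; l += lane_elts) {
    for (unsigned i = 0; i != lane_elts; ++i) {
      unsigned idx = shift + i;
      if (idx >= lane_elts)
        idx += num_elts - lane_elts; /* past the lane end: switch operand */
      out[l + i] = idx + l;
    }
  }
}
```

For `shift = 2` and `num_elts = 32` this yields 2..15, 32, 33, 18..31, 48, 49, which is the mask checked in avx2-builtins.c.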

lib/Headers/avx2intrin.h

	}			}

	static __inline__ __m256i __attribute__((__always_inline__, __nodebug__))			static __inline__ __m256i __attribute__((__always_inline__, __nodebug__))
	_mm256_adds_epu16(__m256i __a, __m256i __b)			_mm256_adds_epu16(__m256i __a, __m256i __b)
	{			{
	return (__m256i)__builtin_ia32_paddusw256((__v16hi)__a, (__v16hi)__b);			return (__m256i)__builtin_ia32_paddusw256((__v16hi)__a, (__v16hi)__b);
	}			}

	#define _mm256_alignr_epi8(a, b, n) __extension__ ({ \			#define _mm256_alignr_epi8(a, b, imm) __extension__ ({ \
	__m256i __a = (a); \			__m256i __a = (((imm)&0xFF) > 31 ? _mm256_setzero_si256() : (__m256i)(b)); \
	__m256i __b = (b); \			__m256i __b = (((imm)&0xFF) > 15 ? _mm256_setzero_si256() : (__m256i)(a)); \
	(__m256i)__builtin_ia32_palignr256((__v32qi)__a, (__v32qi)__b, (n)); })			(__m256i)__builtin_shufflevector((__v32qi)__a, (__v32qi)__b, \
				( 0+((imm)&0xF)+(( 0+((imm)&0xF))&0x10)), \
				( 1+((imm)&0xF)+(( 1+((imm)&0xF))&0x10)), \
				( 2+((imm)&0xF)+(( 2+((imm)&0xF))&0x10)), \
				( 3+((imm)&0xF)+(( 3+((imm)&0xF))&0x10)), \
				( 4+((imm)&0xF)+(( 4+((imm)&0xF))&0x10)), \
				( 5+((imm)&0xF)+(( 5+((imm)&0xF))&0x10)), \
				( 6+((imm)&0xF)+(( 6+((imm)&0xF))&0x10)), \
				( 7+((imm)&0xF)+(( 7+((imm)&0xF))&0x10)), \
				( 8+((imm)&0xF)+(( 8+((imm)&0xF))&0x10)), \
				( 9+((imm)&0xF)+(( 9+((imm)&0xF))&0x10)), \
				(10+((imm)&0xF)+((10+((imm)&0xF))&0x10)), \
				(11+((imm)&0xF)+((11+((imm)&0xF))&0x10)), \
				(12+((imm)&0xF)+((12+((imm)&0xF))&0x10)), \
				(13+((imm)&0xF)+((13+((imm)&0xF))&0x10)), \
				(14+((imm)&0xF)+((14+((imm)&0xF))&0x10)), \
				(15+((imm)&0xF)+((15+((imm)&0xF))&0x10)), \
				(16+((imm)&0xF)+(( 0+((imm)&0xF))&0x10)), \
				(17+((imm)&0xF)+(( 1+((imm)&0xF))&0x10)), \
				(18+((imm)&0xF)+(( 2+((imm)&0xF))&0x10)), \
				(19+((imm)&0xF)+(( 3+((imm)&0xF))&0x10)), \
				(20+((imm)&0xF)+(( 4+((imm)&0xF))&0x10)), \
				(21+((imm)&0xF)+(( 5+((imm)&0xF))&0x10)), \
				(22+((imm)&0xF)+(( 6+((imm)&0xF))&0x10)), \
				(23+((imm)&0xF)+(( 7+((imm)&0xF))&0x10)), \
				(24+((imm)&0xF)+(( 8+((imm)&0xF))&0x10)), \
				(25+((imm)&0xF)+(( 9+((imm)&0xF))&0x10)), \
				(26+((imm)&0xF)+((10+((imm)&0xF))&0x10)), \
				(27+((imm)&0xF)+((11+((imm)&0xF))&0x10)), \
				(28+((imm)&0xF)+((12+((imm)&0xF))&0x10)), \
				(29+((imm)&0xF)+((13+((imm)&0xF))&0x10)), \
				(30+((imm)&0xF)+((14+((imm)&0xF))&0x10)), \
				(31+((imm)&0xF)+((15+((imm)&0xF))&0x10))); })

	static __inline__ __m256i __attribute__((__always_inline__, __nodebug__))			static __inline__ __m256i __attribute__((__always_inline__, __nodebug__))
	_mm256_and_si256(__m256i __a, __m256i __b)			_mm256_and_si256(__m256i __a, __m256i __b)
	{			{
	return __a & __b;			return __a & __b;
	}			}

	static __inline__ __m256i __attribute__((__always_inline__, __nodebug__))			static __inline__ __m256i __attribute__((__always_inline__, __nodebug__))
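
Each of the 32 index expressions in the new _mm256_alignr_epi8 macro above follows one pattern: add the low four bits of the immediate to the byte position within the lane, and if that sum reaches 16, add another 16 so that the index lands in the same lane of the second shuffle operand. A small sketch of that arithmetic, with `alignr256_index` as a hypothetical helper name:

```c
/* pos is the output byte position (0..31), imm the alignr immediate.
   Returns the shuffle index the macro's expression for pos evaluates to. */
static unsigned alignr256_index(unsigned pos, unsigned imm) {
  unsigned lane_base = pos & 0x10;              /* 0 or 16 */
  unsigned in_lane = (pos & 0xF) + (imm & 0xF); /* 0..30 */
  /* Bit 4 of in_lane is set exactly when the sum crosses the lane end. */
  return lane_base + in_lane + (in_lane & 0x10);
}
```

With `imm = 2`, positions 14 and 15 map to 32 and 33, and positions 30 and 31 map to 48 and 49, matching the expected mask in the avx2-builtins.c test.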

lib/Headers/tmmintrin.h

	}			}

	static __inline__ __m128i __attribute__((__always_inline__, __nodebug__))			static __inline__ __m128i __attribute__((__always_inline__, __nodebug__))
	_mm_abs_epi32(__m128i __a)			_mm_abs_epi32(__m128i __a)
	{			{
	return (__m128i)__builtin_ia32_pabsd128((__v4si)__a);			return (__m128i)__builtin_ia32_pabsd128((__v4si)__a);
	}			}

	#define _mm_alignr_epi8(a, b, n) __extension__ ({ \			#define _mm_alignr_epi8(a, b, imm) __extension__ ({ \
	__m128i __a = (a); \			__m128i __a = (((imm)&0xFF) > 31 ? _mm_setzero_si128() : (__m128i)(b)); \
	__m128i __b = (b); \			__m128i __b = (((imm)&0xFF) > 15 ? _mm_setzero_si128() : (__m128i)(a)); \
	(__m128i)__builtin_ia32_palignr128((__v16qi)__a, (__v16qi)__b, (n)); })			(__m128i)__builtin_shufflevector((__v16qi)__a, (__v16qi)__b, \
				( 0+((imm)&0xF)), \
				( 1+((imm)&0xF)), \
				( 2+((imm)&0xF)), \
				( 3+((imm)&0xF)), \
				( 4+((imm)&0xF)), \
				( 5+((imm)&0xF)), \
				( 6+((imm)&0xF)), \
				( 7+((imm)&0xF)), \
				( 8+((imm)&0xF)), \
				( 9+((imm)&0xF)), \
				(10+((imm)&0xF)), \
				(11+((imm)&0xF)), \
				(12+((imm)&0xF)), \
				(13+((imm)&0xF)), \
				(14+((imm)&0xF)), \
				(15+((imm)&0xF))); })

	#define _mm_alignr_pi8(a, b, n) __extension__ ({ \			#define _mm_alignr_pi8(a, b, n) __extension__ ({ \
	__m64 __a = (a); \			__m64 __a = (a); \
	__m64 __b = (b); \			__m64 __b = (b); \
	(__m64)__builtin_ia32_palignr((__v8qi)__a, (__v8qi)__b, (n)); })			(__m64)__builtin_ia32_palignr((__v8qi)__a, (__v8qi)__b, (n)); })

	static __inline__ __m128i __attribute__((__always_inline__, __nodebug__))			static __inline__ __m128i __attribute__((__always_inline__, __nodebug__))
	_mm_hadd_epi16(__m128i __a, __m128i __b)			_mm_hadd_epi16(__m128i __a, __m128i __b)

test/CodeGen/avx2-builtins.c

	}			}

	__m256i test_mm256_alignr_epi8(__m256i a, __m256i b) {			__m256i test_mm256_alignr_epi8(__m256i a, __m256i b) {
	// CHECK: shufflevector <32 x i8> %{{.*}}, <32 x i8> %{{.*}}, <32 x i32> <i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 32, i32 33, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31, i32 48, i32 49>			// CHECK: shufflevector <32 x i8> %{{.*}}, <32 x i8> %{{.*}}, <32 x i32> <i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 32, i32 33, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31, i32 48, i32 49>
	return _mm256_alignr_epi8(a, b, 2);			return _mm256_alignr_epi8(a, b, 2);
	}			}

	__m256i test2_mm256_alignr_epi8(__m256i a, __m256i b) {			__m256i test2_mm256_alignr_epi8(__m256i a, __m256i b) {
	// CHECK: shufflevector <32 x i8> %{{.*}}, <32 x i8> zeroinitializer, <32 x i32> <i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 32, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31, i32 48>			// CHECK: shufflevector <32 x i8> %{{.*}}, <32 x i8> %{{.*}}, <32 x i32> <i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 32, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31, i32 48>
	return _mm256_alignr_epi8(a, b, 17);			return _mm256_alignr_epi8(a, b, 17);
	}			}

	__m256i test_mm256_sub_epi8(__m256i a, __m256i b) {			__m256i test_mm256_sub_epi8(__m256i a, __m256i b) {
	// CHECK: sub <32 x i8>			// CHECK: sub <32 x i8>
	return _mm256_sub_epi8(a, b);			return _mm256_sub_epi8(a, b);
	}			}


test/CodeGen/builtins-x86.c

Show First 20 Lines • Show All 42 Lines • ▼ Show 20 Lines	#endif
unsigned int tmp_Ui;		unsigned int tmp_Ui;
signed long long tmp_LLi;		signed long long tmp_LLi;
// unsigned long long tmp_ULLi;		// unsigned long long tmp_ULLi;
float tmp_f;		float tmp_f;
double tmp_d;		double tmp_d;

void* tmp_vp;		void* tmp_vp;
const void* tmp_vCp;		const void* tmp_vCp;
char* tmp_cp;		char* tmp_cp;
const char* tmp_cCp;		const char* tmp_cCp;
int* tmp_ip;		int* tmp_ip;
float* tmp_fp;		float* tmp_fp;
const float* tmp_fCp;		const float* tmp_fCp;
double* tmp_dp;		double* tmp_dp;
const double* tmp_dCp;		const double* tmp_dCp;
long long* tmp_LLip;		long long* tmp_LLip;

#define imm_i 32		#define imm_i 32
▲ Show 20 Lines • Show All 284 Lines • ▼ Show 20 Lines	#endif
tmp_V4i = __builtin_ia32_psrldi128(tmp_V4i, tmp_i);		tmp_V4i = __builtin_ia32_psrldi128(tmp_V4i, tmp_i);
tmp_V2LLi = __builtin_ia32_psrlqi128(tmp_V2LLi, tmp_i);		tmp_V2LLi = __builtin_ia32_psrlqi128(tmp_V2LLi, tmp_i);
tmp_V8s = __builtin_ia32_psrawi128(tmp_V8s, tmp_i);		tmp_V8s = __builtin_ia32_psrawi128(tmp_V8s, tmp_i);
tmp_V4i = __builtin_ia32_psradi128(tmp_V4i, tmp_i);		tmp_V4i = __builtin_ia32_psradi128(tmp_V4i, tmp_i);
tmp_V8s = __builtin_ia32_pmaddwd128(tmp_V8s, tmp_V8s);		tmp_V8s = __builtin_ia32_pmaddwd128(tmp_V8s, tmp_V8s);
(void) __builtin_ia32_monitor(tmp_vp, tmp_Ui, tmp_Ui);		(void) __builtin_ia32_monitor(tmp_vp, tmp_Ui, tmp_Ui);
(void) __builtin_ia32_mwait(tmp_Ui, tmp_Ui);		(void) __builtin_ia32_mwait(tmp_Ui, tmp_Ui);
tmp_V16c = __builtin_ia32_lddqu(tmp_cCp);		tmp_V16c = __builtin_ia32_lddqu(tmp_cCp);
tmp_V2LLi = __builtin_ia32_palignr128(tmp_V2LLi, tmp_V2LLi, imm_i);
tmp_V1LLi = __builtin_ia32_palignr(tmp_V1LLi, tmp_V1LLi, imm_i);		tmp_V1LLi = __builtin_ia32_palignr(tmp_V1LLi, tmp_V1LLi, imm_i);
#ifdef USE_SSE4		#ifdef USE_SSE4
tmp_V16c = __builtin_ia32_pblendvb128(tmp_V16c, tmp_V16c, tmp_V16c);		tmp_V16c = __builtin_ia32_pblendvb128(tmp_V16c, tmp_V16c, tmp_V16c);
tmp_V2d = __builtin_ia32_blendvpd(tmp_V2d, tmp_V2d, tmp_V2d);		tmp_V2d = __builtin_ia32_blendvpd(tmp_V2d, tmp_V2d, tmp_V2d);
tmp_V4f = __builtin_ia32_blendvps(tmp_V4f, tmp_V4f, tmp_V4f);		tmp_V4f = __builtin_ia32_blendvps(tmp_V4f, tmp_V4f, tmp_V4f);
tmp_V8s = __builtin_ia32_packusdw128(tmp_V4i, tmp_V4i);		tmp_V8s = __builtin_ia32_packusdw128(tmp_V4i, tmp_V4i);
tmp_V16c = __builtin_ia32_pmaxsb128(tmp_V16c, tmp_V16c);		tmp_V16c = __builtin_ia32_pmaxsb128(tmp_V16c, tmp_V16c);
tmp_V4i = __builtin_ia32_pmaxsd128(tmp_V4i, tmp_V4i);		tmp_V4i = __builtin_ia32_pmaxsd128(tmp_V4i, tmp_V4i);

test/CodeGen/palignr.c

	// REQUIRES: x86-registered-target
	// RUN: %clang_cc1 %s -triple=i686-apple-darwin -target-feature +ssse3 -O1 -S -o - | FileCheck %s

	#define _mm_alignr_epi8(a, b, n) (__builtin_ia32_palignr128((a), (b), (n)))
	typedef __attribute__((vector_size(16))) int int4;

	// CHECK: palignr
	int4 align1(int4 a, int4 b) { return _mm_alignr_epi8(a, b, 15); }
	// CHECK: ret
	// CHECK: ret
	// CHECK-NOT: palignr
	int4 align2(int4 a, int4 b) { return _mm_alignr_epi8(a, b, 16); }
	// CHECK: psrldq
	int4 align3(int4 a, int4 b) { return _mm_alignr_epi8(a, b, 17); }
	// CHECK: xor
	int4 align4(int4 a, int4 b) { return _mm_alignr_epi8(a, b, 32); }

	#define _mm_alignr_pi8(a, b, n) (__builtin_ia32_palignr((a), (b), (n)))
	typedef __attribute__((vector_size(8))) int int2;

	// CHECK: palignr
	int2 align5(int2 a, int2 b) { return _mm_alignr_pi8(a, b, 8); }

	// CHECK: palignr
	int2 align6(int2 a, int2 b) { return _mm_alignr_pi8(a, b, 9); }

	// CHECK: palignr
	int2 align7(int2 a, int2 b) { return _mm_alignr_pi8(a, b, 16); }

	// CHECK: palignr
	int2 align8(int2 a, int2 b) { return _mm_alignr_pi8(a, b, 7); }

test/CodeGen/sse-builtins.c

	}			}

	__m128i test_mm_alignr_epi8(__m128i a, __m128i b) {			__m128i test_mm_alignr_epi8(__m128i a, __m128i b) {
	// CHECK: shufflevector <16 x i8> %{{.*}}, <16 x i8> %{{.*}}, <16 x i32> <i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 17>			// CHECK: shufflevector <16 x i8> %{{.*}}, <16 x i8> %{{.*}}, <16 x i32> <i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 17>
	return _mm_alignr_epi8(a, b, 2);			return _mm_alignr_epi8(a, b, 2);
	}			}

	__m128i test2_mm_alignr_epi8(__m128i a, __m128i b) {			__m128i test2_mm_alignr_epi8(__m128i a, __m128i b) {
	// CHECK: shufflevector <16 x i8> %{{.*}}, <16 x i8> zeroinitializer, <16 x i32> <i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16>			// CHECK: shufflevector <16 x i8> %{{.*}}, <16 x i8> %{{.*}}, <16 x i32> <i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16>
	return _mm_alignr_epi8(a, b, 17);			return _mm_alignr_epi8(a, b, 17);
	}			}
