This is an archive of the discontinued LLVM Phabricator instance.

[X86][AVX] Support bitmask style integer shuffles for 256-bit integer vectors
ClosedPublic

Authored by RKSimon on Feb 16 2016, 7:48 AM.

Download Raw Diff

Details

Reviewers

spatel
ab
delena
andreadb

Commits

rGc5b5dcb9852f: [X86][AVX] Support bit-blend integer shuffles for 256-bit integer vectors
rGa50e8d3627eb: [X86][AVX] Support bit-mask integer shuffles for 256-bit integer vectors
rL261082: [X86][AVX] Support bit-blend integer shuffles for 256-bit integer vectors
rL261081: [X86][AVX] Support bit-mask integer shuffles for 256-bit integer vectors

Summary

AVX1 doesn't support the shuffling of 256-bit integer vectors. For 32/64-bit elements we get around this by shuffling as float/double but for 8/16-bit elements (assuming they can't widen) we currently just split, shuffle as 128-bit vectors and concatenate the results back.

This patch adds the ability to lower using the bit-mask/bit-blend patterns before defaulting to the splitting behaviour. In some cases this ends up matching what AVX2 would do anyhow or what AVX1 does on the split vectors.

Diff Detail

Repository: rL LLVM

Event Timeline

RKSimon updated this revision to Diff 48070.Feb 16 2016, 7:48 AM

RKSimon retitled this revision from to [X86][AVX] Support bitmask style integer shuffles for 256-bit integer vectors.

RKSimon updated this object.

RKSimon added reviewers: delena, ab, spatel, andreadb.

RKSimon set the repository for this revision to rL LLVM.

RKSimon added a subscriber: llvm-commits.

LGTM.

I'd split this in two to make what changed even more obvious and bisectable.

This revision is now accepted and ready to land.Feb 16 2016, 3:54 PM

In D17292#354068, @ab wrote:

I'd split this in two to make what changed even more obvious and bisectable.

That should be pretty trivial, thanks Ahmed.

Closed by commit rL261081: [X86][AVX] Support bit-mask integer shuffles for 256-bit integer vectors (authored by RKSimon). · Explain WhyFeb 17 2016, 2:42 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path

Size

llvm/

trunk/

lib/

Target/

X86/

X86ISelLowering.cpp

8 lines

test/

CodeGen/

X86/

vec_int_to_fp.ll

4 lines

vector-shuffle-256-v32.ll

17 lines

vector-zext.ll

16 lines

Diff 48168

llvm/trunk/lib/Target/X86/X86ISelLowering.cpp

This file is larger than 256 KB, so syntax highlighting is disabled by default.

Show First 20 Lines • Show All 11,474 Lines • ▼ Show 20 Lines	static SDValue lower256BitVectorShuffle(SDValue Op, SDValue V1, SDValue V2,
// There is a really nice hard cut-over between AVX1 and AVX2 that means we		// There is a really nice hard cut-over between AVX1 and AVX2 that means we
// can check for those subtargets here and avoid much of the subtarget		// can check for those subtargets here and avoid much of the subtarget
// querying in the per-vector-type lowering routines. With AVX1 we have		// querying in the per-vector-type lowering routines. With AVX1 we have
// essentially zero ability to manipulate a 256-bit vector with integer		// essentially zero ability to manipulate a 256-bit vector with integer
// types. Since we'll use floating point types there eventually, just		// types. Since we'll use floating point types there eventually, just
// immediately cast everything to a float and operate entirely in that domain.		// immediately cast everything to a float and operate entirely in that domain.
if (VT.isInteger() && !Subtarget.hasAVX2()) {		if (VT.isInteger() && !Subtarget.hasAVX2()) {
int ElementBits = VT.getScalarSizeInBits();		int ElementBits = VT.getScalarSizeInBits();
if (ElementBits < 32)		if (ElementBits < 32) {
// No floating point type available, decompose into 128-bit vectors.		// No floating point type available, if we can't use the bit operations
		// for masking then decompose into 128-bit vectors.
		if (SDValue V = lowerVectorShuffleAsBitMask(DL, VT, V1, V2, Mask, DAG))
		return V;
return splitAndLowerVectorShuffle(DL, VT, V1, V2, Mask, DAG);		return splitAndLowerVectorShuffle(DL, VT, V1, V2, Mask, DAG);
		}

MVT FpVT = MVT::getVectorVT(MVT::getFloatingPointVT(ElementBits),		MVT FpVT = MVT::getVectorVT(MVT::getFloatingPointVT(ElementBits),
VT.getVectorNumElements());		VT.getVectorNumElements());
V1 = DAG.getBitcast(FpVT, V1);		V1 = DAG.getBitcast(FpVT, V1);
V2 = DAG.getBitcast(FpVT, V2);		V2 = DAG.getBitcast(FpVT, V2);
return DAG.getBitcast(VT, DAG.getVectorShuffle(FpVT, DL, V1, V2, Mask));		return DAG.getBitcast(VT, DAG.getVectorShuffle(FpVT, DL, V1, V2, Mask));
}		}

▲ Show 20 Lines • Show All 18,137 Lines • Show Last 20 Lines

llvm/trunk/test/CodeGen/X86/vec_int_to_fp.ll

	Show First 20 Lines • Show All 1,748 Lines • ▼ Show 20 Lines
	; SSE-NEXT: cvtdq2ps %xmm0, %xmm1			; SSE-NEXT: cvtdq2ps %xmm0, %xmm1
	; SSE-NEXT: movaps %xmm2, %xmm0			; SSE-NEXT: movaps %xmm2, %xmm0
	; SSE-NEXT: retq			; SSE-NEXT: retq
	;			;
	; AVX1-LABEL: uitofp_8i8_to_8f32:			; AVX1-LABEL: uitofp_8i8_to_8f32:
	; AVX1: # BB#0:			; AVX1: # BB#0:
	; AVX1-NEXT: vpmovzxbw {{.*#+}} xmm1 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero			; AVX1-NEXT: vpmovzxbw {{.*#+}} xmm1 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero
	; AVX1-NEXT: vpunpckhwd {{.*#+}} xmm1 = xmm1[4],xmm0[4],xmm1[5],xmm0[5],xmm1[6],xmm0[6],xmm1[7],xmm0[7]			; AVX1-NEXT: vpunpckhwd {{.*#+}} xmm1 = xmm1[4],xmm0[4],xmm1[5],xmm0[5],xmm1[6],xmm0[6],xmm1[7],xmm0[7]
	; AVX1-NEXT: vmovdqa {{.*#+}} xmm2 = [255,0,0,0,255,0,0,0,255,0,0,0,255,0,0,0]
	; AVX1-NEXT: vpand %xmm2, %xmm1, %xmm1
	; AVX1-NEXT: vpmovzxbd {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero			; AVX1-NEXT: vpmovzxbd {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero
	; AVX1-NEXT: vpand %xmm2, %xmm0, %xmm0
	; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0			; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
				; AVX1-NEXT: vandps {{.*}}(%rip), %ymm0, %ymm0
	; AVX1-NEXT: vcvtdq2ps %ymm0, %ymm0			; AVX1-NEXT: vcvtdq2ps %ymm0, %ymm0
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2-LABEL: uitofp_8i8_to_8f32:			; AVX2-LABEL: uitofp_8i8_to_8f32:
	; AVX2: # BB#0:			; AVX2: # BB#0:
	; AVX2-NEXT: vpmovzxbd {{.*#+}} ymm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero,xmm0[4],zero,zero,zero,xmm0[5],zero,zero,zero,xmm0[6],zero,zero,zero,xmm0[7],zero,zero,zero			; AVX2-NEXT: vpmovzxbd {{.*#+}} ymm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero,xmm0[4],zero,zero,zero,xmm0[5],zero,zero,zero,xmm0[6],zero,zero,zero,xmm0[7],zero,zero,zero
	; AVX2-NEXT: vpand {{.*}}(%rip), %ymm0, %ymm0			; AVX2-NEXT: vpand {{.*}}(%rip), %ymm0, %ymm0
	; AVX2-NEXT: vcvtdq2ps %ymm0, %ymm0			; AVX2-NEXT: vcvtdq2ps %ymm0, %ymm0
	▲ Show 20 Lines • Show All 88 Lines • Show Last 20 Lines

llvm/trunk/test/CodeGen/X86/vector-shuffle-256-v32.ll

	Show First 20 Lines • Show All 943 Lines • ▼ Show 20 Lines
	; AVX2-NEXT: vmovdqa {{.*#+}} ymm2 = [255,0,255,0,255,0,255,0,255,0,255,0,255,0,255,0,255,0,255,0,255,0,255,0,255,0,255,0,255,0,255,0]			; AVX2-NEXT: vmovdqa {{.*#+}} ymm2 = [255,0,255,0,255,0,255,0,255,0,255,0,255,0,255,0,255,0,255,0,255,0,255,0,255,0,255,0,255,0,255,0]
	; AVX2-NEXT: vpblendvb %ymm2, %ymm1, %ymm0, %ymm0			; AVX2-NEXT: vpblendvb %ymm2, %ymm1, %ymm0, %ymm0
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	%shuffle = shufflevector <32 x i8> %a, <32 x i8> %b, <32 x i32> <i32 32, i32 1, i32 34, i32 3, i32 36, i32 5, i32 38, i32 7, i32 40, i32 9, i32 42, i32 11, i32 44, i32 13, i32 46, i32 15, i32 48, i32 17, i32 50, i32 19, i32 52, i32 21, i32 54, i32 23, i32 56, i32 25, i32 58, i32 27, i32 60, i32 29, i32 62, i32 31>			%shuffle = shufflevector <32 x i8> %a, <32 x i8> %b, <32 x i32> <i32 32, i32 1, i32 34, i32 3, i32 36, i32 5, i32 38, i32 7, i32 40, i32 9, i32 42, i32 11, i32 44, i32 13, i32 46, i32 15, i32 48, i32 17, i32 50, i32 19, i32 52, i32 21, i32 54, i32 23, i32 56, i32 25, i32 58, i32 27, i32 60, i32 29, i32 62, i32 31>
	ret <32 x i8> %shuffle			ret <32 x i8> %shuffle
	}			}

	define <32 x i8> @shuffle_v32i8_zz_01_zz_03_zz_05_zz_07_zz_09_zz_11_zz_13_zz_15_zz_17_zz_19_zz_21_zz_23_zz_25_zz_27_zz_29_zz_31(<32 x i8> %a) {			define <32 x i8> @shuffle_v32i8_zz_01_zz_03_zz_05_zz_07_zz_09_zz_11_zz_13_zz_15_zz_17_zz_19_zz_21_zz_23_zz_25_zz_27_zz_29_zz_31(<32 x i8> %a) {
	; AVX1-LABEL: shuffle_v32i8_zz_01_zz_03_zz_05_zz_07_zz_09_zz_11_zz_13_zz_15_zz_17_zz_19_zz_21_zz_23_zz_25_zz_27_zz_29_zz_31:			; ALL-LABEL: shuffle_v32i8_zz_01_zz_03_zz_05_zz_07_zz_09_zz_11_zz_13_zz_15_zz_17_zz_19_zz_21_zz_23_zz_25_zz_27_zz_29_zz_31:
	; AVX1: # BB#0:			; ALL: # BB#0:
	; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm1			; ALL-NEXT: vandps {{.*}}(%rip), %ymm0, %ymm0
	; AVX1-NEXT: vmovaps {{.*#+}} xmm2 = [0,255,0,255,0,255,0,255,0,255,0,255,0,255,0,255]			; ALL-NEXT: retq
	; AVX1-NEXT: vandps %xmm2, %xmm1, %xmm1
	; AVX1-NEXT: vandps %xmm2, %xmm0, %xmm0
	; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
	; AVX1-NEXT: retq
	;
	; AVX2-LABEL: shuffle_v32i8_zz_01_zz_03_zz_05_zz_07_zz_09_zz_11_zz_13_zz_15_zz_17_zz_19_zz_21_zz_23_zz_25_zz_27_zz_29_zz_31:
	; AVX2: # BB#0:
	; AVX2-NEXT: vandps {{.*}}(%rip), %ymm0, %ymm0
	; AVX2-NEXT: retq
	%shuffle = shufflevector <32 x i8> %a, <32 x i8> zeroinitializer, <32 x i32> <i32 32, i32 1, i32 34, i32 3, i32 36, i32 5, i32 38, i32 7, i32 40, i32 9, i32 42, i32 11, i32 44, i32 13, i32 46, i32 15, i32 48, i32 17, i32 50, i32 19, i32 52, i32 21, i32 54, i32 23, i32 56, i32 25, i32 58, i32 27, i32 60, i32 29, i32 62, i32 31>			%shuffle = shufflevector <32 x i8> %a, <32 x i8> zeroinitializer, <32 x i32> <i32 32, i32 1, i32 34, i32 3, i32 36, i32 5, i32 38, i32 7, i32 40, i32 9, i32 42, i32 11, i32 44, i32 13, i32 46, i32 15, i32 48, i32 17, i32 50, i32 19, i32 52, i32 21, i32 54, i32 23, i32 56, i32 25, i32 58, i32 27, i32 60, i32 29, i32 62, i32 31>
	ret <32 x i8> %shuffle			ret <32 x i8> %shuffle
	}			}

	define <32 x i8> @shuffle_v32i8_00_32_00_32_00_32_00_32_00_32_00_32_00_32_00_32_00_32_00_32_00_32_00_32_00_32_00_32_00_32_00_32(<32 x i8> %a, <32 x i8> %b) {			define <32 x i8> @shuffle_v32i8_00_32_00_32_00_32_00_32_00_32_00_32_00_32_00_32_00_32_00_32_00_32_00_32_00_32_00_32_00_32_00_32(<32 x i8> %a, <32 x i8> %b) {
	; AVX1-LABEL: shuffle_v32i8_00_32_00_32_00_32_00_32_00_32_00_32_00_32_00_32_00_32_00_32_00_32_00_32_00_32_00_32_00_32_00_32:			; AVX1-LABEL: shuffle_v32i8_00_32_00_32_00_32_00_32_00_32_00_32_00_32_00_32_00_32_00_32_00_32_00_32_00_32_00_32_00_32_00_32:
	; AVX1: # BB#0:			; AVX1: # BB#0:
	; AVX1-NEXT: vpunpcklbw {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3],xmm0[4],xmm1[4],xmm0[5],xmm1[5],xmm0[6],xmm1[6],xmm0[7],xmm1[7]			; AVX1-NEXT: vpunpcklbw {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3],xmm0[4],xmm1[4],xmm0[5],xmm1[5],xmm0[6],xmm1[6],xmm0[7],xmm1[7]
	▲ Show 20 Lines • Show All 1,190 Lines • Show Last 20 Lines

llvm/trunk/test/CodeGen/X86/vector-zext.ll

	Show First 20 Lines • Show All 139 Lines • ▼ Show 20 Lines
	; SSE41-NEXT: pmovzxbd {{.*#+}} xmm1 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero			; SSE41-NEXT: pmovzxbd {{.*#+}} xmm1 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero
	; SSE41-NEXT: movdqa %xmm2, %xmm0			; SSE41-NEXT: movdqa %xmm2, %xmm0
	; SSE41-NEXT: retq			; SSE41-NEXT: retq
	;			;
	; AVX1-LABEL: zext_16i8_to_8i32:			; AVX1-LABEL: zext_16i8_to_8i32:
	; AVX1: # BB#0: # %entry			; AVX1: # BB#0: # %entry
	; AVX1-NEXT: vpmovzxbw {{.*#+}} xmm1 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero			; AVX1-NEXT: vpmovzxbw {{.*#+}} xmm1 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero
	; AVX1-NEXT: vpunpckhwd {{.*#+}} xmm1 = xmm1[4],xmm0[4],xmm1[5],xmm0[5],xmm1[6],xmm0[6],xmm1[7],xmm0[7]			; AVX1-NEXT: vpunpckhwd {{.*#+}} xmm1 = xmm1[4],xmm0[4],xmm1[5],xmm0[5],xmm1[6],xmm0[6],xmm1[7],xmm0[7]
	; AVX1-NEXT: vmovdqa {{.*#+}} xmm2 = [255,0,0,0,255,0,0,0,255,0,0,0,255,0,0,0]
	; AVX1-NEXT: vpand %xmm2, %xmm1, %xmm1
	; AVX1-NEXT: vpmovzxbd {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero			; AVX1-NEXT: vpmovzxbd {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero
	; AVX1-NEXT: vpand %xmm2, %xmm0, %xmm0
	; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0			; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
				; AVX1-NEXT: vandps {{.*}}(%rip), %ymm0, %ymm0
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2-LABEL: zext_16i8_to_8i32:			; AVX2-LABEL: zext_16i8_to_8i32:
	; AVX2: # BB#0: # %entry			; AVX2: # BB#0: # %entry
	; AVX2-NEXT: vpmovzxbd {{.*#+}} ymm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero,xmm0[4],zero,zero,zero,xmm0[5],zero,zero,zero,xmm0[6],zero,zero,zero,xmm0[7],zero,zero,zero			; AVX2-NEXT: vpmovzxbd {{.*#+}} ymm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero,xmm0[4],zero,zero,zero,xmm0[5],zero,zero,zero,xmm0[6],zero,zero,zero,xmm0[7],zero,zero,zero
	; AVX2-NEXT: vpand {{.*}}(%rip), %ymm0, %ymm0			; AVX2-NEXT: vpand {{.*}}(%rip), %ymm0, %ymm0
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	▲ Show 20 Lines • Show All 62 Lines • ▼ Show 20 Lines
	; SSE41-NEXT: psrld $16, %xmm0			; SSE41-NEXT: psrld $16, %xmm0
	; SSE41-NEXT: pmovzxbq {{.*#+}} xmm1 = xmm0[0],zero,zero,zero,zero,zero,zero,zero,xmm0[1],zero,zero,zero,zero,zero,zero,zero			; SSE41-NEXT: pmovzxbq {{.*#+}} xmm1 = xmm0[0],zero,zero,zero,zero,zero,zero,zero,xmm0[1],zero,zero,zero,zero,zero,zero,zero
	; SSE41-NEXT: movdqa %xmm2, %xmm0			; SSE41-NEXT: movdqa %xmm2, %xmm0
	; SSE41-NEXT: retq			; SSE41-NEXT: retq
	;			;
	; AVX1-LABEL: zext_16i8_to_4i64:			; AVX1-LABEL: zext_16i8_to_4i64:
	; AVX1: # BB#0: # %entry			; AVX1: # BB#0: # %entry
	; AVX1-NEXT: vpmovzxbd {{.*#+}} xmm1 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero			; AVX1-NEXT: vpmovzxbd {{.*#+}} xmm1 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero
	; AVX1-NEXT: vpshufd {{.*#+}} xmm1 = xmm1[2,2,3,3]
	; AVX1-NEXT: vmovdqa {{.*#+}} xmm2 = [255,0,0,0,0,0,0,0,255,0,0,0,0,0,0,0]
	; AVX1-NEXT: vpand %xmm2, %xmm1, %xmm1
	; AVX1-NEXT: vpmovzxbq {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,zero,zero,zero,zero,xmm0[1],zero,zero,zero,zero,zero,zero,zero			; AVX1-NEXT: vpmovzxbq {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,zero,zero,zero,zero,xmm0[1],zero,zero,zero,zero,zero,zero,zero
	; AVX1-NEXT: vpand %xmm2, %xmm0, %xmm0			; AVX1-NEXT: vpshufd {{.*#+}} xmm1 = xmm1[2,2,3,3]
	; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0			; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
				; AVX1-NEXT: vandps {{.*}}(%rip), %ymm0, %ymm0
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2-LABEL: zext_16i8_to_4i64:			; AVX2-LABEL: zext_16i8_to_4i64:
	; AVX2: # BB#0: # %entry			; AVX2: # BB#0: # %entry
	; AVX2-NEXT: vpmovzxbq {{.*#+}} ymm0 = xmm0[0],zero,zero,zero,zero,zero,zero,zero,xmm0[1],zero,zero,zero,zero,zero,zero,zero,xmm0[2],zero,zero,zero,zero,zero,zero,zero,xmm0[3],zero,zero,zero,zero,zero,zero,zero			; AVX2-NEXT: vpmovzxbq {{.*#+}} ymm0 = xmm0[0],zero,zero,zero,zero,zero,zero,zero,xmm0[1],zero,zero,zero,zero,zero,zero,zero,xmm0[2],zero,zero,zero,zero,zero,zero,zero,xmm0[3],zero,zero,zero,zero,zero,zero,zero
	; AVX2-NEXT: vpand {{.*}}(%rip), %ymm0, %ymm0			; AVX2-NEXT: vpand {{.*}}(%rip), %ymm0, %ymm0
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	▲ Show 20 Lines • Show All 140 Lines • ▼ Show 20 Lines
	; SSE41-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,1,2,3]			; SSE41-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,1,2,3]
	; SSE41-NEXT: pmovzxwq {{.*#+}} xmm1 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero			; SSE41-NEXT: pmovzxwq {{.*#+}} xmm1 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero
	; SSE41-NEXT: movdqa %xmm2, %xmm0			; SSE41-NEXT: movdqa %xmm2, %xmm0
	; SSE41-NEXT: retq			; SSE41-NEXT: retq
	;			;
	; AVX1-LABEL: zext_8i16_to_4i64:			; AVX1-LABEL: zext_8i16_to_4i64:
	; AVX1: # BB#0: # %entry			; AVX1: # BB#0: # %entry
	; AVX1-NEXT: vpmovzxwd {{.*#+}} xmm1 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero			; AVX1-NEXT: vpmovzxwd {{.*#+}} xmm1 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero
	; AVX1-NEXT: vpshufd {{.*#+}} xmm1 = xmm1[2,2,3,3]
	; AVX1-NEXT: vpxor %xmm2, %xmm2, %xmm2
	; AVX1-NEXT: vpblendw {{.*#+}} xmm1 = xmm1[0],xmm2[1,2,3],xmm1[4],xmm2[5,6,7]
	; AVX1-NEXT: vpmovzxwq {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero			; AVX1-NEXT: vpmovzxwq {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero
	; AVX1-NEXT: vpblendw {{.*#+}} xmm0 = xmm0[0],xmm2[1,2,3],xmm0[4],xmm2[5,6,7]			; AVX1-NEXT: vpshufd {{.*#+}} xmm1 = xmm1[2,2,3,3]
	; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0			; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
				; AVX1-NEXT: vandps {{.*}}(%rip), %ymm0, %ymm0
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2-LABEL: zext_8i16_to_4i64:			; AVX2-LABEL: zext_8i16_to_4i64:
	; AVX2: # BB#0: # %entry			; AVX2: # BB#0: # %entry
	; AVX2-NEXT: vpmovzxwq {{.*#+}} ymm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero			; AVX2-NEXT: vpmovzxwq {{.*#+}} ymm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero
	; AVX2-NEXT: vpxor %ymm1, %ymm1, %ymm1			; AVX2-NEXT: vpxor %ymm1, %ymm1, %ymm1
	; AVX2-NEXT: vpblendw {{.*#+}} ymm0 = ymm0[0],ymm1[1,2,3],ymm0[4],ymm1[5,6,7],ymm0[8],ymm1[9,10,11],ymm0[12],ymm1[13,14,15]			; AVX2-NEXT: vpblendw {{.*#+}} ymm0 = ymm0[0],ymm1[1,2,3],ymm0[4],ymm1[5,6,7],ymm0[8],ymm1[9,10,11],ymm0[12],ymm1[13,14,15]
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	▲ Show 20 Lines • Show All 1,215 Lines • Show Last 20 Lines