This is an archive of the discontinued LLVM Phabricator instance.

DAGCombine: Extend createBuildVecShuffle for case len(in_vec) = 4*len(result_vec)
Abandoned · Public

Authored by zvi on May 9 2017, 12:57 AM.

Details

Reviewers

spatel
RKSimon
craig.topper
hfinkel
efriedma

Summary

Add support for the case where a build_vector gathers its elements from a
single input vector that is 4x longer than the result vector.
VECTOR_SHUFFLE requires the input vectors and the result vector to have the
same type, and createBuildVecShuffle() already has recipes for several cases
that meet this requirement. For the case this patch addresses, the input
vector is split into two half-sized vectors and the result vector is extended
to twice its length.
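
As an illustration of the index arithmetic in the summary (not the patch's code), here is a minimal standalone sketch that models vectors as `std::vector<int>` instead of SelectionDAG nodes. The names `shuffleLanes` and `gatherViaSplit` are hypothetical, and an undefined lane is represented by a mask value of -1 and read back as 0; both are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Two-input shuffle in the style of VECTOR_SHUFFLE: both inputs and the
// result have the same length N. Mask values in [0, N) pick from A, values
// in [N, 2N) pick from B, and -1 marks an undefined lane (modelled as 0).
std::vector<int> shuffleLanes(const std::vector<int> &A,
                              const std::vector<int> &B,
                              const std::vector<int> &Mask) {
  assert(A.size() == B.size() && Mask.size() == A.size());
  const int N = static_cast<int>(A.size());
  std::vector<int> R(Mask.size(), 0);
  for (std::size_t I = 0; I < Mask.size(); ++I) {
    const int M = Mask[I];
    if (M >= 0)
      R[I] = M < N ? A[M] : B[M - N];
  }
  return R;
}

// Gather NumElems elements of In (which has 4 * NumElems elements) by
// splitting In into two halves of 2 * NumElems elements, shuffling them into
// a result of 2 * NumElems lanes, and keeping only the low NumElems lanes.
std::vector<int> gatherViaSplit(const std::vector<int> &In,
                                const std::vector<int> &Indices) {
  const std::size_t NumElems = Indices.size();
  assert(In.size() == 4 * NumElems);
  const auto Mid = In.begin() + static_cast<std::ptrdiff_t>(2 * NumElems);
  const std::vector<int> Lo(In.begin(), Mid);
  const std::vector<int> Hi(Mid, In.end());
  // An index into In is already a valid two-input mask value, because the
  // second half starts at offset 2 * NumElems in both numberings.
  std::vector<int> Mask(2 * NumElems, -1);
  for (std::size_t I = 0; I < NumElems; ++I)
    Mask[I] = Indices[I];
  const std::vector<int> Wide = shuffleLanes(Lo, Hi, Mask);
  return std::vector<int>(Wide.begin(),
                          Wide.begin() + static_cast<std::ptrdiff_t>(NumElems));
}
```

Calling `gatherViaSplit` on a 16-element input with indices `{0, 4, 8, 12}` builds an 8-lane mask whose upper four lanes are undefined, which mirrors the "result twice as long" step of the summary.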

Diff Detail

Build Status

Buildable 6270
Build 6270: arc lint + arc unit

Event Timeline

zvi created this revision. May 9 2017, 12:57 AM

Herald added a subscriber: javed.absar. May 9 2017, 12:57 AM

zvi added inline comments. May 9 2017, 1:05 AM

test/CodeGen/ARM/vpadd.ll
376	This appears to be a regression for ARM codegen. Assuming it is, what are the options for fixing it? IMHO these are the options, ordered by preference:
1. Can we improve the ARM backend to handle this case?
2. Add a TLI hook for deciding when insert-extract sequences are better than a composed shuffle?
3. Do this only in the X86 lowering.

zvi added a parent revision: D31961: DAGCombine: Combine shuffles of splat-shuffles. May 9 2017, 1:13 AM

efriedma edited reviewers, added: efriedma; removed: eli.friedman. May 9 2017, 8:32 AM

efriedma added a subscriber: efriedma.

efriedma added inline comments.

test/CodeGen/ARM/vpadd.ll
376	We have a combine in the ARM backend which specifically combines vuzp+vadd to vpadd. It looks like the reason it isn't triggering here is that we're doing the vuzp in the wrong width; probably easy to fix.
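
To illustrate why a vuzp+vadd pair can be replaced by vpadd, here is a small scalar model. It assumes the usual lane semantics (vuzp de-interleaves the concatenation of its two operands into even and odd lanes; vpadd sums adjacent lane pairs of each operand). `unzipThenAdd` and `pairwiseAdd` are hypothetical names, and this is not the ARM backend's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <initializer_list>
#include <vector>

using Lanes = std::vector<std::uint16_t>;

// Model of vuzp followed by vadd: split the concatenation of A and B into
// its even and odd lanes, then add the two halves lane by lane.
Lanes unzipThenAdd(const Lanes &A, const Lanes &B) {
  assert(A.size() == B.size() && A.size() % 2 == 0);
  Lanes Cat(A);
  Cat.insert(Cat.end(), B.begin(), B.end());
  Lanes Even, Odd;
  for (std::size_t I = 0; I < Cat.size(); I += 2) {
    Even.push_back(Cat[I]);
    Odd.push_back(Cat[I + 1]);
  }
  Lanes Sum(Even.size());
  for (std::size_t I = 0; I < Even.size(); ++I)
    Sum[I] = static_cast<std::uint16_t>(Even[I] + Odd[I]);
  return Sum;
}

// Model of a pairwise add: sums of adjacent lanes of A, then of B.
Lanes pairwiseAdd(const Lanes &A, const Lanes &B) {
  assert(A.size() == B.size() && A.size() % 2 == 0);
  Lanes Sum;
  for (const Lanes *V : {&A, &B})
    for (std::size_t I = 0; I < V->size(); I += 2)
      Sum.push_back(static_cast<std::uint16_t>((*V)[I] + (*V)[I + 1]));
  return Sum;
}
```

Both functions return the same lanes for any pair of equal-length inputs. The model says nothing about register width, which is the part the comment above identifies as the reason the combine does not fire.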

zvi added inline comments. May 10 2017, 11:53 AM

test/CodeGen/ARM/vpadd.ll
376	Thanks for highlighting the problem, Eli.

The following case shows the same missed combine opportunity without this patch: it is lowered to the same asm as the right-hand side of the diff.

define void @test(<16 x i8> *%cbcr, <4 x i16> *%X) nounwind ssp {
  %tmp = load <16 x i8>, <16 x i8>* %cbcr
  %tmp1 = zext <16 x i8> %tmp to <16 x i16>
  %tmp2 = shufflevector <16 x i16> %tmp1, <16 x i16> undef, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 undef, i32 undef, i32 undef, i32 undef>
  %tmp2a = shufflevector <8 x i16> %tmp2, <8 x i16> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
  %tmp3 = shufflevector <16 x i16> %tmp1, <16 x i16> undef, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
  %tmp3a = shufflevector <8 x i16> %tmp3, <8 x i16> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
  %add = add <4 x i16> %tmp2a, %tmp3a
  store <4 x i16> %add, <4 x i16>* %X, align 8
  ret void
}

zvi added inline comments. May 11 2017, 12:14 AM

test/CodeGen/ARM/vpadd.ll
376	Created Bug 32999 (https://bugs.llvm.org/show_bug.cgi?id=32999) to track this.

zvi added inline comments. May 18 2017, 8:17 AM

test/CodeGen/ARM/vpadd.ll
376	Just want to understand what is needed for this review to proceed. Does the ARM regression need to be fixed first, or are we ok with letting it in assuming it is easy to fix and will be fixed shortly after?

efriedma added inline comments. May 18 2017, 11:40 AM

test/CodeGen/ARM/vpadd.ll
376	Taking another look, I'm not convinced we should just be putting this issue aside. It's not really an ARM-specific issue: we're deciding to create an EXTRACT_SUBVECTOR from a 128-bit shuffle rather than just creating a 64-bit shuffle, which might be more efficient depending on the hardware. Equivalently, for x86 hardware, this is like creating a ymm shuffle when an xmm shuffle would be sufficient. I mean, this issue isn't really important enough to block this change, but I'd like to make sure we at least understand what we're doing.
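
The trade-off described above can be sketched with a scalar model: when every gathered index falls in the first half of the input, a shuffle at the result width produces the same lanes as a double-width shuffle followed by an extract of the low part. The helper names (`shuffleTwo`, `subvector`, `wideThenExtract`, `narrowShuffle`) are hypothetical, and undefined lanes are modelled as -1 in the mask and 0 in the result.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<int>;

// Same-length two-input shuffle; a mask value of -1 is an undefined lane.
Vec shuffleTwo(const Vec &A, const Vec &B, const Vec &Mask) {
  assert(A.size() == B.size() && Mask.size() == A.size());
  const int N = static_cast<int>(A.size());
  Vec R(Mask.size(), 0);
  for (std::size_t I = 0; I < Mask.size(); ++I) {
    const int M = Mask[I];
    if (M >= 0)
      R[I] = M < N ? A[M] : B[M - N];
  }
  return R;
}

// Contiguous slice [Start, Start + Len) of V, a model of EXTRACT_SUBVECTOR.
Vec subvector(const Vec &V, std::size_t Start, std::size_t Len) {
  assert(Start + Len <= V.size());
  const auto First = V.begin() + static_cast<std::ptrdiff_t>(Start);
  return Vec(First, First + static_cast<std::ptrdiff_t>(Len));
}

// Wide form: shuffle the two halves of In at twice the result width, then
// keep the low N lanes.
Vec wideThenExtract(const Vec &In, const Vec &Indices) {
  const std::size_t N = Indices.size();
  assert(In.size() == 4 * N);
  Vec Mask(2 * N, -1);
  for (std::size_t I = 0; I < N; ++I)
    Mask[I] = Indices[I];
  const Vec Wide =
      shuffleTwo(subvector(In, 0, 2 * N), subvector(In, 2 * N, 2 * N), Mask);
  return subvector(Wide, 0, N);
}

// Narrow form: shuffle the first two quarters of In at the result width.
// Only valid when every index is below 2 * N.
Vec narrowShuffle(const Vec &In, const Vec &Indices) {
  const std::size_t N = Indices.size();
  assert(In.size() == 4 * N);
  for (int Idx : Indices)
    assert(Idx < static_cast<int>(2 * N));
  return shuffleTwo(subvector(In, 0, N), subvector(In, N, N), Indices);
}
```

For the indices `{0, 2, 4, 6}` used in the vpadd.ll test, both forms return the same four lanes; which one is cheaper is a target question that this model does not address.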

zvi added inline comments. May 18 2017, 2:28 PM

test/CodeGen/ARM/vpadd.ll
376	I see your point. Going back to a list of options:
1. Conservatively bail out if (min_mask_index * 2 > NumElems || max_mask_index * 2 < NumElems), which means that we are accessing elements from one half of the input vector.
2. Add a TLI hook to let the target decide if the large shuffle is ok.
3. Always allow creation of large shuffles (what the current patch does).

> Conservatively bail out if (min_mask_index * 2 > NumElems || max_mask_index * 2 < NumElems) which means that we are accessing elements from one half of the input vector

You could extend this a little: try to cut the input size to one quarter, and generate the shuffle that way, if we can.

I don't think we need a new target hook here; isExtractSubvectorCheap should be enough to drive the behavior here.

> Always allow creation of large shuffles (what the current patch does)

We could try to clean this up later in DAGCombine, yes... but it seems better to try to generate a reasonable shuffle from the start.
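
One way to read the "cut the input size" suggestion is as a search for the narrowest aligned slice of the input that covers every mask index. The sketch below is an assumption about how that could look, not code from any revision of the patch; `InputSlice` and `narrowestSlice` are hypothetical names, and the target query (`isExtractSubvectorCheap`) is deliberately left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <vector>

struct InputSlice {
  std::size_t Start; // first input element covered by the slice
  std::size_t Len;   // number of input elements in the slice
};

// Return the narrowest aligned slice of a 4 * NumElems input that contains
// every defined mask index: one quarter if possible, else one half, else the
// whole input. Negative mask entries are undefined lanes and are ignored.
InputSlice narrowestSlice(const std::vector<int> &Mask) {
  const std::size_t NumElems = Mask.size();
  const std::size_t InputElems = 4 * NumElems;
  int MinIdx = static_cast<int>(InputElems);
  int MaxIdx = -1;
  for (int M : Mask) {
    if (M < 0)
      continue;
    assert(static_cast<std::size_t>(M) < InputElems);
    MinIdx = std::min(MinIdx, M);
    MaxIdx = std::max(MaxIdx, M);
  }
  if (MaxIdx < 0)
    return {0, NumElems}; // every lane is undefined, so any quarter will do
  for (std::size_t Len : {NumElems, 2 * NumElems}) {
    // Align the slice start down to a multiple of the slice length.
    const std::size_t Start = (static_cast<std::size_t>(MinIdx) / Len) * Len;
    if (static_cast<std::size_t>(MaxIdx) < Start + Len)
      return {Start, Len};
  }
  return {0, InputElems};
}
```

With `NumElems` equal to 4, the mask `{0, 2, 4, 6}` fits in the first half (start 0, length 8), `{0, 1, 2, 3}` fits in the first quarter, and `{0, 4, 8, 12}` needs the whole input.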

zvi added a subscriber: igorb. Jun 13 2017, 11:53 PM

@zvi Abandon this? AFAICT we seem to have improved all the x86 cases with shuffle combining improvements already.

zvi abandoned this revision. Dec 24 2019, 5:37 AM

Revision Contents

Path                                          Size
lib/CodeGen/SelectionDAG/DAGCombiner.cpp      15 lines
test/CodeGen/ARM/vpadd.ll                     3 lines
test/CodeGen/X86/oddshuffles.ll               8 lines
test/CodeGen/X86/shuffle-vs-trunc-512.ll      209 lines
test/CodeGen/X86/vector-shuffle-512-v32.ll    22 lines

Diff 98257

lib/CodeGen/SelectionDAG/DAGCombiner.cpp

@@ if ((VT.getSizeInBits() % InVT1.getSizeInBits() == 0) && InVT1 == InVT2) {
         }
         ShuffleNumElems = NumElems * 2;
       } else {
         // Both VecIn1 and VecIn2 are wider than the output, and VecIn2 is wider
         // than VecIn1. We can't handle this for now - this case will disappear
         // when we start sorting the vectors by type.
         return SDValue();
       }
+    } else if (InVT1.getSizeInBits() == VT.getSizeInBits() * 4 &&
+               !VecIn2.getNode()) {
+      if (!TLI.isExtractSubvectorCheap(VT, NumElems))
+        return SDValue();
+      // If there is one input vector, and it is 4x the size of the
+      // output, split it in two, and lengthen the output to 2x.
+      ShuffleNumElems = NumElems * 2;
+      EVT NewVT = VT.getVectorVT(*DAG.getContext(), VT.getScalarType(),
+                                 ShuffleNumElems);
+      VecIn2 = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, NewVT, VecIn1,
+                           DAG.getConstant(NumElems * 2, DL, IdxTy));
+      VecIn1 = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, NewVT, VecIn1, ZeroIdx);
+      // The input vectors are now shorter, so adjust the offset of the
+      // second vector's start.
+      Vec2Offset = NumElems * 2;
     } else {
       // TODO: Support cases where the length mismatch isn't exactly by a
       // factor of 2.
       // TODO: Move this check upwards, so that if we have bad type
       // mismatches, we don't create any DAG nodes.
       return SDValue();
     }
   }

test/CodeGen/ARM/vpadd.ll

@@
 }

 ; Matching to vpaddl.8 requires matching shuffle(zext()).
 define void @addCombineToVPADDL_u8_early_zext(<16 x i8> *%cbcr, <4 x i16> *%X) nounwind ssp {
 ; CHECK-LABEL: addCombineToVPADDL_u8_early_zext:
 ; CHECK: @ BB#0:
 ; CHECK-NEXT: vld1.64 {d16, d17}, [r0]
 ; CHECK-NEXT: vmovl.u8 q8, d16
-; CHECK-NEXT: vpadd.i16 d16, d16, d17
+; CHECK-NEXT: vuzp.16 q8, q9
+; CHECK-NEXT: vadd.i16 d16, d16, d18
 ; CHECK-NEXT: vstr d16, [r1]
 ; CHECK-NEXT: mov pc, lr
   %tmp = load <16 x i8>, <16 x i8>* %cbcr
   %tmp1 = zext <16 x i8> %tmp to <16 x i16>
   %tmp2 = shufflevector <16 x i16> %tmp1, <16 x i16> undef, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
   %tmp3 = shufflevector <16 x i16> %tmp1, <16 x i16> undef, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
   %add = add <4 x i16> %tmp2, %tmp3
   store <4 x i16> %add, <4 x i16>* %X, align 8

test/CodeGen/X86/oddshuffles.ll

@@
 ; AVX1-NEXT: vmovaps %ymm1, 32(%rdi)
 ; AVX1-NEXT: vmovaps %ymm1, (%rdi)
 ; AVX1-NEXT: # kill: %XMM0<def> %XMM0<kill> %YMM0<kill>
 ; AVX1-NEXT: vzeroupper
 ; AVX1-NEXT: retq
 ;
 ; AVX2-LABEL: wrongorder:
 ; AVX2: # BB#0:
-; AVX2-NEXT: vbroadcastsd %xmm0, %ymm1
-; AVX2-NEXT: vmovapd %ymm1, 32(%rdi)
-; AVX2-NEXT: vmovapd %ymm1, (%rdi)
-; AVX2-NEXT: vmovddup {{.*#+}} xmm0 = xmm0[0,0]
+; AVX2-NEXT: vbroadcastsd %xmm0, %ymm0
+; AVX2-NEXT: vmovaps %ymm0, 32(%rdi)
+; AVX2-NEXT: vmovaps %ymm0, (%rdi)
+; AVX2-NEXT: # kill: %XMM0<def> %XMM0<kill> %YMM0<kill>
 ; AVX2-NEXT: vzeroupper
 ; AVX2-NEXT: retq
   %shuffle = shufflevector <4 x double> %A, <4 x double> %A, <8 x i32> zeroinitializer
   store <8 x double> %shuffle, <8 x double>* %P, align 64
   %m2 = load <8 x double>, <8 x double>* %P, align 64
   store <8 x double> %m2, <8 x double>* %P, align 64
   %m3 = load <8 x double>, <8 x double>* %P, align 64
   %m4 = shufflevector <8 x double> %m3, <8 x double> undef, <2 x i32> <i32 2, i32 0>
   ret <2 x double> %m4
 }

test/CodeGen/X86/shuffle-vs-trunc-512.ll

@@
 ; AVX512VL-NEXT: vpblendd {{.*#+}} xmm0 = xmm0[0,1],xmm1[2,3]
 ; AVX512VL-NEXT: vmovdqa %xmm0, (%rsi)
 ; AVX512VL-NEXT: vzeroupper
 ; AVX512VL-NEXT: retq
 ;
 ; AVX512BW-LABEL: shuffle_v64i8_to_v16i8:
 ; AVX512BW: # BB#0:
 ; AVX512BW-NEXT: vmovdqu8 (%rdi), %zmm0
-; AVX512BW-NEXT: vpextrb $4, %xmm0, %eax
-; AVX512BW-NEXT: vpextrb $0, %xmm0, %ecx
-; AVX512BW-NEXT: vmovd %ecx, %xmm1
-; AVX512BW-NEXT: vpinsrb $1, %eax, %xmm1, %xmm1
-; AVX512BW-NEXT: vpextrb $8, %xmm0, %eax
-; AVX512BW-NEXT: vpinsrb $2, %eax, %xmm1, %xmm1
-; AVX512BW-NEXT: vpextrb $12, %xmm0, %eax
-; AVX512BW-NEXT: vpinsrb $3, %eax, %xmm1, %xmm1
-; AVX512BW-NEXT: vextracti32x4 $1, %zmm0, %xmm2
-; AVX512BW-NEXT: vpextrb $0, %xmm2, %eax
-; AVX512BW-NEXT: vpinsrb $4, %eax, %xmm1, %xmm1
-; AVX512BW-NEXT: vpextrb $4, %xmm2, %eax
-; AVX512BW-NEXT: vpinsrb $5, %eax, %xmm1, %xmm1
-; AVX512BW-NEXT: vpextrb $8, %xmm2, %eax
-; AVX512BW-NEXT: vpinsrb $6, %eax, %xmm1, %xmm1
-; AVX512BW-NEXT: vpextrb $12, %xmm2, %eax
-; AVX512BW-NEXT: vpinsrb $7, %eax, %xmm1, %xmm1
-; AVX512BW-NEXT: vextracti32x4 $2, %zmm0, %xmm2
-; AVX512BW-NEXT: vpextrb $0, %xmm2, %eax
-; AVX512BW-NEXT: vpinsrb $8, %eax, %xmm1, %xmm1
-; AVX512BW-NEXT: vpextrb $4, %xmm2, %eax
-; AVX512BW-NEXT: vpinsrb $9, %eax, %xmm1, %xmm1
-; AVX512BW-NEXT: vpextrb $8, %xmm2, %eax
-; AVX512BW-NEXT: vpinsrb $10, %eax, %xmm1, %xmm1
-; AVX512BW-NEXT: vpextrb $12, %xmm2, %eax
-; AVX512BW-NEXT: vpinsrb $11, %eax, %xmm1, %xmm1
-; AVX512BW-NEXT: vextracti32x4 $3, %zmm0, %xmm0
-; AVX512BW-NEXT: vpextrb $0, %xmm0, %eax
-; AVX512BW-NEXT: vpinsrb $12, %eax, %xmm1, %xmm1
-; AVX512BW-NEXT: vpextrb $4, %xmm0, %eax
-; AVX512BW-NEXT: vpinsrb $13, %eax, %xmm1, %xmm1
-; AVX512BW-NEXT: vpextrb $8, %xmm0, %eax
-; AVX512BW-NEXT: vpinsrb $14, %eax, %xmm1, %xmm1
-; AVX512BW-NEXT: vpextrb $12, %xmm0, %eax
-; AVX512BW-NEXT: vpinsrb $15, %eax, %xmm1, %xmm0
+; AVX512BW-NEXT: vextracti64x4 $1, %zmm0, %ymm1
+; AVX512BW-NEXT: vextracti128 $1, %ymm1, %xmm2
+; AVX512BW-NEXT: vmovdqa {{.*#+}} xmm3 = <u,u,u,u,0,4,8,12,u,u,u,u,u,u,u,u>
+; AVX512BW-NEXT: vpshufb %xmm3, %xmm2, %xmm2
+; AVX512BW-NEXT: vpshufb %xmm3, %xmm1, %xmm1
+; AVX512BW-NEXT: vpunpckldq {{.*#+}} xmm1 = xmm1[0],xmm2[0],xmm1[1],xmm2[1]
+; AVX512BW-NEXT: vextracti128 $1, %ymm0, %xmm2
+; AVX512BW-NEXT: vmovdqa {{.*#+}} xmm3 = <0,4,8,12,u,u,u,u,u,u,u,u,u,u,u,u>
+; AVX512BW-NEXT: vpshufb %xmm3, %xmm2, %xmm2
+; AVX512BW-NEXT: vpshufb %xmm3, %xmm0, %xmm0
+; AVX512BW-NEXT: vpunpckldq {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1]
+; AVX512BW-NEXT: vpblendd {{.*#+}} xmm0 = xmm0[0,1],xmm1[2,3]
 ; AVX512BW-NEXT: vmovdqa %xmm0, (%rsi)
 ; AVX512BW-NEXT: vzeroupper
 ; AVX512BW-NEXT: retq
 ;
 ; AVX512BWVL-LABEL: shuffle_v64i8_to_v16i8:
 ; AVX512BWVL: # BB#0:
 ; AVX512BWVL-NEXT: vmovdqu8 (%rdi), %zmm0
-; AVX512BWVL-NEXT: vpextrb $4, %xmm0, %eax
-; AVX512BWVL-NEXT: vpextrb $0, %xmm0, %ecx
-; AVX512BWVL-NEXT: vmovd %ecx, %xmm1
-; AVX512BWVL-NEXT: vpinsrb $1, %eax, %xmm1, %xmm1
-; AVX512BWVL-NEXT: vpextrb $8, %xmm0, %eax
-; AVX512BWVL-NEXT: vpinsrb $2, %eax, %xmm1, %xmm1
-; AVX512BWVL-NEXT: vpextrb $12, %xmm0, %eax
-; AVX512BWVL-NEXT: vpinsrb $3, %eax, %xmm1, %xmm1
-; AVX512BWVL-NEXT: vextracti32x4 $1, %zmm0, %xmm2
-; AVX512BWVL-NEXT: vpextrb $0, %xmm2, %eax
-; AVX512BWVL-NEXT: vpinsrb $4, %eax, %xmm1, %xmm1
-; AVX512BWVL-NEXT: vpextrb $4, %xmm2, %eax
-; AVX512BWVL-NEXT: vpinsrb $5, %eax, %xmm1, %xmm1
-; AVX512BWVL-NEXT: vpextrb $8, %xmm2, %eax
-; AVX512BWVL-NEXT: vpinsrb $6, %eax, %xmm1, %xmm1
-; AVX512BWVL-NEXT: vpextrb $12, %xmm2, %eax
-; AVX512BWVL-NEXT: vpinsrb $7, %eax, %xmm1, %xmm1
-; AVX512BWVL-NEXT: vextracti32x4 $2, %zmm0, %xmm2
-; AVX512BWVL-NEXT: vpextrb $0, %xmm2, %eax
-; AVX512BWVL-NEXT: vpinsrb $8, %eax, %xmm1, %xmm1
-; AVX512BWVL-NEXT: vpextrb $4, %xmm2, %eax
-; AVX512BWVL-NEXT: vpinsrb $9, %eax, %xmm1, %xmm1
-; AVX512BWVL-NEXT: vpextrb $8, %xmm2, %eax
-; AVX512BWVL-NEXT: vpinsrb $10, %eax, %xmm1, %xmm1
-; AVX512BWVL-NEXT: vpextrb $12, %xmm2, %eax
-; AVX512BWVL-NEXT: vpinsrb $11, %eax, %xmm1, %xmm1
-; AVX512BWVL-NEXT: vextracti32x4 $3, %zmm0, %xmm0
-; AVX512BWVL-NEXT: vpextrb $0, %xmm0, %eax
-; AVX512BWVL-NEXT: vpinsrb $12, %eax, %xmm1, %xmm1
-; AVX512BWVL-NEXT: vpextrb $4, %xmm0, %eax
-; AVX512BWVL-NEXT: vpinsrb $13, %eax, %xmm1, %xmm1
-; AVX512BWVL-NEXT: vpextrb $8, %xmm0, %eax
-; AVX512BWVL-NEXT: vpinsrb $14, %eax, %xmm1, %xmm1
-; AVX512BWVL-NEXT: vpextrb $12, %xmm0, %eax
-; AVX512BWVL-NEXT: vpinsrb $15, %eax, %xmm1, %xmm0
-; AVX512BWVL-NEXT: vmovdqu %xmm0, (%rsi)
+; AVX512BWVL-NEXT: vextracti64x4 $1, %zmm0, %ymm1
+; AVX512BWVL-NEXT: vextracti128 $1, %ymm1, %xmm2
+; AVX512BWVL-NEXT: vmovdqu {{.*#+}} xmm3 = <u,u,u,u,0,4,8,12,u,u,u,u,u,u,u,u>
+; AVX512BWVL-NEXT: vpshufb %xmm3, %xmm2, %xmm2
+; AVX512BWVL-NEXT: vpshufb %xmm3, %xmm1, %xmm1
+; AVX512BWVL-NEXT: vpunpckldq {{.*#+}} xmm1 = xmm1[0],xmm2[0],xmm1[1],xmm2[1]
+; AVX512BWVL-NEXT: vextracti128 $1, %ymm0, %xmm2
+; AVX512BWVL-NEXT: vmovdqu {{.*#+}} xmm3 = <0,4,8,12,u,u,u,u,u,u,u,u,u,u,u,u>
+; AVX512BWVL-NEXT: vpshufb %xmm3, %xmm2, %xmm2
+; AVX512BWVL-NEXT: vpshufb %xmm3, %xmm0, %xmm0
+; AVX512BWVL-NEXT: vpunpckldq {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1]
+; AVX512BWVL-NEXT: vpblendd {{.*#+}} xmm0 = xmm0[0,1],xmm1[2,3]
+; AVX512BWVL-NEXT: vmovdqa %xmm0, (%rsi)
 ; AVX512BWVL-NEXT: vzeroupper
 ; AVX512BWVL-NEXT: retq
   %vec = load <64 x i8>, <64 x i8>* %L
   %strided.vec = shufflevector <64 x i8> %vec, <64 x i8> undef, <16 x i32> <i32 0, i32 4, i32 8, i32 12, i32 16, i32 20, i32 24, i32 28, i32 32, i32 36, i32 40, i32 44, i32 48, i32 52, i32 56, i32 60>
   store <16 x i8> %strided.vec, <16 x i8>* %S
   ret void
 }

@@
 ; AVX512VL-NEXT: vpblendd {{.*#+}} xmm0 = xmm0[0,1],xmm1[2,3]
 ; AVX512VL-NEXT: vmovdqa %xmm0, (%rsi)
 ; AVX512VL-NEXT: vzeroupper
 ; AVX512VL-NEXT: retq
 ;
 ; AVX512BW-LABEL: shuffle_v32i16_to_v8i16:
 ; AVX512BW: # BB#0:
 ; AVX512BW-NEXT: vmovdqu16 (%rdi), %zmm0
-; AVX512BW-NEXT: vxorps %xmm1, %xmm1, %xmm1
-; AVX512BW-NEXT: vmovss {{.*#+}} xmm1 = xmm0[0],xmm1[1,2,3]
-; AVX512BW-NEXT: vpextrw $4, %xmm0, %eax
-; AVX512BW-NEXT: vpinsrw $1, %eax, %xmm1, %xmm1
-; AVX512BW-NEXT: vextracti32x4 $1, %zmm0, %xmm2
-; AVX512BW-NEXT: vmovd %xmm2, %eax
-; AVX512BW-NEXT: vpinsrw $2, %eax, %xmm1, %xmm1
-; AVX512BW-NEXT: vpextrw $4, %xmm2, %eax
-; AVX512BW-NEXT: vpinsrw $3, %eax, %xmm1, %xmm1
-; AVX512BW-NEXT: vextracti32x4 $2, %zmm0, %xmm2
-; AVX512BW-NEXT: vmovd %xmm2, %eax
-; AVX512BW-NEXT: vpinsrw $4, %eax, %xmm1, %xmm1
-; AVX512BW-NEXT: vpextrw $4, %xmm2, %eax
-; AVX512BW-NEXT: vpinsrw $5, %eax, %xmm1, %xmm1
-; AVX512BW-NEXT: vextracti32x4 $3, %zmm0, %xmm0
-; AVX512BW-NEXT: vmovd %xmm0, %eax
-; AVX512BW-NEXT: vpinsrw $6, %eax, %xmm1, %xmm1
-; AVX512BW-NEXT: vpextrw $4, %xmm0, %eax
-; AVX512BW-NEXT: vpinsrw $7, %eax, %xmm1, %xmm0
+; AVX512BW-NEXT: vextracti64x4 $1, %zmm0, %ymm1
+; AVX512BW-NEXT: vextracti128 $1, %ymm1, %xmm2
+; AVX512BW-NEXT: vpshufd {{.*#+}} xmm2 = xmm2[0,2,2,3]
+; AVX512BW-NEXT: vpshuflw {{.*#+}} xmm2 = xmm2[0,1,0,2,4,5,6,7]
+; AVX512BW-NEXT: vpshufd {{.*#+}} xmm1 = xmm1[0,2,2,3]
+; AVX512BW-NEXT: vpshuflw {{.*#+}} xmm1 = xmm1[0,1,0,2,4,5,6,7]
+; AVX512BW-NEXT: vpunpckldq {{.*#+}} xmm1 = xmm1[0],xmm2[0],xmm1[1],xmm2[1]
+; AVX512BW-NEXT: vextracti128 $1, %ymm0, %xmm2
+; AVX512BW-NEXT: vpshufd {{.*#+}} xmm2 = xmm2[0,2,2,3]
+; AVX512BW-NEXT: vpshuflw {{.*#+}} xmm2 = xmm2[0,2,2,3,4,5,6,7]
+; AVX512BW-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,2,2,3]
+; AVX512BW-NEXT: vpshuflw {{.*#+}} xmm0 = xmm0[0,2,2,3,4,5,6,7]
+; AVX512BW-NEXT: vpunpckldq {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1]
+; AVX512BW-NEXT: vpblendd {{.*#+}} xmm0 = xmm0[0,1],xmm1[2,3]
 ; AVX512BW-NEXT: vmovdqa %xmm0, (%rsi)
 ; AVX512BW-NEXT: vzeroupper
 ; AVX512BW-NEXT: retq
 ;
 ; AVX512BWVL-LABEL: shuffle_v32i16_to_v8i16:
 ; AVX512BWVL: # BB#0:
 ; AVX512BWVL-NEXT: vmovdqu16 (%rdi), %zmm0
-; AVX512BWVL-NEXT: vpxor %xmm1, %xmm1, %xmm1
-; AVX512BWVL-NEXT: vmovss {{.*#+}} xmm1 = xmm0[0],xmm1[1,2,3]
-; AVX512BWVL-NEXT: vpextrw $4, %xmm0, %eax
-; AVX512BWVL-NEXT: vpinsrw $1, %eax, %xmm1, %xmm1
-; AVX512BWVL-NEXT: vextracti32x4 $1, %zmm0, %xmm2
-; AVX512BWVL-NEXT: vmovd %xmm2, %eax
-; AVX512BWVL-NEXT: vpinsrw $2, %eax, %xmm1, %xmm1
-; AVX512BWVL-NEXT: vpextrw $4, %xmm2, %eax
-; AVX512BWVL-NEXT: vpinsrw $3, %eax, %xmm1, %xmm1
-; AVX512BWVL-NEXT: vextracti32x4 $2, %zmm0, %xmm2
-; AVX512BWVL-NEXT: vmovd %xmm2, %eax
-; AVX512BWVL-NEXT: vpinsrw $4, %eax, %xmm1, %xmm1
-; AVX512BWVL-NEXT: vpextrw $4, %xmm2, %eax
-; AVX512BWVL-NEXT: vpinsrw $5, %eax, %xmm1, %xmm1
-; AVX512BWVL-NEXT: vextracti32x4 $3, %zmm0, %xmm0
-; AVX512BWVL-NEXT: vmovd %xmm0, %eax
-; AVX512BWVL-NEXT: vpinsrw $6, %eax, %xmm1, %xmm1
-; AVX512BWVL-NEXT: vpextrw $4, %xmm0, %eax
-; AVX512BWVL-NEXT: vpinsrw $7, %eax, %xmm1, %xmm0
-; AVX512BWVL-NEXT: vmovdqu %xmm0, (%rsi)
+; AVX512BWVL-NEXT: vextracti64x4 $1, %zmm0, %ymm1
+; AVX512BWVL-NEXT: vmovdqu {{.*#+}} ymm2 = <0,4,8,12,16,20,24,28,u,u,u,u,u,u,u,u>
+; AVX512BWVL-NEXT: vpermi2w %ymm1, %ymm0, %ymm2
+; AVX512BWVL-NEXT: vmovdqa %xmm2, (%rsi)
 ; AVX512BWVL-NEXT: vzeroupper
 ; AVX512BWVL-NEXT: retq

   %vec = load <32 x i16>, <32 x i16>* %L
   %strided.vec = shufflevector <32 x i16> %vec, <32 x i16> undef, <8 x i32> <i32 0, i32 4, i32 8, i32 12, i32 16, i32 20, i32 24, i32 28>
   store <8 x i16> %strided.vec, <8 x i16>* %S
   ret void
 }

 define void @trunc_v8i64_to_v8i16(<32 x i16>* %L, <8 x i16>* %S) nounwind {
 ; AVX512-LABEL: trunc_v8i64_to_v8i16:
@@
 ; AVX512VL-NEXT: vpblendd {{.*#+}} xmm0 = xmm0[0],xmm1[1],xmm0[2,3]
 ; AVX512VL-NEXT: vmovq %xmm0, (%rsi)
 ; AVX512VL-NEXT: vzeroupper
 ; AVX512VL-NEXT: retq
 ;
 ; AVX512BW-LABEL: shuffle_v64i8_to_v8i8:
 ; AVX512BW: # BB#0:
 ; AVX512BW-NEXT: vmovdqu8 (%rdi), %zmm0
-; AVX512BW-NEXT: vextracti32x4 $3, %zmm0, %xmm1
-; AVX512BW-NEXT: vpextrb $8, %xmm1, %r8d
-; AVX512BW-NEXT: vpextrb $0, %xmm1, %r9d
-; AVX512BW-NEXT: vextracti32x4 $2, %zmm0, %xmm1
-; AVX512BW-NEXT: vpextrb $8, %xmm1, %r10d
-; AVX512BW-NEXT: vpextrb $0, %xmm1, %r11d
-; AVX512BW-NEXT: vextracti32x4 $1, %zmm0, %xmm1
-; AVX512BW-NEXT: vpextrb $8, %xmm1, %eax
-; AVX512BW-NEXT: vpextrb $0, %xmm1, %ecx
-; AVX512BW-NEXT: vpextrb $8, %xmm0, %edx
-; AVX512BW-NEXT: vpextrb $0, %xmm0, %edi
-; AVX512BW-NEXT: vmovd %edi, %xmm0
-; AVX512BW-NEXT: vpinsrb $1, %edx, %xmm0, %xmm0
-; AVX512BW-NEXT: vpinsrb $2, %ecx, %xmm0, %xmm0
-; AVX512BW-NEXT: vpinsrb $3, %eax, %xmm0, %xmm0
-; AVX512BW-NEXT: vpinsrb $4, %r11d, %xmm0, %xmm0
-; AVX512BW-NEXT: vpinsrb $5, %r10d, %xmm0, %xmm0
-; AVX512BW-NEXT: vpinsrb $6, %r9d, %xmm0, %xmm0
-; AVX512BW-NEXT: vpinsrb $7, %r8d, %xmm0, %xmm0
+; AVX512BW-NEXT: vextracti64x4 $1, %zmm0, %ymm1
+; AVX512BW-NEXT: vextracti128 $1, %ymm1, %xmm2
+; AVX512BW-NEXT: vmovdqa {{.*#+}} xmm3 = <u,u,0,8,u,u,u,u,u,u,u,u,u,u,u,u>
+; AVX512BW-NEXT: vpshufb %xmm3, %xmm2, %xmm2
+; AVX512BW-NEXT: vpshufb %xmm3, %xmm1, %xmm1
+; AVX512BW-NEXT: vpunpcklwd {{.*#+}} xmm1 = xmm1[0],xmm2[0],xmm1[1],xmm2[1],xmm1[2],xmm2[2],xmm1[3],xmm2[3]
+; AVX512BW-NEXT: vextracti128 $1, %ymm0, %xmm2
+; AVX512BW-NEXT: vmovdqa {{.*#+}} xmm3 = <0,8,u,u,u,u,u,u,u,u,u,u,u,u,u,u>
+; AVX512BW-NEXT: vpshufb %xmm3, %xmm2, %xmm2
+; AVX512BW-NEXT: vpshufb %xmm3, %xmm0, %xmm0
+; AVX512BW-NEXT: vpunpcklwd {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1],xmm0[2],xmm2[2],xmm0[3],xmm2[3]
+; AVX512BW-NEXT: vpblendd {{.*#+}} xmm0 = xmm0[0],xmm1[1],xmm0[2,3]
 ; AVX512BW-NEXT: vmovq %xmm0, (%rsi)
 ; AVX512BW-NEXT: vzeroupper
 ; AVX512BW-NEXT: retq
 ;
 ; AVX512BWVL-LABEL: shuffle_v64i8_to_v8i8:
 ; AVX512BWVL: # BB#0:
 ; AVX512BWVL-NEXT: vmovdqu8 (%rdi), %zmm0
-; AVX512BWVL-NEXT: vpextrb $8, %xmm0, %eax
-; AVX512BWVL-NEXT: vpextrb $0, %xmm0, %ecx
-; AVX512BWVL-NEXT: vmovd %ecx, %xmm1
-; AVX512BWVL-NEXT: vpinsrb $2, %eax, %xmm1, %xmm1
-; AVX512BWVL-NEXT: vextracti32x4 $1, %zmm0, %xmm2
-; AVX512BWVL-NEXT: vpextrb $0, %xmm2, %eax
-; AVX512BWVL-NEXT: vpinsrb $4, %eax, %xmm1, %xmm1
-; AVX512BWVL-NEXT: vpextrb $8, %xmm2, %eax
-; AVX512BWVL-NEXT: vpinsrb $6, %eax, %xmm1, %xmm1
-; AVX512BWVL-NEXT: vextracti32x4 $2, %zmm0, %xmm2
-; AVX512BWVL-NEXT: vpextrb $0, %xmm2, %eax
-; AVX512BWVL-NEXT: vpinsrb $8, %eax, %xmm1, %xmm1
-; AVX512BWVL-NEXT: vpextrb $8, %xmm2, %eax
-; AVX512BWVL-NEXT: vpinsrb $10, %eax, %xmm1, %xmm1
-; AVX512BWVL-NEXT: vextracti32x4 $3, %zmm0, %xmm0
-; AVX512BWVL-NEXT: vpextrb $0, %xmm0, %eax
-; AVX512BWVL-NEXT: vpinsrb $12, %eax, %xmm1, %xmm1
-; AVX512BWVL-NEXT: vpextrb $8, %xmm0, %eax
-; AVX512BWVL-NEXT: vpinsrb $14, %eax, %xmm1, %xmm0
-; AVX512BWVL-NEXT: vpmovwb %xmm0, (%rsi)
+; AVX512BWVL-NEXT: vextracti64x4 $1, %zmm0, %ymm1
+; AVX512BWVL-NEXT: vmovdqu {{.*#+}} ymm2 = <0,4,8,12,16,20,24,28,u,u,u,u,u,u,u,u>
+; AVX512BWVL-NEXT: vpermi2w %ymm1, %ymm0, %ymm2
+; AVX512BWVL-NEXT: vpmovwb %xmm2, (%rsi)
 ; AVX512BWVL-NEXT: vzeroupper
 ; AVX512BWVL-NEXT: retq
   %vec = load <64 x i8>, <64 x i8>* %L
   %strided.vec = shufflevector <64 x i8> %vec, <64 x i8> undef, <8 x i32> <i32 0, i32 8, i32 16, i32 24, i32 32, i32 40, i32 48, i32 56>
   store <8 x i8> %strided.vec, <8 x i8>* %S
   ret void
 }

test/CodeGen/X86/vector-shuffle-512-v32.ll

@@
 ; KNL-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,2,2,3]
 ; KNL-NEXT: vpshuflw {{.*#+}} xmm0 = xmm0[1,3,2,3,4,5,6,7]
 ; KNL-NEXT: vpunpckldq {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1]
 ; KNL-NEXT: vpblendd {{.*#+}} xmm0 = xmm0[0,1],xmm1[2,3]
 ; KNL-NEXT: retq
 ;
 ; SKX-LABEL: pr32967:
 ; SKX: ## BB#0:
-; SKX-NEXT: vpextrw $5, %xmm0, %eax
-; SKX-NEXT: vpextrw $1, %xmm0, %ecx
-; SKX-NEXT: vmovd %ecx, %xmm1
-; SKX-NEXT: vpinsrw $1, %eax, %xmm1, %xmm1
-; SKX-NEXT: vextracti32x4 $1, %zmm0, %xmm2
-; SKX-NEXT: vpextrw $1, %xmm2, %eax
-; SKX-NEXT: vpinsrw $2, %eax, %xmm1, %xmm1
-; SKX-NEXT: vpextrw $5, %xmm2, %eax
-; SKX-NEXT: vpinsrw $3, %eax, %xmm1, %xmm1
-; SKX-NEXT: vextracti32x4 $2, %zmm0, %xmm2
-; SKX-NEXT: vpextrw $1, %xmm2, %eax
-; SKX-NEXT: vpinsrw $4, %eax, %xmm1, %xmm1
-; SKX-NEXT: vpblendw {{.*#+}} xmm1 = xmm1[0,1,2,3,4],xmm2[5],xmm1[6,7]
-; SKX-NEXT: vextracti32x4 $3, %zmm0, %xmm0
-; SKX-NEXT: vpextrw $1, %xmm0, %eax
-; SKX-NEXT: vpinsrw $6, %eax, %xmm1, %xmm1
-; SKX-NEXT: vpextrw $5, %xmm0, %eax
-; SKX-NEXT: vpinsrw $7, %eax, %xmm1, %xmm0
+; SKX-NEXT: vextracti64x4 $1, %zmm0, %ymm2
+; SKX-NEXT: vmovdqu {{.*#+}} ymm1 = <1,5,9,13,17,21,25,29,u,u,u,u,u,u,u,u>
+; SKX-NEXT: vpermi2w %ymm2, %ymm0, %ymm1
+; SKX-NEXT: vmovdqa %xmm1, %xmm0
 ; SKX-NEXT: vzeroupper
 ; SKX-NEXT: retq
   %shuffle = shufflevector <32 x i16> %v, <32 x i16> undef, <8 x i32> <i32 1,i32 5,i32 9,i32 13,i32 17,i32 21,i32 25,i32 29>
   ret <8 x i16> %shuffle
 }
