This is an archive of the discontinued LLVM Phabricator instance.

[X86] Add custom type legalization for v16i64->v16i8 truncate and v8i64->v8i8 truncate when v8i64 isn't legal
ClosedPublic

Authored by craig.topper on Oct 3 2019, 3:01 PM.


Details

Reviewers

RKSimon
spatel

Commits

rG570ae49d030c: [X86] Add custom type legalization for v16i64->v16i8 truncate and v8i64->v8i8…
rL373864: [X86] Add custom type legalization for v16i64->v16i8 truncate and v8i64->v8i8…

Summary

The default legalization for v16i64->v16i8 creates a multi-stage truncate: it truncates one step, concatenates the pieces, and then truncates again. But AVX512 implements truncates with multiple uops, so it should be better to truncate each piece all the way to the desired element size and then concatenate the pieces using unpckl instructions. This minimizes the number of 2-uop truncates; the unpckl instructions are all single-uop.
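The lane arithmetic behind this can be checked with a small standalone model. The following is a rough sketch, not LLVM code: all function names are hypothetical, and each helper only models the data movement of the instruction named in its comment (vpmovqb, vpunpckldq, vpunpcklqdq) on a 16-byte register.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Bytes16 = std::array<std::uint8_t, 16>; // models one xmm register

// Models vpmovqb ymm -> xmm: truncate four i64 lanes to bytes, zero the rest.
Bytes16 truncQB(const std::uint64_t *In) {
  Bytes16 Out{};
  for (std::size_t I = 0; I < 4; ++I)
    Out[I] = static_cast<std::uint8_t>(In[I]);
  return Out;
}

// Models vpunpckldq: result dwords are A[0], B[0], A[1], B[1].
Bytes16 unpackLoDQ(const Bytes16 &A, const Bytes16 &B) {
  Bytes16 Out{};
  for (std::size_t I = 0; I < 4; ++I) {
    Out[I] = A[I];
    Out[4 + I] = B[I];
    Out[8 + I] = A[4 + I];
    Out[12 + I] = B[4 + I];
  }
  return Out;
}

// Models vpunpcklqdq: result qwords are A[0], B[0].
Bytes16 unpackLoQDQ(const Bytes16 &A, const Bytes16 &B) {
  Bytes16 Out{};
  for (std::size_t I = 0; I < 8; ++I) {
    Out[I] = A[I];
    Out[8 + I] = B[I];
  }
  return Out;
}

// Four full truncates followed by three unpacks, as in the new codegen.
Bytes16 truncV16I64(const std::array<std::uint64_t, 16> &In) {
  Bytes16 Lo = unpackLoDQ(truncQB(&In[0]), truncQB(&In[4]));
  Bytes16 Hi = unpackLoDQ(truncQB(&In[8]), truncQB(&In[12]));
  return unpackLoQDQ(Lo, Hi);
}

// Asserts that the modeled sequence matches an element-wise truncate.
void checkAgainstScalar(const std::array<std::uint64_t, 16> &In) {
  Bytes16 R = truncV16I64(In);
  for (std::size_t I = 0; I < 16; ++I)
    assert(R[I] == static_cast<std::uint8_t>(In[I]));
}
```

Calling `checkAgainstScalar` on any 16-element input exercises the whole sequence: four truncates and three single-uop unpacks, with no intermediate v8i32 stage.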

I first tried to handle this by just custom splitting the v16i64->v16i8 truncate, hoping that the DAG combiner would leave the two halves in the state needed for D68374 to do the job for each half. This worked for the first half, but the second half got messed up. So I've instead implemented custom handling for v8i64->v8i8, for the case where v8i64 needs to be split, that produces the VTRUNCs directly.
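The custom v8i64->v8i8 handling combines the two VTRUNC results with the shuffle mask { 0, 1, 2, 3, 16, 17, 18, 19, -1, ... } from the patch. As an illustrative sketch only (the types and helper names below are hypothetical, and -1 is used as a stand-in for an undef lane), the mask can be modeled like this: indices 0-15 select from the first operand, 16-31 from the second.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model type: 16 byte lanes, with -1 standing for undef.
using Lanes = std::array<int, 16>;

// The shuffle mask used by the patch for the widened v16i8 result.
const Lanes PatchMask = {{0, 1, 2, 3, 16, 17, 18, 19,
                          -1, -1, -1, -1, -1, -1, -1, -1}};

// Models VTRUNC of four i64 lanes into a v16i8 whose upper lanes are zero.
Lanes vtruncV4I64(const std::uint64_t *In) {
  Lanes Out{};
  for (std::size_t I = 0; I < 4; ++I)
    Out[I] = static_cast<int>(In[I] & 0xFF);
  return Out;
}

// Models a two-input vector shuffle: mask values 0-15 pick from Lo,
// 16-31 pick from Hi, and negative values leave the lane undef.
Lanes shuffle(const Lanes &Lo, const Lanes &Hi, const Lanes &Mask) {
  Lanes Out{};
  for (std::size_t I = 0; I < 16; ++I) {
    int M = Mask[I];
    assert(M < 32 && "mask index out of range");
    Out[I] = M < 0 ? -1 : (M < 16 ? Lo[M] : Hi[M - 16]);
  }
  return Out;
}

// Split the input, truncate each half fully, then shuffle into one vector.
Lanes truncV8I64ToWidenedV8I8(const std::array<std::uint64_t, 8> &In) {
  return shuffle(vtruncV4I64(&In[0]), vtruncV4I64(&In[4]), PatchMask);
}
```

The first eight lanes of the result hold the truncated elements in order, and the remaining lanes stay undef. The mask interleaves the low dwords of the two inputs, which is consistent with the vpunpckldq seen in the updated test output.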

Diff Detail

Repository: rL LLVM

Event Timeline

craig.topper created this revision. Oct 3 2019, 3:01 PM

Herald added a project: Restricted Project. Oct 3 2019, 3:01 PM

Herald added a subscriber: hiraditya.

craig.topper marked an inline comment as done. Oct 3 2019, 3:05 PM

craig.topper added inline comments.

llvm/test/CodeGen/X86/min-legal-vector-width.ll
Line 836 (on Diff #223103): The loss of the VPERMI2B here is a regression, but the VTRUNC form should allow us to create saturating VTRUNCs for the cases in vector-trunc-ssat.ll and vector-trunc-usat.ll. So maybe we need a late shuffle combine to VPERMI2B?

craig.topper added a child revision: D68432: [X86] Add DAG combine to form saturating VTRUNCUS/VTRUNCS from VTRUNC. Oct 3 2019, 3:49 PM

Rebase after landing the VTRUNCUS/VTRUNCS patch.
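For readers unfamiliar with the saturating forms, here is a per-lane sketch of what the test IR patterns compute. It is a simplified model under stated assumptions, not LLVM code: the function names are hypothetical, `truncSSat` and `truncUSat` model one lane of vpmovsqb and vpmovusqb, and `truncPackUS` models the smin 255 / smax 0 / trunc pattern from vector-trunc-packus.ll.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// One lane of a signed saturating truncate (vpmovsqb): clamp to [-128, 127].
std::int8_t truncSSat(std::int64_t X) {
  std::int64_t C = std::max<std::int64_t>(std::min<std::int64_t>(X, 127), -128);
  return static_cast<std::int8_t>(C);
}

// One lane of an unsigned saturating truncate (vpmovusqb): the input is
// treated as unsigned and clamped to 255.
std::uint8_t truncUSat(std::uint64_t X) {
  return static_cast<std::uint8_t>(std::min<std::uint64_t>(X, 255));
}

// The packus IR pattern: signed clamp to [0, 255], then truncate.
std::uint8_t truncPackUS(std::int64_t X) {
  std::int64_t C = std::max<std::int64_t>(std::min<std::int64_t>(X, 255), 0);
  return static_cast<std::uint8_t>(C);
}

// Same result via a signed max with zero followed by an unsigned saturating
// truncate, mirroring the vpmaxsq + vpmovusqb pair in the AVX512VL checks.
std::uint8_t truncPackUSViaUSat(std::int64_t X) {
  std::int64_t NonNeg = std::max<std::int64_t>(X, 0);
  return truncUSat(static_cast<std::uint64_t>(NonNeg));
}

// Asserts that both packus formulations agree for one input value.
void checkPackUSEquivalence(std::int64_t X) {
  assert(truncPackUS(X) == truncPackUSViaUSat(X));
}
```

Once negative values have been clamped to zero, the signed upper clamp and the unsigned saturation give the same answer, so the explicit vpminsq against 255 is no longer needed.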

RKSimon added inline comments. Oct 5 2019, 9:37 AM

llvm/test/CodeGen/X86/min-legal-vector-width.ll
Line 836 (on Diff #223103): We don't currently support VTRUNC in shuffle combining as we're still weak at handling conflicting vector sizes - is that limitation OK for now?

craig.topper marked an inline comment as done. Oct 5 2019, 9:52 AM

craig.topper added inline comments.

llvm/test/CodeGen/X86/min-legal-vector-width.ll
Line 836 (on Diff #223103): I think so. I believe we only started shipping CPUs that support VPERMI2B last month, so they aren't very widespread yet. If it becomes a problem we can probably match this specific pattern in a DAG combine. I'll open a Bugzilla entry when this patch lands.
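To make the regression concrete: the old CHECK-VBMI output in min-legal-vector-width.ll does the whole v8i64->v8i8 truncate with one byte permute, and its broadcast constant 4048780183313844224 is 0x3830282018100800, that is, byte indices 0, 8, 16, ..., 56. The sketch below models this under stated assumptions: the helper names are hypothetical, and `permI2B` models only the 256-bit, unmasked form of VPERMI2B (index bit 5 selects the table, bits 4:0 select the byte).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// The vpbroadcastq constant from the old CHECK-VBMI lines.
const std::uint64_t IndexQword = 4048780183313844224ULL;
static_assert(4048780183313844224ULL == 0x3830282018100800ULL,
              "constant encodes byte indices 0, 8, 16, ..., 56");

using Bytes32 = std::array<std::uint8_t, 32>; // models one ymm register

// Models 256-bit VPERMI2B without masking: each index byte picks one byte
// from the 64-byte concatenation of Tbl0 and Tbl1.
Bytes32 permI2B(const Bytes32 &Idx, const Bytes32 &Tbl0, const Bytes32 &Tbl1) {
  Bytes32 Out{};
  for (std::size_t I = 0; I < 32; ++I) {
    unsigned Sel = Idx[I] & 0x3F;
    Out[I] = (Sel & 0x20) ? Tbl1[Sel & 0x1F] : Tbl0[Sel & 0x1F];
  }
  return Out;
}

// Truncates eight i64 values to bytes with a single modeled permute.
std::array<std::uint8_t, 8>
truncViaPermute(const std::array<std::uint64_t, 8> &In) {
  Bytes32 Idx{}, Lo{}, Hi{};
  for (std::size_t Q = 0; Q < 4; ++Q) {
    for (std::size_t B = 0; B < 8; ++B) {
      // Serialize in little-endian order, as the registers would hold them.
      Idx[Q * 8 + B] = static_cast<std::uint8_t>(IndexQword >> (8 * B));
      Lo[Q * 8 + B] = static_cast<std::uint8_t>(In[Q] >> (8 * B));
      Hi[Q * 8 + B] = static_cast<std::uint8_t>(In[4 + Q] >> (8 * B));
    }
  }
  Bytes32 R = permI2B(Idx, Lo, Hi);
  std::array<std::uint8_t, 8> Out{};
  for (std::size_t I = 0; I < 8; ++I) {
    Out[I] = R[I];
    assert(Out[I] == static_cast<std::uint8_t>(In[I]));
  }
  return Out;
}
```

Only the low 8 bytes of the permute result are used, which matches the `# kill` line that narrows ymm0 to xmm0 in the old output.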

LGTM along with raising a bug about the VPERMI2B regression

This revision is now accepted and ready to land. Oct 6 2019, 9:58 AM

Closed by commit rL373864: [X86] Add custom type legalization for v16i64->v16i8 truncate and v8i64->v8i8… (authored by ctopper). Oct 6 2019, 11:41 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                     Size
llvm/trunk/lib/Target/X86/X86ISelLowering.cpp            26 lines
llvm/trunk/test/CodeGen/X86/min-legal-vector-width.ll    41 lines
llvm/trunk/test/CodeGen/X86/vector-trunc-packus.ll       65 lines
llvm/trunk/test/CodeGen/X86/vector-trunc-ssat.ll         59 lines
llvm/trunk/test/CodeGen/X86/vector-trunc-usat.ll         50 lines

Diff 223436

llvm/trunk/lib/Target/X86/X86ISelLowering.cpp


@@ ... @@ if (Subtarget.hasVBMI2()) {
                        MVT::v16i16, MVT::v8i32, MVT::v4i64 }) {
         setOperationAction(ISD::FSHL, VT, Custom);
         setOperationAction(ISD::FSHR, VT, Custom);
       }
     }
 
     setOperationAction(ISD::TRUNCATE, MVT::v16i32, Custom);
     setOperationAction(ISD::TRUNCATE, MVT::v8i64, Custom);
+    setOperationAction(ISD::TRUNCATE, MVT::v16i64, Custom);
   }
 
   // We want to custom lower some of our intrinsics.
   setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::Other, Custom);
   setOperationAction(ISD::INTRINSIC_W_CHAIN, MVT::Other, Custom);
   setOperationAction(ISD::INTRINSIC_VOID, MVT::Other, Custom);
   if (!Subtarget.is64Bit()) {
     setOperationAction(ISD::INTRINSIC_W_CHAIN, MVT::i64, Custom);
@@ ... @@ SDValue X86TargetLowering::LowerTRUNCATE(SDValue Op, SelectionDAG &DAG) const {
   MVT VT = Op.getSimpleValueType();
   SDValue In = Op.getOperand(0);
   MVT InVT = In.getSimpleValueType();
   unsigned InNumEltBits = InVT.getScalarSizeInBits();
 
   assert(VT.getVectorNumElements() == InVT.getVectorNumElements() &&
          "Invalid TRUNCATE operation");
 
-  // If called by the legalizer just return.
-  if (!DAG.getTargetLoweringInfo().isTypeLegal(InVT)) {
-    if ((InVT == MVT::v8i64 || InVT == MVT::v16i32) && VT.is128BitVector()) {
+  // If we're called by the type legalizer, handle a few cases.
+  const TargetLowering &TLI = DAG.getTargetLoweringInfo();
+  if (!TLI.isTypeLegal(InVT)) {
+    if ((InVT == MVT::v8i64 || InVT == MVT::v16i32 || InVT == MVT::v16i64) &&
+        VT.is128BitVector()) {
       assert(Subtarget.hasVLX() && "Unexpected subtarget!");
       // The default behavior is to truncate one step, concatenate, and then
       // truncate the remainder. We'd rather produce two 64-bit results and
       // concatenate those.
       SDValue Lo, Hi;
       std::tie(Lo, Hi) = DAG.SplitVector(In, DL);
 
       EVT LoVT, HiVT;
@@ ... @@ if (Subtarget.hasAVX512() && isTypeLegal(InVT)) {
       // There's one case we can widen to 512 bits and use VTRUNC.
       if (InVT == MVT::v4i64 && VT == MVT::v4i8 && isTypeLegal(MVT::v8i64)) {
         In = DAG.getNode(ISD::CONCAT_VECTORS, dl, MVT::v8i64, In,
                          DAG.getUNDEF(MVT::v4i64));
         Results.push_back(DAG.getNode(X86ISD::VTRUNC, dl, WidenVT, In));
         return;
       }
     }
+    if (Subtarget.hasVLX() && InVT == MVT::v8i64 && VT == MVT::v8i8 &&
+        getTypeAction(*DAG.getContext(), InVT) == TypeSplitVector &&
+        isTypeLegal(MVT::v4i64)) {
+      // Input needs to be split and output needs to widened. Let's use two
+      // VTRUNCs, and shuffle their results together into the wider type.
+      SDValue Lo, Hi;
+      std::tie(Lo, Hi) = DAG.SplitVector(In, dl);
+
+      Lo = DAG.getNode(X86ISD::VTRUNC, dl, MVT::v16i8, Lo);
+      Hi = DAG.getNode(X86ISD::VTRUNC, dl, MVT::v16i8, Hi);
+      SDValue Res = DAG.getVectorShuffle(MVT::v16i8, dl, Lo, Hi,
+                                         { 0, 1, 2, 3, 16, 17, 18, 19,
+                                           -1, -1, -1, -1, -1, -1, -1, -1 });
+      Results.push_back(Res);
+      return;
+    }
+
     return;
   }
   case ISD::ANY_EXTEND:
     // Right now, only MVT::v8i8 has Custom action for an illegal type.
     // It's intended to custom handle the input type.
     assert(N->getValueType(0) == MVT::v8i8 &&
            "Do not know how to legalize this Node");
     return;

llvm/trunk/test/CodeGen/X86/min-legal-vector-width.ll

@@ ... @@
 define <16 x i8> @trunc_v16i64_v16i8(<16 x i64>* %x) nounwind "min-legal-vector-width"="256" {
 ; CHECK-LABEL: trunc_v16i64_v16i8:
 ; CHECK: # %bb.0:
 ; CHECK-NEXT: vmovdqa (%rdi), %ymm0
 ; CHECK-NEXT: vmovdqa 32(%rdi), %ymm1
 ; CHECK-NEXT: vmovdqa 64(%rdi), %ymm2
 ; CHECK-NEXT: vmovdqa 96(%rdi), %ymm3
-; CHECK-NEXT: vpmovqd %ymm2, %xmm2
-; CHECK-NEXT: vpmovqd %ymm3, %xmm3
-; CHECK-NEXT: vinserti128 $1, %xmm3, %ymm2, %ymm2
-; CHECK-NEXT: vpmovdb %ymm2, %xmm2
-; CHECK-NEXT: vpmovqd %ymm0, %xmm0
-; CHECK-NEXT: vpmovqd %ymm1, %xmm1
-; CHECK-NEXT: vinserti128 $1, %xmm1, %ymm0, %ymm0
-; CHECK-NEXT: vpmovdb %ymm0, %xmm0
+; CHECK-NEXT: vpmovqb %ymm3, %xmm3
+; CHECK-NEXT: vpmovqb %ymm2, %xmm2
+; CHECK-NEXT: vpunpckldq {{.*#+}} xmm2 = xmm2[0],xmm3[0],xmm2[1],xmm3[1]
+; CHECK-NEXT: vpmovqb %ymm1, %xmm1
+; CHECK-NEXT: vpmovqb %ymm0, %xmm0
+; CHECK-NEXT: vpunpckldq {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
 ; CHECK-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm2[0]
 ; CHECK-NEXT: vzeroupper
 ; CHECK-NEXT: retq
   %a = load <16 x i64>, <16 x i64>* %x
   %b = trunc <16 x i64> %a to <16 x i8>
   ret <16 x i8> %b
 }
 
 define <16 x i8> @trunc_v16i32_v16i8(<16 x i32>* %x) nounwind "min-legal-vector-width"="256" {
 ; CHECK-LABEL: trunc_v16i32_v16i8:
 ; CHECK: # %bb.0:
 ; CHECK-NEXT: vmovdqa (%rdi), %ymm0
 ; CHECK-NEXT: vmovdqa 32(%rdi), %ymm1
 ; CHECK-NEXT: vpmovdb %ymm1, %xmm1
 ; CHECK-NEXT: vpmovdb %ymm0, %xmm0
 ; CHECK-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
 ; CHECK-NEXT: vzeroupper
 ; CHECK-NEXT: retq
   %a = load <16 x i32>, <16 x i32>* %x
   %b = trunc <16 x i32> %a to <16 x i8>
   ret <16 x i8> %b
 }
 
 define <8 x i8> @trunc_v8i64_v8i8(<8 x i64>* %x) nounwind "min-legal-vector-width"="256" {
-; CHECK-AVX512-LABEL: trunc_v8i64_v8i8:
-; CHECK-AVX512: # %bb.0:
-; CHECK-AVX512-NEXT: vmovdqa (%rdi), %ymm0
-; CHECK-AVX512-NEXT: vmovdqa 32(%rdi), %ymm1
-; CHECK-AVX512-NEXT: vpmovqb %ymm1, %xmm1
-; CHECK-AVX512-NEXT: vpmovqb %ymm0, %xmm0
-; CHECK-AVX512-NEXT: vpunpckldq {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
-; CHECK-AVX512-NEXT: vzeroupper
-; CHECK-AVX512-NEXT: retq
-;
-; CHECK-VBMI-LABEL: trunc_v8i64_v8i8:
-; CHECK-VBMI: # %bb.0:
-; CHECK-VBMI-NEXT: vmovdqa (%rdi), %ymm1
-; CHECK-VBMI-NEXT: vpbroadcastq {{.*#+}} ymm0 = [4048780183313844224,4048780183313844224,4048780183313844224,4048780183313844224]
-; CHECK-VBMI-NEXT: vpermi2b 32(%rdi), %ymm1, %ymm0
-; CHECK-VBMI-NEXT: # kill: def $xmm0 killed $xmm0 killed $ymm0
-; CHECK-VBMI-NEXT: vzeroupper
-; CHECK-VBMI-NEXT: retq
+; CHECK-LABEL: trunc_v8i64_v8i8:
+; CHECK: # %bb.0:
+; CHECK-NEXT: vmovdqa (%rdi), %ymm0
+; CHECK-NEXT: vmovdqa 32(%rdi), %ymm1
+; CHECK-NEXT: vpmovqb %ymm1, %xmm1
+; CHECK-NEXT: vpmovqb %ymm0, %xmm0
+; CHECK-NEXT: vpunpckldq {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
+; CHECK-NEXT: vzeroupper
+; CHECK-NEXT: retq
   %a = load <8 x i64>, <8 x i64>* %x
   %b = trunc <8 x i64> %a to <8 x i8>
   ret <8 x i8> %b
 }
 
 define <8 x i16> @trunc_v8i64_v8i16(<8 x i64>* %x) nounwind "min-legal-vector-width"="256" {
 ; CHECK-LABEL: trunc_v8i64_v8i16:
 ; CHECK: # %bb.0:

llvm/trunk/test/CodeGen/X86/vector-trunc-packus.ll

@@ ... @@
 ; AVX2-NEXT: vpermq {{.*#+}} ymm0 = ymm0[0,2,1,3]
 ; AVX2-NEXT: vpackusdw %ymm1, %ymm0, %ymm0
 ; AVX2-NEXT: vpermq {{.*#+}} ymm0 = ymm0[0,2,1,3]
 ; AVX2-NEXT: vextracti128 $1, %ymm0, %xmm1
 ; AVX2-NEXT: vpackuswb %xmm1, %xmm0, %xmm0
 ; AVX2-NEXT: vzeroupper
 ; AVX2-NEXT: retq
 ;
-; AVX512-LABEL: trunc_packus_v16i64_v16i8:
-; AVX512: # %bb.0:
-; AVX512-NEXT: vpbroadcastq {{.*#+}} zmm2 = [255,255,255,255,255,255,255,255]
-; AVX512-NEXT: vpminsq %zmm2, %zmm0, %zmm0
-; AVX512-NEXT: vpminsq %zmm2, %zmm1, %zmm1
-; AVX512-NEXT: vpxor %xmm2, %xmm2, %xmm2
-; AVX512-NEXT: vpmaxsq %zmm2, %zmm1, %zmm1
-; AVX512-NEXT: vpmaxsq %zmm2, %zmm0, %zmm0
-; AVX512-NEXT: vpmovqd %zmm0, %ymm0
-; AVX512-NEXT: vpmovqd %zmm1, %ymm1
-; AVX512-NEXT: vinserti64x4 $1, %ymm1, %zmm0, %zmm0
-; AVX512-NEXT: vpmovdb %zmm0, %xmm0
-; AVX512-NEXT: vzeroupper
-; AVX512-NEXT: retq
+; AVX512F-LABEL: trunc_packus_v16i64_v16i8:
+; AVX512F: # %bb.0:
+; AVX512F-NEXT: vpbroadcastq {{.*#+}} zmm2 = [255,255,255,255,255,255,255,255]
+; AVX512F-NEXT: vpminsq %zmm2, %zmm0, %zmm0
+; AVX512F-NEXT: vpminsq %zmm2, %zmm1, %zmm1
+; AVX512F-NEXT: vpxor %xmm2, %xmm2, %xmm2
+; AVX512F-NEXT: vpmaxsq %zmm2, %zmm1, %zmm1
+; AVX512F-NEXT: vpmaxsq %zmm2, %zmm0, %zmm0
+; AVX512F-NEXT: vpmovqd %zmm0, %ymm0
+; AVX512F-NEXT: vpmovqd %zmm1, %ymm1
+; AVX512F-NEXT: vinserti64x4 $1, %ymm1, %zmm0, %zmm0
+; AVX512F-NEXT: vpmovdb %zmm0, %xmm0
+; AVX512F-NEXT: vzeroupper
+; AVX512F-NEXT: retq
+;
+; AVX512VL-LABEL: trunc_packus_v16i64_v16i8:
+; AVX512VL: # %bb.0:
+; AVX512VL-NEXT: vpxor %xmm2, %xmm2, %xmm2
+; AVX512VL-NEXT: vpmaxsq %zmm2, %zmm1, %zmm1
+; AVX512VL-NEXT: vpmovusqb %zmm1, %xmm1
+; AVX512VL-NEXT: vpmaxsq %zmm2, %zmm0, %zmm0
+; AVX512VL-NEXT: vpmovusqb %zmm0, %xmm0
+; AVX512VL-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
+; AVX512VL-NEXT: vzeroupper
+; AVX512VL-NEXT: retq
+;
+; AVX512BW-LABEL: trunc_packus_v16i64_v16i8:
+; AVX512BW: # %bb.0:
+; AVX512BW-NEXT: vpbroadcastq {{.*#+}} zmm2 = [255,255,255,255,255,255,255,255]
+; AVX512BW-NEXT: vpminsq %zmm2, %zmm0, %zmm0
+; AVX512BW-NEXT: vpminsq %zmm2, %zmm1, %zmm1
+; AVX512BW-NEXT: vpxor %xmm2, %xmm2, %xmm2
+; AVX512BW-NEXT: vpmaxsq %zmm2, %zmm1, %zmm1
+; AVX512BW-NEXT: vpmaxsq %zmm2, %zmm0, %zmm0
+; AVX512BW-NEXT: vpmovqd %zmm0, %ymm0
+; AVX512BW-NEXT: vpmovqd %zmm1, %ymm1
+; AVX512BW-NEXT: vinserti64x4 $1, %ymm1, %zmm0, %zmm0
+; AVX512BW-NEXT: vpmovdb %zmm0, %xmm0
+; AVX512BW-NEXT: vzeroupper
+; AVX512BW-NEXT: retq
+;
+; AVX512BWVL-LABEL: trunc_packus_v16i64_v16i8:
+; AVX512BWVL: # %bb.0:
+; AVX512BWVL-NEXT: vpxor %xmm2, %xmm2, %xmm2
+; AVX512BWVL-NEXT: vpmaxsq %zmm2, %zmm1, %zmm1
+; AVX512BWVL-NEXT: vpmovusqb %zmm1, %xmm1
+; AVX512BWVL-NEXT: vpmaxsq %zmm2, %zmm0, %zmm0
+; AVX512BWVL-NEXT: vpmovusqb %zmm0, %xmm0
+; AVX512BWVL-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
+; AVX512BWVL-NEXT: vzeroupper
+; AVX512BWVL-NEXT: retq
   %1 = icmp slt <16 x i64> %a0, <i64 255, i64 255, i64 255, i64 255, i64 255, i64 255, i64 255, i64 255, i64 255, i64 255, i64 255, i64 255, i64 255, i64 255, i64 255, i64 255>
   %2 = select <16 x i1> %1, <16 x i64> %a0, <16 x i64> <i64 255, i64 255, i64 255, i64 255, i64 255, i64 255, i64 255, i64 255, i64 255, i64 255, i64 255, i64 255, i64 255, i64 255, i64 255, i64 255>
   %3 = icmp sgt <16 x i64> %2, zeroinitializer
   %4 = select <16 x i1> %3, <16 x i64> %2, <16 x i64> zeroinitializer
   %5 = trunc <16 x i64> %4 to <16 x i8>
   ret <16 x i8> %5
 }

llvm/trunk/test/CodeGen/X86/vector-trunc-ssat.ll

@@ ... @@
 ; AVX2-NEXT: vpermq {{.*#+}} ymm0 = ymm0[0,2,1,3]
 ; AVX2-NEXT: vpackssdw %ymm1, %ymm0, %ymm0
 ; AVX2-NEXT: vpermq {{.*#+}} ymm0 = ymm0[0,2,1,3]
 ; AVX2-NEXT: vextracti128 $1, %ymm0, %xmm1
 ; AVX2-NEXT: vpacksswb %xmm1, %xmm0, %xmm0
 ; AVX2-NEXT: vzeroupper
 ; AVX2-NEXT: retq
 ;
-; AVX512-LABEL: trunc_ssat_v16i64_v16i8:
-; AVX512: # %bb.0:
-; AVX512-NEXT: vpbroadcastq {{.*#+}} zmm2 = [127,127,127,127,127,127,127,127]
-; AVX512-NEXT: vpminsq %zmm2, %zmm0, %zmm0
-; AVX512-NEXT: vpminsq %zmm2, %zmm1, %zmm1
-; AVX512-NEXT: vpbroadcastq {{.*#+}} zmm2 = [18446744073709551488,18446744073709551488,18446744073709551488,18446744073709551488,18446744073709551488,18446744073709551488,18446744073709551488,18446744073709551488]
-; AVX512-NEXT: vpmaxsq %zmm2, %zmm1, %zmm1
-; AVX512-NEXT: vpmaxsq %zmm2, %zmm0, %zmm0
-; AVX512-NEXT: vpmovqd %zmm0, %ymm0
-; AVX512-NEXT: vpmovqd %zmm1, %ymm1
-; AVX512-NEXT: vinserti64x4 $1, %ymm1, %zmm0, %zmm0
-; AVX512-NEXT: vpmovdb %zmm0, %xmm0
-; AVX512-NEXT: vzeroupper
-; AVX512-NEXT: retq
+; AVX512F-LABEL: trunc_ssat_v16i64_v16i8:
+; AVX512F: # %bb.0:
+; AVX512F-NEXT: vpbroadcastq {{.*#+}} zmm2 = [127,127,127,127,127,127,127,127]
+; AVX512F-NEXT: vpminsq %zmm2, %zmm0, %zmm0
+; AVX512F-NEXT: vpminsq %zmm2, %zmm1, %zmm1
+; AVX512F-NEXT: vpbroadcastq {{.*#+}} zmm2 = [18446744073709551488,18446744073709551488,18446744073709551488,18446744073709551488,18446744073709551488,18446744073709551488,18446744073709551488,18446744073709551488]
+; AVX512F-NEXT: vpmaxsq %zmm2, %zmm1, %zmm1
+; AVX512F-NEXT: vpmaxsq %zmm2, %zmm0, %zmm0
+; AVX512F-NEXT: vpmovqd %zmm0, %ymm0
+; AVX512F-NEXT: vpmovqd %zmm1, %ymm1
+; AVX512F-NEXT: vinserti64x4 $1, %ymm1, %zmm0, %zmm0
+; AVX512F-NEXT: vpmovdb %zmm0, %xmm0
+; AVX512F-NEXT: vzeroupper
+; AVX512F-NEXT: retq
+;
+; AVX512VL-LABEL: trunc_ssat_v16i64_v16i8:
+; AVX512VL: # %bb.0:
+; AVX512VL-NEXT: vpmovsqb %zmm1, %xmm1
+; AVX512VL-NEXT: vpmovsqb %zmm0, %xmm0
+; AVX512VL-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
+; AVX512VL-NEXT: vzeroupper
+; AVX512VL-NEXT: retq
+;
+; AVX512BW-LABEL: trunc_ssat_v16i64_v16i8:
+; AVX512BW: # %bb.0:
+; AVX512BW-NEXT: vpbroadcastq {{.*#+}} zmm2 = [127,127,127,127,127,127,127,127]
+; AVX512BW-NEXT: vpminsq %zmm2, %zmm0, %zmm0
+; AVX512BW-NEXT: vpminsq %zmm2, %zmm1, %zmm1
+; AVX512BW-NEXT: vpbroadcastq {{.*#+}} zmm2 = [18446744073709551488,18446744073709551488,18446744073709551488,18446744073709551488,18446744073709551488,18446744073709551488,18446744073709551488,18446744073709551488]
+; AVX512BW-NEXT: vpmaxsq %zmm2, %zmm1, %zmm1
+; AVX512BW-NEXT: vpmaxsq %zmm2, %zmm0, %zmm0
+; AVX512BW-NEXT: vpmovqd %zmm0, %ymm0
+; AVX512BW-NEXT: vpmovqd %zmm1, %ymm1
+; AVX512BW-NEXT: vinserti64x4 $1, %ymm1, %zmm0, %zmm0
+; AVX512BW-NEXT: vpmovdb %zmm0, %xmm0
+; AVX512BW-NEXT: vzeroupper
+; AVX512BW-NEXT: retq
+;
+; AVX512BWVL-LABEL: trunc_ssat_v16i64_v16i8:
+; AVX512BWVL: # %bb.0:
+; AVX512BWVL-NEXT: vpmovsqb %zmm1, %xmm1
+; AVX512BWVL-NEXT: vpmovsqb %zmm0, %xmm0
+; AVX512BWVL-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
+; AVX512BWVL-NEXT: vzeroupper
+; AVX512BWVL-NEXT: retq
   %1 = icmp slt <16 x i64> %a0, <i64 127, i64 127, i64 127, i64 127, i64 127, i64 127, i64 127, i64 127, i64 127, i64 127, i64 127, i64 127, i64 127, i64 127, i64 127, i64 127>
   %2 = select <16 x i1> %1, <16 x i64> %a0, <16 x i64> <i64 127, i64 127, i64 127, i64 127, i64 127, i64 127, i64 127, i64 127, i64 127, i64 127, i64 127, i64 127, i64 127, i64 127, i64 127, i64 127>
   %3 = icmp sgt <16 x i64> %2, <i64 -128, i64 -128, i64 -128, i64 -128, i64 -128, i64 -128, i64 -128, i64 -128, i64 -128, i64 -128, i64 -128, i64 -128, i64 -128, i64 -128, i64 -128, i64 -128>
   %4 = select <16 x i1> %3, <16 x i64> %2, <16 x i64> <i64 -128, i64 -128, i64 -128, i64 -128, i64 -128, i64 -128, i64 -128, i64 -128, i64 -128, i64 -128, i64 -128, i64 -128, i64 -128, i64 -128, i64 -128, i64 -128>
   %5 = trunc <16 x i64> %4 to <16 x i8>
   ret <16 x i8> %5
 }

llvm/trunk/test/CodeGen/X86/vector-trunc-usat.ll

@@ ... @@
 ; AVX2-NEXT: vpermq {{.*#+}} ymm0 = ymm0[0,2,1,3]
 ; AVX2-NEXT: vpackusdw %ymm1, %ymm0, %ymm0
 ; AVX2-NEXT: vpermq {{.*#+}} ymm0 = ymm0[0,2,1,3]
 ; AVX2-NEXT: vextracti128 $1, %ymm0, %xmm1
 ; AVX2-NEXT: vpackuswb %xmm1, %xmm0, %xmm0
 ; AVX2-NEXT: vzeroupper
 ; AVX2-NEXT: retq
 ;
-; AVX512-LABEL: trunc_usat_v16i64_v16i8:
-; AVX512: # %bb.0:
-; AVX512-NEXT: vpbroadcastq {{.*#+}} zmm2 = [255,255,255,255,255,255,255,255]
-; AVX512-NEXT: vpminuq %zmm2, %zmm1, %zmm1
-; AVX512-NEXT: vpminuq %zmm2, %zmm0, %zmm0
-; AVX512-NEXT: vpmovqd %zmm0, %ymm0
-; AVX512-NEXT: vpmovqd %zmm1, %ymm1
-; AVX512-NEXT: vinserti64x4 $1, %ymm1, %zmm0, %zmm0
-; AVX512-NEXT: vpmovdb %zmm0, %xmm0
-; AVX512-NEXT: vzeroupper
-; AVX512-NEXT: retq
+; AVX512F-LABEL: trunc_usat_v16i64_v16i8:
+; AVX512F: # %bb.0:
+; AVX512F-NEXT: vpbroadcastq {{.*#+}} zmm2 = [255,255,255,255,255,255,255,255]
+; AVX512F-NEXT: vpminuq %zmm2, %zmm1, %zmm1
+; AVX512F-NEXT: vpminuq %zmm2, %zmm0, %zmm0
+; AVX512F-NEXT: vpmovqd %zmm0, %ymm0
+; AVX512F-NEXT: vpmovqd %zmm1, %ymm1
+; AVX512F-NEXT: vinserti64x4 $1, %ymm1, %zmm0, %zmm0
+; AVX512F-NEXT: vpmovdb %zmm0, %xmm0
+; AVX512F-NEXT: vzeroupper
+; AVX512F-NEXT: retq
+;
+; AVX512VL-LABEL: trunc_usat_v16i64_v16i8:
+; AVX512VL: # %bb.0:
+; AVX512VL-NEXT: vpmovusqb %zmm1, %xmm1
+; AVX512VL-NEXT: vpmovusqb %zmm0, %xmm0
+; AVX512VL-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
+; AVX512VL-NEXT: vzeroupper
+; AVX512VL-NEXT: retq
+;
+; AVX512BW-LABEL: trunc_usat_v16i64_v16i8:
+; AVX512BW: # %bb.0:
+; AVX512BW-NEXT: vpbroadcastq {{.*#+}} zmm2 = [255,255,255,255,255,255,255,255]
+; AVX512BW-NEXT: vpminuq %zmm2, %zmm1, %zmm1
+; AVX512BW-NEXT: vpminuq %zmm2, %zmm0, %zmm0
+; AVX512BW-NEXT: vpmovqd %zmm0, %ymm0
+; AVX512BW-NEXT: vpmovqd %zmm1, %ymm1
+; AVX512BW-NEXT: vinserti64x4 $1, %ymm1, %zmm0, %zmm0
+; AVX512BW-NEXT: vpmovdb %zmm0, %xmm0
+; AVX512BW-NEXT: vzeroupper
+; AVX512BW-NEXT: retq
+;
+; AVX512BWVL-LABEL: trunc_usat_v16i64_v16i8:
+; AVX512BWVL: # %bb.0:
+; AVX512BWVL-NEXT: vpmovusqb %zmm1, %xmm1
+; AVX512BWVL-NEXT: vpmovusqb %zmm0, %xmm0
+; AVX512BWVL-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
+; AVX512BWVL-NEXT: vzeroupper
+; AVX512BWVL-NEXT: retq
   %1 = icmp ult <16 x i64> %a0, <i64 255, i64 255, i64 255, i64 255, i64 255, i64 255, i64 255, i64 255, i64 255, i64 255, i64 255, i64 255, i64 255, i64 255, i64 255, i64 255>
   %2 = select <16 x i1> %1, <16 x i64> %a0, <16 x i64> <i64 255, i64 255, i64 255, i64 255, i64 255, i64 255, i64 255, i64 255, i64 255, i64 255, i64 255, i64 255, i64 255, i64 255, i64 255, i64 255>
   %3 = trunc <16 x i64> %2 to <16 x i8>
   ret <16 x i8> %3
 }
 
 define <8 x i8> @trunc_usat_v8i32_v8i8(<8 x i32> %a0) {
 ; SSE2-LABEL: trunc_usat_v8i32_v8i8: