This is an archive of the discontinued LLVM Phabricator instance.

[X86] Add DAG combine to turn (vzmovl (insert_subvector undef, X, 0)) into (insert_subvector allzeros, (vzmovl X), 0)
Closed · Public

Authored by craig.topper on Jun 18 2019, 12:43 PM.


Details

Reviewers

spatel
RKSimon

Commits

rG4649a051bf0b: [X86] Add DAG combine to turn (vzmovl (insert_subvector undef, X, 0)) into…
rL364095: [X86] Add DAG combine to turn (vzmovl (insert_subvector undef, X, 0)) into…

Summary

256/512-bit scalar_to_vectors are canonicalized to (insert_subvector undef, (scalar_to_vector), 0), where the inner scalar_to_vector is 128 bits wide. We have isel patterns that match this form as the operand of a vzmovl so that they can use a 128-bit instruction and a subreg_to_reg.

This patch detects the insert_subvector undef portion of this and pulls it through the vzmovl, creating a narrower vzmovl and an insert_subvector allzeros. We can then match the insert_subvector into a subreg_to_reg operation by itself, and fall back on the existing (vzmovl (scalar_to_vector)) patterns for the rest.

Note, while the scalar_to_vector case is the motivating case, I didn't restrict the combine to just that case. I'm also wondering about shrinking any 256/512 vzmovl to an extract_subvector+vzmovl+insert_subvector(allzeros), but I fear that would have bad implications for shuffle combining.

I also think there is more canonicalization we can do with vzmovl with loads or scalar_to_vector with loads to create vzload.
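
To illustrate why the rewrite preserves the result, here is a minimal standalone C++ model. It is not LLVM code: Lane, Vec, and every function name are hypothetical, and it assumes that vzmovl keeps lane 0 and zeroes all other lanes, with an undef lane modeled as std::nullopt. Under those assumptions the two node shapes produce the same lanes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// A lane holds a known value, or std::nullopt to stand for undef.
using Lane = std::optional<std::uint64_t>;
using Vec = std::vector<Lane>;

// Toy model of vzmovl: keep lane 0, force every other lane to zero.
Vec vzmovl(const Vec &v) {
  Vec out(v.size(), Lane{0});
  if (!v.empty())
    out[0] = v[0];
  return out;
}

// Toy model of insert_subvector(base, sub, idx): overwrite lanes of base
// starting at idx. The subvector is assumed to fit inside base.
Vec insertSubvector(Vec base, const Vec &sub, std::size_t idx) {
  assert(idx + sub.size() <= base.size());
  for (std::size_t i = 0; i < sub.size(); ++i)
    base[idx + i] = sub[i];
  return base;
}

Vec undefVec(std::size_t n) { return Vec(n, std::nullopt); }
Vec zeroVec(std::size_t n) { return Vec(n, Lane{0}); }

// Shape before the combine: (vzmovl (insert_subvector undef, X, 0))
Vec before(const Vec &x, std::size_t wideLanes) {
  return vzmovl(insertSubvector(undefVec(wideLanes), x, 0));
}

// Shape after the combine: (insert_subvector allzeros, (vzmovl X), 0)
Vec after(const Vec &x, std::size_t wideLanes) {
  return insertSubvector(zeroVec(wideLanes), vzmovl(x), 0);
}
```

For X = {42, 7} widened to four lanes, both before(X, 4) and after(X, 4) yield {42, 0, 0, 0}. The undef upper lanes never survive the outer vzmovl, which is why they can be replaced by an explicit zero vector.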

Diff Detail

Repository: rL LLVM

Event Timeline

craig.topper created this revision. Jun 18 2019, 12:43 PM

Herald added a project: Restricted Project. Jun 18 2019, 12:43 PM

Herald added a subscriber: hiraditya.

craig.topper marked 3 inline comments as done. Jun 18 2019, 12:53 PM

craig.topper added inline comments.

llvm/test/CodeGen/X86/avx-load-store.ll
Line 243 (On Diff #205416): This is because we emit a SUBREG_TO_REG+MOV for insert_subvector(zero), and our post-processing peephole that removes the MOV when possible doesn't run at -O0.
llvm/test/CodeGen/X86/vec_extract-avx.ll
Line 147 (On Diff #205416): This is due to some inconsistencies between our handling of v4i64 vzmovl and v2i64 vzmovl. You'll see this same test case was changed by D63373 in a different way. The really weird thing here is that we're reducing the size of a load in an isel pattern, which isn't good since we don't check whether it's volatile. We should remove these kinds of isel patterns and do something in DAG combine to move towards vzload.
llvm/test/CodeGen/X86/vector-shuffle-256-v4.ll
Line 1507 (On Diff #205416): This is also due to inconsistencies between v4f64 and v2f64 vzmovl handling. We also generate this same code after D63373.
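
The volatility concern in the vec_extract-avx.ll comment can be sketched with a toy C++ model. ToyLoad and narrowLoad are hypothetical names, not LLVM APIs. The only point is that a transform which shrinks a memory access should refuse to do so when the access is volatile.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for a load node; this is not an LLVM type.
struct ToyLoad {
  unsigned bits;   // width of the memory access in bits
  bool isVolatile; // a volatile access must keep its original width
};

// Return a narrower load only when shrinking is allowed: the load must not
// be volatile and the requested width must actually be smaller.
std::optional<ToyLoad> narrowLoad(const ToyLoad &ld, unsigned newBits) {
  if (ld.isVolatile || newBits >= ld.bits)
    return std::nullopt;
  return ToyLoad{newBits, false};
}
```

With this guard, a 128-bit non-volatile load can be narrowed to 64 bits, while the same request on a volatile load is rejected and the original access is left alone.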

Rebase after D63373

LGTM

This revision is now accepted and ready to land. Jun 21 2019, 11:27 AM

LGTM

Closed by commit rL364095: [X86] Add DAG combine to turn (vzmovl (insert_subvector undef, X, 0)) into… (authored by ctopper). Jun 21 2019, 12:07 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                   Size
llvm/trunk/lib/Target/X86/X86ISelLowering.cpp          16 lines
llvm/trunk/lib/Target/X86/X86InstrAVX512.td            38 lines
llvm/trunk/lib/Target/X86/X86InstrSSE.td               19 lines
llvm/trunk/test/CodeGen/X86/avx-load-store.ll          1 line
llvm/trunk/test/CodeGen/X86/vec_extract-avx.ll         20 lines
llvm/trunk/test/CodeGen/X86/vector-shuffle-256-v4.ll   1 line

Diff 206046

llvm/trunk/lib/Target/X86/X86ISelLowering.cpp


@@ 33,652 lines hidden @@ if (N->getOpcode() == X86ISD::VZEXT_MOVL &&
     case X86ISD::VFPROUND: case X86ISD::VMFPROUND:
       if (In.getOperand(0).getValueType() == MVT::v2f64 ||
           In.getOperand(0).getValueType() == MVT::v2i64)
         return N->getOperand(0); // return the bitcast
       break;
     }
   }

+  // Pull subvector inserts into undef through VZEXT_MOVL by making it an
+  // insert into a zero vector. This helps get VZEXT_MOVL closer to
+  // scalar_to_vectors where 256/512 are canonicalized to an insert and a
+  // 128-bit scalar_to_vector. This reduces the number of isel patterns.
+  if (N->getOpcode() == X86ISD::VZEXT_MOVL && !DCI.isBeforeLegalizeOps() &&
+      N->getOperand(0).getOpcode() == ISD::INSERT_SUBVECTOR &&
+      N->getOperand(0).hasOneUse() &&
+      N->getOperand(0).getOperand(0).isUndef() &&
+      isNullConstant(N->getOperand(0).getOperand(2))) {
+    SDValue In = N->getOperand(0).getOperand(1);
+    SDValue Movl = DAG.getNode(X86ISD::VZEXT_MOVL, dl, In.getValueType(), In);
+    return DAG.getNode(ISD::INSERT_SUBVECTOR, dl, VT,
+                       getZeroVector(VT.getSimpleVT(), Subtarget, DAG, dl),
+                       Movl, N->getOperand(0).getOperand(2));
+  }
+
   // Look for a truncating shuffle to v2i32 of a PMULUDQ where one of the
   // operands is an extend from v2i32 to v2i64. Turn it into a pmulld.
   // FIXME: This can probably go away once we default to widening legalization.
   if (Subtarget.hasSSE41() && VT == MVT::v4i32 &&
       N->getOpcode() == ISD::VECTOR_SHUFFLE &&
       N->getOperand(0).getOpcode() == ISD::BITCAST &&
       N->getOperand(0).getOperand(0).getOpcode() == X86ISD::PMULUDQ) {
     SDValue BC = N->getOperand(0);
@@ 11,378 lines hidden @@

llvm/trunk/lib/Target/X86/X86InstrAVX512.td


@@ 4,323 lines hidden @@ let Predicates = [HasAVX512] in {
   // with SUBREG_TO_REG. The AVX versions also write: DST[255:128] <- 0
   def : Pat<(v2f64 (X86vzmovl (v2f64 (scalar_to_vector (loadf64 addr:$src))))),
             (VMOVSDZrm addr:$src)>;
   def : Pat<(v2f64 (X86vzmovl (loadv2f64 addr:$src))),
             (VMOVSDZrm addr:$src)>;

   // Represent the same patterns above but in the form they appear for
   // 256-bit types
-  def : Pat<(v8i32 (X86vzmovl (insert_subvector undef,
-                   (v4i32 (scalar_to_vector (loadi32 addr:$src))), (iPTR 0)))),
-            (SUBREG_TO_REG (i32 0), (v4i32 (VMOVDI2PDIZrm addr:$src)), sub_xmm)>;
-  def : Pat<(v8f32 (X86vzmovl (insert_subvector undef,
-                   (v4f32 (scalar_to_vector (loadf32 addr:$src))), (iPTR 0)))),
-            (SUBREG_TO_REG (i32 0), (VMOVSSZrm addr:$src), sub_xmm)>;
   def : Pat<(v8f32 (X86vzload addr:$src)),
             (SUBREG_TO_REG (i32 0), (VMOVSSZrm addr:$src), sub_xmm)>;
-  def : Pat<(v4f64 (X86vzmovl (insert_subvector undef,
-                   (v2f64 (scalar_to_vector (loadf64 addr:$src))), (iPTR 0)))),
-            (SUBREG_TO_REG (i32 0), (VMOVSDZrm addr:$src), sub_xmm)>;
   def : Pat<(v4f64 (X86vzload addr:$src)),
             (SUBREG_TO_REG (i32 0), (VMOVSDZrm addr:$src), sub_xmm)>;

   // Represent the same patterns above but in the form they appear for
   // 512-bit types
-  def : Pat<(v16i32 (X86vzmovl (insert_subvector undef,
-                    (v4i32 (scalar_to_vector (loadi32 addr:$src))), (iPTR 0)))),
-            (SUBREG_TO_REG (i32 0), (v4i32 (VMOVDI2PDIZrm addr:$src)), sub_xmm)>;
-  def : Pat<(v16f32 (X86vzmovl (insert_subvector undef,
-                    (v4f32 (scalar_to_vector (loadf32 addr:$src))), (iPTR 0)))),
-            (SUBREG_TO_REG (i32 0), (VMOVSSZrm addr:$src), sub_xmm)>;
   def : Pat<(v16f32 (X86vzload addr:$src)),
             (SUBREG_TO_REG (i32 0), (VMOVSSZrm addr:$src), sub_xmm)>;
-  def : Pat<(v8f64 (X86vzmovl (insert_subvector undef,
-                   (v2f64 (scalar_to_vector (loadf64 addr:$src))), (iPTR 0)))),
-            (SUBREG_TO_REG (i32 0), (VMOVSDZrm addr:$src), sub_xmm)>;
   def : Pat<(v8f64 (X86vzload addr:$src)),
             (SUBREG_TO_REG (i32 0), (VMOVSDZrm addr:$src), sub_xmm)>;
-
-  def : Pat<(v4i64 (X86vzmovl (insert_subvector undef,
-                   (v2i64 (scalar_to_vector (loadi64 addr:$src))), (iPTR 0)))),
-            (SUBREG_TO_REG (i64 0), (v2i64 (VMOVQI2PQIZrm addr:$src)), sub_xmm)>;
 }

 let ExeDomain = SSEPackedInt, SchedRW = [SchedWriteVecLogic.XMM] in {
 def VMOVZPQILo2PQIZrr : AVX512XSI<0x7E, MRMSrcReg, (outs VR128X:$dst),
                                   (ins VR128X:$src),
                                   "vmovq\t{$src, $dst|$dst, $src}",
                                   [(set VR128X:$dst, (v2i64 (X86vzmovl
                                                      (v2i64 VR128X:$src))))]>,
                                   EVEX, VEX_W;
 }

 let Predicates = [HasAVX512] in {
   def : Pat<(v4i32 (X86vzmovl (v4i32 (scalar_to_vector GR32:$src)))),
             (VMOVDI2PDIZrr GR32:$src)>;

   def : Pat<(v2i64 (X86vzmovl (v2i64 (scalar_to_vector GR64:$src)))),
             (VMOV64toPQIZrr GR64:$src)>;

-  def : Pat<(v4i64 (X86vzmovl (insert_subvector undef,
-                   (v2i64 (scalar_to_vector GR64:$src)),(iPTR 0)))),
-            (SUBREG_TO_REG (i64 0), (v2i64 (VMOV64toPQIZrr GR64:$src)), sub_xmm)>;
-
-  def : Pat<(v8i64 (X86vzmovl (insert_subvector undef,
-                   (v2i64 (scalar_to_vector GR64:$src)),(iPTR 0)))),
-            (SUBREG_TO_REG (i64 0), (v2i64 (VMOV64toPQIZrr GR64:$src)), sub_xmm)>;
-
   // AVX 128-bit movd/movq instruction write zeros in the high 128-bit part.
   def : Pat<(v2i64 (X86vzmovl (v2i64 (scalar_to_vector (zextloadi64i32 addr:$src))))),
             (VMOVDI2PDIZrm addr:$src)>;
   def : Pat<(v4i32 (X86vzmovl (v4i32 (scalar_to_vector (loadi32 addr:$src))))),
             (VMOVDI2PDIZrm addr:$src)>;
   def : Pat<(v4i32 (X86vzmovl (loadv4i32 addr:$src))),
             (VMOVDI2PDIZrm addr:$src)>;
   def : Pat<(v4i32 (X86vzload addr:$src)),
             (VMOVDI2PDIZrm addr:$src)>;
   def : Pat<(v8i32 (X86vzload addr:$src)),
             (SUBREG_TO_REG (i32 0), (v4i32 (VMOVDI2PDIZrm addr:$src)), sub_xmm)>;
   def : Pat<(v2i64 (X86vzmovl (loadv2i64 addr:$src))),
             (VMOVQI2PQIZrm addr:$src)>;
   def : Pat<(v2f64 (X86vzmovl (v2f64 VR128X:$src))),
             (VMOVZPQILo2PQIZrr VR128X:$src)>;
   def : Pat<(v2i64 (X86vzload addr:$src)),
             (VMOVQI2PQIZrm addr:$src)>;
   def : Pat<(v4i64 (X86vzload addr:$src)),
             (SUBREG_TO_REG (i64 0), (v2i64 (VMOVQI2PQIZrm addr:$src)), sub_xmm)>;

-  // Use regular 128-bit instructions to match 256-bit scalar_to_vec+zext.
-  def : Pat<(v8i32 (X86vzmovl (insert_subvector undef,
-                   (v4i32 (scalar_to_vector GR32:$src)),(iPTR 0)))),
-            (SUBREG_TO_REG (i32 0), (v4i32 (VMOVDI2PDIZrr GR32:$src)), sub_xmm)>;
-  def : Pat<(v16i32 (X86vzmovl (insert_subvector undef,
-                    (v4i32 (scalar_to_vector GR32:$src)),(iPTR 0)))),
-            (SUBREG_TO_REG (i32 0), (v4i32 (VMOVDI2PDIZrr GR32:$src)), sub_xmm)>;
-
   // Use regular 128-bit instructions to match 512-bit scalar_to_vec+zext.
   def : Pat<(v16i32 (X86vzload addr:$src)),
             (SUBREG_TO_REG (i32 0), (v4i32 (VMOVDI2PDIZrm addr:$src)), sub_xmm)>;
   def : Pat<(v8i64 (X86vzload addr:$src)),
             (SUBREG_TO_REG (i64 0), (v2i64 (VMOVQI2PQIZrm addr:$src)), sub_xmm)>;

   def : Pat<(v4f64 (X86vzmovl (v4f64 VR256X:$src))),
             (SUBREG_TO_REG (i32 0),
@@ 8,009 lines hidden @@

llvm/trunk/lib/Target/X86/X86InstrSSE.td


@@ 277 lines hidden @@ def : Pat<(v2f64 (X86vzmovl (v2f64 (scalar_to_vector (loadf64 addr:$src))))),
             (VMOVSDrm addr:$src)>;
   def : Pat<(v2f64 (X86vzmovl (loadv2f64 addr:$src))),
             (VMOVSDrm addr:$src)>;
   def : Pat<(v2f64 (X86vzload addr:$src)),
             (VMOVSDrm addr:$src)>;

   // Represent the same patterns above but in the form they appear for
   // 256-bit types
-  def : Pat<(v8f32 (X86vzmovl (insert_subvector undef,
-                   (v4f32 (scalar_to_vector (loadf32 addr:$src))), (iPTR 0)))),
-            (SUBREG_TO_REG (i32 0), (VMOVSSrm addr:$src), sub_xmm)>;
   def : Pat<(v8f32 (X86vzload addr:$src)),
             (SUBREG_TO_REG (i32 0), (VMOVSSrm addr:$src), sub_xmm)>;
-  def : Pat<(v4f64 (X86vzmovl (insert_subvector undef,
-                   (v2f64 (scalar_to_vector (loadf64 addr:$src))), (iPTR 0)))),
-            (SUBREG_TO_REG (i32 0), (VMOVSDrm addr:$src), sub_xmm)>;
   def : Pat<(v4f64 (X86vzload addr:$src)),
             (SUBREG_TO_REG (i32 0), (VMOVSDrm addr:$src), sub_xmm)>;
 }

 let Predicates = [UseAVX, OptForSize] in {
   // Move scalar to XMM zero-extended, zeroing a VR128 then do a
   // MOVSS to the lower bits.
   def : Pat<(v4f32 (X86vzmovl (v4f32 VR128:$src))),
@@ 3,838 lines hidden @@

 let Predicates = [UseAVX] in {
   def : Pat<(v4i32 (X86vzmovl (v4i32 (scalar_to_vector GR32:$src)))),
             (VMOVDI2PDIrr GR32:$src)>;

   def : Pat<(v2i64 (X86vzmovl (v2i64 (scalar_to_vector GR64:$src)))),
             (VMOV64toPQIrr GR64:$src)>;

-  def : Pat<(v4i64 (X86vzmovl (insert_subvector undef,
-                   (v2i64 (scalar_to_vector GR64:$src)),(iPTR 0)))),
-            (SUBREG_TO_REG (i64 0), (v2i64 (VMOV64toPQIrr GR64:$src)), sub_xmm)>;
   // AVX 128-bit movd/movq instructions write zeros in the high 128-bit part.
   // These instructions also write zeros in the high part of a 256-bit register.
   def : Pat<(v2i64 (X86vzmovl (v2i64 (scalar_to_vector (zextloadi64i32 addr:$src))))),
             (VMOVDI2PDIrm addr:$src)>;
   def : Pat<(v4i32 (X86vzmovl (v4i32 (scalar_to_vector (loadi32 addr:$src))))),
             (VMOVDI2PDIrm addr:$src)>;
   def : Pat<(v4i32 (X86vzmovl (loadv4i32 addr:$src))),
             (VMOVDI2PDIrm addr:$src)>;
   def : Pat<(v4i32 (X86vzload addr:$src)),
             (VMOVDI2PDIrm addr:$src)>;
-  def : Pat<(v8i32 (X86vzmovl (insert_subvector undef,
-                   (v4i32 (scalar_to_vector (loadi32 addr:$src))), (iPTR 0)))),
-            (SUBREG_TO_REG (i32 0), (v4i32 (VMOVDI2PDIrm addr:$src)), sub_xmm)>;
   def : Pat<(v8i32 (X86vzload addr:$src)),
             (SUBREG_TO_REG (i64 0), (v4i32 (VMOVDI2PDIrm addr:$src)), sub_xmm)>;
-  // Use regular 128-bit instructions to match 256-bit scalar_to_vec+zext.
-  def : Pat<(v8i32 (X86vzmovl (insert_subvector undef,
-                   (v4i32 (scalar_to_vector GR32:$src)),(iPTR 0)))),
-            (SUBREG_TO_REG (i32 0), (v4i32 (VMOVDI2PDIrr GR32:$src)), sub_xmm)>;
 }

 let Predicates = [UseSSE2] in {
   def : Pat<(v4i32 (X86vzmovl (v4i32 (scalar_to_vector GR32:$src)))),
             (MOVDI2PDIrr GR32:$src)>;

   def : Pat<(v2i64 (X86vzmovl (v2i64 (scalar_to_vector GR64:$src)))),
             (MOV64toPQIrr GR64:$src)>;
@@ 70 lines hidden @@
 def : InstAlias<"movq.s\t{$src, $dst|$dst, $src}",
                 (MOVPQI2QIrr VR128:$dst, VR128:$src), 0>;

 let Predicates = [UseAVX] in {
   def : Pat<(v2i64 (X86vzmovl (loadv2i64 addr:$src))),
             (VMOVQI2PQIrm addr:$src)>;
   def : Pat<(v2i64 (X86vzload addr:$src)),
             (VMOVQI2PQIrm addr:$src)>;
-  def : Pat<(v4i64 (X86vzmovl (insert_subvector undef,
-                   (v2i64 (scalar_to_vector (loadi64 addr:$src))), (iPTR 0)))),
-            (SUBREG_TO_REG (i64 0), (v2i64 (VMOVQI2PQIrm addr:$src)), sub_xmm)>;
   def : Pat<(v4i64 (X86vzload addr:$src)),
             (SUBREG_TO_REG (i64 0), (v2i64 (VMOVQI2PQIrm addr:$src)), sub_xmm)>;

   def : Pat<(X86vextractstore (v2i64 VR128:$src), addr:$dst),
             (VMOVPQI2QImr addr:$dst, VR128:$src)>;
 }

 let Predicates = [UseSSE2] in {
@@ 3,816 lines hidden @@

llvm/trunk/test/CodeGen/X86/avx-load-store.ll

@@ 234 lines hidden @@
 ; CHECK_O0-NEXT: .LBB9_2: # %cif_mask_mixed
 ; CHECK_O0-NEXT: # implicit-def: $al
 ; CHECK_O0-NEXT: testb $1, %al
 ; CHECK_O0-NEXT: jne .LBB9_3
 ; CHECK_O0-NEXT: jmp .LBB9_4
 ; CHECK_O0-NEXT: .LBB9_3: # %cif_mixed_test_all
 ; CHECK_O0-NEXT: movl $-1, %eax
 ; CHECK_O0-NEXT: vmovd %eax, %xmm0
+; CHECK_O0-NEXT: vmovdqa %xmm0, %xmm0
 ; CHECK_O0-NEXT: vmovaps %xmm0, %xmm1
 ; CHECK_O0-NEXT: # implicit-def: $rcx
 ; CHECK_O0-NEXT: # implicit-def: $ymm2
 ; CHECK_O0-NEXT: vmaskmovps %ymm2, %ymm1, (%rcx)
 ; CHECK_O0-NEXT: .LBB9_4: # %cif_mixed_test_any_check
 allocas:
   br i1 undef, label %cif_mask_all, label %cif_mask_mixed

@@ 84 lines hidden @@

llvm/trunk/test/CodeGen/X86/vec_extract-avx.ll

@@ 138 lines hidden @@ ; X64-NEXT: retq
   ret void
 }

 define void @legal_vzmovl_2i64_4i64(<2 x i64>* %in, <4 x i64>* %out) {
 ; X32-LABEL: legal_vzmovl_2i64_4i64:
 ; X32: # %bb.0:
 ; X32-NEXT: movl {{[0-9]+}}(%esp), %eax
 ; X32-NEXT: movl {{[0-9]+}}(%esp), %ecx
-; X32-NEXT: vmovdqu (%ecx), %xmm0
-; X32-NEXT: vmovq {{.*#+}} xmm0 = xmm0[0],zero
-; X32-NEXT: vmovdqa %ymm0, (%eax)
+; X32-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
+; X32-NEXT: vmovaps %ymm0, (%eax)
 ; X32-NEXT: vzeroupper
 ; X32-NEXT: retl
 ;
 ; X64-LABEL: legal_vzmovl_2i64_4i64:
 ; X64: # %bb.0:
-; X64-NEXT: vmovdqu (%rdi), %xmm0
-; X64-NEXT: vmovq {{.*#+}} xmm0 = xmm0[0],zero
-; X64-NEXT: vmovdqa %ymm0, (%rsi)
+; X64-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
+; X64-NEXT: vmovaps %ymm0, (%rsi)
 ; X64-NEXT: vzeroupper
 ; X64-NEXT: retq
   %ld = load <2 x i64>, <2 x i64>* %in, align 8
   %ext = extractelement <2 x i64> %ld, i64 0
   %ins = insertelement <4 x i64> <i64 undef, i64 0, i64 0, i64 0>, i64 %ext, i64 0
   store <4 x i64> %ins, <4 x i64>* %out, align 32
   ret void
 }
@@ 25 lines hidden @@ ; X64-NEXT: retq
   ret void
 }

 define void @legal_vzmovl_2f64_4f64(<2 x double>* %in, <4 x double>* %out) {
 ; X32-LABEL: legal_vzmovl_2f64_4f64:
 ; X32: # %bb.0:
 ; X32-NEXT: movl {{[0-9]+}}(%esp), %eax
 ; X32-NEXT: movl {{[0-9]+}}(%esp), %ecx
-; X32-NEXT: vmovdqu (%ecx), %xmm0
-; X32-NEXT: vmovq {{.*#+}} xmm0 = xmm0[0],zero
-; X32-NEXT: vmovdqa %ymm0, (%eax)
+; X32-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
+; X32-NEXT: vmovaps %ymm0, (%eax)
 ; X32-NEXT: vzeroupper
 ; X32-NEXT: retl
 ;
 ; X64-LABEL: legal_vzmovl_2f64_4f64:
 ; X64: # %bb.0:
-; X64-NEXT: vmovdqu (%rdi), %xmm0
-; X64-NEXT: vmovq {{.*#+}} xmm0 = xmm0[0],zero
-; X64-NEXT: vmovdqa %ymm0, (%rsi)
+; X64-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
+; X64-NEXT: vmovaps %ymm0, (%rsi)
 ; X64-NEXT: vzeroupper
 ; X64-NEXT: retq
   %ld = load <2 x double>, <2 x double>* %in, align 8
   %ext = extractelement <2 x double> %ld, i64 0
   %ins = insertelement <4 x double> <double undef, double 0.0, double 0.0, double 0.0>, double %ext, i64 0
   store <4 x double> %ins, <4 x double>* %out, align 32
   ret void
 }

llvm/trunk/test/CodeGen/X86/vector-shuffle-256-v4.ll

@@ 1,508 lines hidden @@ ; ALL-NEXT: retq
   %v = insertelement <4 x i64> undef, i64 %a, i64 0
   %shuffle = shufflevector <4 x i64> %v, <4 x i64> zeroinitializer, <4 x i32> <i32 0, i32 5, i32 6, i32 7>
   ret <4 x i64> %shuffle
 }

 define <4 x double> @insert_reg_and_zero_v4f64(double %a) {
 ; ALL-LABEL: insert_reg_and_zero_v4f64:
 ; ALL: # %bb.0:
-; ALL-NEXT: # kill: def $xmm0 killed $xmm0 def $ymm0
 ; ALL-NEXT: vmovq {{.*#+}} xmm0 = xmm0[0],zero
 ; ALL-NEXT: retq
   %v = insertelement <4 x double> undef, double %a, i32 0
   %shuffle = shufflevector <4 x double> %v, <4 x double> zeroinitializer, <4 x i32> <i32 0, i32 5, i32 6, i32 7>
   ret <4 x double> %shuffle
 }

 define <4 x double> @insert_mem_and_zero_v4f64(double* %ptr) {
@@ 528 lines hidden @@