This is an archive of the discontinued LLVM Phabricator instance.

[X86] Prefer MOVSS/SD over BLENDI during legalization. Remove BLENDI versions of scalar arithmetic patterns
ClosedPublic

Authored by craig.topper on Sep 18 2017, 11:32 PM.

Details

Reviewers

RKSimon
zvi

Commits

rGc97775c03c1f: [X86] Prefer MOVSS/SD over BLENDI during legalization. Remove BLENDI versions…
rL315181: [X86] Prefer MOVSS/SD over BLENDI during legalization. Remove BLENDI versions…

Summary

We currently disable some conversions of shuffles to MOVSS/MOVSD during legalization if SSE41 is enabled, but later, during shuffle combining, we go back to preferring MOVSS/MOVSD.

Additionally, we have patterns that look for BLENDIs to detect scalar arithmetic operations. I believe these are unnecessary because the combining uses MOVSS/MOVSD.
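To illustrate what the scalar arithmetic patterns are matching, here is a minimal standalone C++ model (not LLVM code). The type `V4F32` and the helpers `movss`, `addss`, `scalarToVector`, `extractAddInsert`, and `vectorAddThenMovss` are hypothetical names introduced only for this sketch, and lanes left undefined by `scalar_to_vector` are modeled as zero. It shows that both DAG shapes named in the pattern comments ("extracted scalar math op with insert via movss" and "vector math op with insert via movss") produce the same result as a single ADDSS-style instruction.

```cpp
#include <array>
#include <cassert>

using V4F32 = std::array<float, 4>;

// Model of X86ISD::MOVSS: lane 0 comes from B, lanes 1..3 come from A.
V4F32 movss(const V4F32 &A, const V4F32 &B) {
  return {B[0], A[1], A[2], A[3]};
}

// Model of scalar_to_vector. The upper lanes are undefined in the DAG;
// this sketch assumes zero for them.
V4F32 scalarToVector(float S) { return {S, 0.0f, 0.0f, 0.0f}; }

// Model of ADDSS: add the low lanes, pass the upper lanes of Dst through.
V4F32 addss(const V4F32 &Dst, const V4F32 &Src) {
  return {Dst[0] + Src[0], Dst[1], Dst[2], Dst[3]};
}

// "extracted scalar math op with insert via movss":
// extract lane 0, do a scalar add, insert the result back with MOVSS.
V4F32 extractAddInsert(const V4F32 &Dst, float Src) {
  return movss(Dst, scalarToVector(Dst[0] + Src));
}

// "vector math op with insert via movss":
// do a full-width add, then keep only lane 0 of the sum.
V4F32 vectorAddThenMovss(const V4F32 &Dst, const V4F32 &Src) {
  V4F32 Sum = {Dst[0] + Src[0], Dst[1] + Src[1],
               Dst[2] + Src[2], Dst[3] + Src[3]};
  return movss(Dst, Sum);
}
```

In this model, the BLENDI forms of the patterns only differ in spelling the final insert as a blend with immediate 1 instead of MOVSS, which is why they become redundant once the DAG consistently uses MOVSS/MOVSD.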

Interestingly, we still codegen blend instructions even though lowering/isel emit movss/movsd instructions. It turns out machine CSE commutes them to blends, and then commutes those blends back into blends that are equivalent to the original movss/movsd.
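The double commute can be sketched with a small standalone C++ model (not LLVM code). The names `V4`, `blendps`, `movss`, and `commuteBlendImm` are hypothetical and exist only for this illustration. The assumption is the usual blend semantics: bit i of the immediate selects lane i from the second operand, so swapping the operands means inverting the lane mask.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V4 = std::array<int, 4>;

// Model of BLENDPS: lane I comes from B when bit I of Imm is set, else from A.
V4 blendps(const V4 &A, const V4 &B, uint8_t Imm) {
  V4 R{};
  for (unsigned I = 0; I != 4; ++I)
    R[I] = ((Imm >> I) & 1) ? B[I] : A[I];
  return R;
}

// Model of MOVSS: lane 0 comes from B, lanes 1..3 come from A.
V4 movss(const V4 &A, const V4 &B) {
  return {B[0], A[1], A[2], A[3]};
}

// Swapping the two operands of a 4-lane blend requires inverting the
// 4-bit lane mask so that every lane still reads from the same value.
uint8_t commuteBlendImm(uint8_t Imm) {
  return static_cast<uint8_t>(~Imm & 0xF);
}
```

Under this model, `movss(A, B)` equals `blendps(B, A, 0xE)` (the first commute) and `blendps(A, B, 0x1)` (the second commute), which matches the observation that the final blend is equivalent to the original movss.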

This patch fixes the inconsistency in legalization to prefer MOVSS/MOVSD, which causes the one test change. The problem there is that we have integer types and are mostly selecting integer instructions, except for the shufps. This shufps forced the execution domain, but the vpblendw couldn't have its domain changed with a naive instruction swap. We could fix this by special-casing VPBLENDW based on the immediate to widen the element type.
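As a rough sketch of what widening the element type based on the immediate could look like, the following standalone C++ helper (not LLVM code; `widenBlendImm` is a hypothetical name) tries to re-express a blend immediate over narrow lanes as an immediate over lanes that are `Scale` times wider. It succeeds only when every group of `Scale` adjacent bits is all ones or all zeros, and returns -1 otherwise.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: convert a blend immediate over NumElts narrow lanes
// into one over NumElts / Scale wider lanes. Returns -1 if some wide lane
// would have to take data from both sources.
int widenBlendImm(uint8_t Imm, unsigned NumElts, unsigned Scale) {
  if (Scale == 0 || NumElts > 8 || NumElts % Scale != 0)
    return -1;
  const unsigned GroupMask = (1u << Scale) - 1;
  int Wide = 0;
  for (unsigned I = 0; I != NumElts / Scale; ++I) {
    unsigned Group = (Imm >> (I * Scale)) & GroupMask;
    if (Group == GroupMask)
      Wide |= 1 << I;   // whole wide lane comes from the second source
    else if (Group != 0)
      return -1;        // wide lane would mix both sources
  }
  return Wide;
}
```

For the test change in this patch, the 128-bit vpblendw selecting `xmm0[0,1,2,3],xmm1[4,5,6,7]` corresponds to immediate 0xF0 over eight 16-bit lanes. With `Scale` 2 this becomes 0xC over four 32-bit lanes (a BLENDPS-style mask), and with `Scale` 4 it becomes 0x2 over two 64-bit lanes, which is the `xmm0[0],xmm1[1]` vblendpd form the test had before.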

The rest of the patch is removing all the excess scalar patterns.

Long term, we should probably add isel patterns to make MOVSS/MOVSD emit blends directly instead of relying on the double commute. We may also want to consider emitting movss/movsd for optsize. I also wonder if we should still use the VEX-encoded blendi instructions even with AVX512. Blends have better throughput, and that may outweigh the register constraint.

Diff Detail

Repository: rL LLVM

Event Timeline

craig.topper created this revision. Sep 18 2017, 11:32 PM

Making a note here as I think PR33079 was having problems with the MOVSS<->BLENDPS style commutation

@RKSimon do you have any concerns with this patch?

In D38023#880357, @craig.topper wrote:

@RKSimon do you have any concerns with this patch?

Partially - I'm just wondering how this might affect PR33079 (which funnily enough had a new report today).

What is the state of this now that D38449 has been committed?

Looks like this patch is still valid after D38449.

LGTM, please can you raise bugs for both custom handling of domain changes (PBLENDW <-> BLENDPS etc.) and adding isel patterns (optsize or not) for MOVSS/MOVSD with BLENDPS/BLENDPD?

This revision is now accepted and ready to land. Oct 8 2017, 3:32 AM

Closed by commit rL315181: [X86] Prefer MOVSS/SD over BLENDI during legalization. Remove BLENDI versions… (authored by ctopper). Oct 8 2017, 9:59 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                    Size
llvm/trunk/lib/Target/X86/X86ISelLowering.cpp           5 lines
llvm/trunk/lib/Target/X86/X86InstrAVX512.td             24 lines
llvm/trunk/lib/Target/X86/X86InstrSSE.td                64 lines
llvm/trunk/test/CodeGen/X86/vector-shuffle-256-v8.ll    4 lines

Diff 118172

llvm/trunk/lib/Target/X86/X86ISelLowering.cpp

@@ 9,883 lines not shown @@ if (!IsV1Zeroable) {
       // the V1 elements can't be permuted in any way.
       assert(VT == ExtVT && "Cannot change extended type when non-zeroable!");
       if (!VT.isFloatingPoint() || V2Index != 0)
         return SDValue();
       SmallVector<int, 8> V1Mask(Mask.begin(), Mask.end());
       V1Mask[V2Index] = -1;
       if (!isNoopShuffleMask(V1Mask))
         return SDValue();
-      // This is essentially a special case blend operation, but if we have
-      // general purpose blend operations, they are always faster. Bail and let
-      // the rest of the lowering handle these as blends.
-      if (Subtarget.hasSSE41())
+      if (!VT.is128BitVector())
         return SDValue();

       // Otherwise, use MOVSD or MOVSS.
       assert((EltVT == MVT::f32 || EltVT == MVT::f64) &&
              "Only two types of floating point element types to handle!");
       return DAG.getNode(EltVT == MVT::f32 ? X86ISD::MOVSS : X86ISD::MOVSD, DL,
                          ExtVT, V1, V2);
     }
@@ 27,512 lines not shown @@

llvm/trunk/lib/Target/X86/X86InstrAVX512.td

@@ 9,726 lines not shown @@ multiclass AVX512_scalar_math_f32_patterns<SDNode Op, string OpcPrefix> {
   let Predicates = [HasAVX512] in {
     // extracted scalar math op with insert via movss
     def : Pat<(v4f32 (X86Movss (v4f32 VR128X:$dst), (v4f32 (scalar_to_vector
           (Op (f32 (extractelt (v4f32 VR128X:$dst), (iPTR 0))),
           FR32X:$src))))),
       (!cast<I>("V"#OpcPrefix#SSZrr_Int) v4f32:$dst,
           (COPY_TO_REGCLASS FR32X:$src, VR128X))>;

-    // extracted scalar math op with insert via blend
-    def : Pat<(v4f32 (X86Blendi (v4f32 VR128X:$dst), (v4f32 (scalar_to_vector
-          (Op (f32 (extractelt (v4f32 VR128X:$dst), (iPTR 0))),
-          FR32X:$src))), (i8 1))),
-      (!cast<I>("V"#OpcPrefix#SSZrr_Int) v4f32:$dst,
-          (COPY_TO_REGCLASS FR32X:$src, VR128X))>;
-
     // vector math op with insert via movss
     def : Pat<(v4f32 (X86Movss (v4f32 VR128X:$dst),
           (Op (v4f32 VR128X:$dst), (v4f32 VR128X:$src)))),
       (!cast<I>("V"#OpcPrefix#SSZrr_Int) v4f32:$dst, v4f32:$src)>;

-    // vector math op with insert via blend
-    def : Pat<(v4f32 (X86Blendi (v4f32 VR128X:$dst),
-          (Op (v4f32 VR128X:$dst), (v4f32 VR128X:$src)), (i8 1))),
-      (!cast<I>("V"#OpcPrefix#SSZrr_Int) v4f32:$dst, v4f32:$src)>;
-
     // extracted masked scalar math op with insert via movss
     def : Pat<(X86Movss (v4f32 VR128X:$src1),
                (scalar_to_vector
                 (X86selects VK1WM:$mask,
                             (Op (f32 (extractelt (v4f32 VR128X:$src1), (iPTR 0))),
                                 FR32X:$src2),
                             FR32X:$src0))),
       (!cast<I>("V"#OpcPrefix#SSZrr_Intk) (COPY_TO_REGCLASS FR32X:$src0, VR128X),
@@ 11 lines not shown @@ multiclass AVX512_scalar_math_f64_patterns<SDNode Op, string OpcPrefix> {
   let Predicates = [HasAVX512] in {
     // extracted scalar math op with insert via movsd
     def : Pat<(v2f64 (X86Movsd (v2f64 VR128X:$dst), (v2f64 (scalar_to_vector
           (Op (f64 (extractelt (v2f64 VR128X:$dst), (iPTR 0))),
           FR64X:$src))))),
       (!cast<I>("V"#OpcPrefix#SDZrr_Int) v2f64:$dst,
           (COPY_TO_REGCLASS FR64X:$src, VR128X))>;

-    // extracted scalar math op with insert via blend
-    def : Pat<(v2f64 (X86Blendi (v2f64 VR128X:$dst), (v2f64 (scalar_to_vector
-          (Op (f64 (extractelt (v2f64 VR128X:$dst), (iPTR 0))),
-          FR64X:$src))), (i8 1))),
-      (!cast<I>("V"#OpcPrefix#SDZrr_Int) v2f64:$dst,
-          (COPY_TO_REGCLASS FR64X:$src, VR128X))>;
-
     // vector math op with insert via movsd
     def : Pat<(v2f64 (X86Movsd (v2f64 VR128X:$dst),
           (Op (v2f64 VR128X:$dst), (v2f64 VR128X:$src)))),
       (!cast<I>("V"#OpcPrefix#SDZrr_Int) v2f64:$dst, v2f64:$src)>;

-    // vector math op with insert via blend
-    def : Pat<(v2f64 (X86Blendi (v2f64 VR128X:$dst),
-          (Op (v2f64 VR128X:$dst), (v2f64 VR128X:$src)), (i8 1))),
-      (!cast<I>("V"#OpcPrefix#SDZrr_Int) v2f64:$dst, v2f64:$src)>;
-
     // extracted masked scalar math op with insert via movss
     def : Pat<(X86Movsd (v2f64 VR128X:$src1),
                (scalar_to_vector
                 (X86selects VK1WM:$mask,
                             (Op (f64 (extractelt (v2f64 VR128X:$src1), (iPTR 0))),
                                 FR64X:$src2),
                             FR64X:$src0))),
       (!cast<I>("V"#OpcPrefix#SDZrr_Intk) (COPY_TO_REGCLASS FR64X:$src0, VR128X),
@@ 9 lines not shown @@

llvm/trunk/lib/Target/X86/X86InstrSSE.td

@@ 2,905 lines not shown @@ def : Pat<(v4f32 (X86Movss (v4f32 VR128:$dst), (v4f32 (scalar_to_vector
           (COPY_TO_REGCLASS FR32:$src, VR128))>;

     // vector math op with insert via movss
     def : Pat<(v4f32 (X86Movss (v4f32 VR128:$dst),
           (Op (v4f32 VR128:$dst), (v4f32 VR128:$src)))),
       (!cast<I>(OpcPrefix#SSrr_Int) v4f32:$dst, v4f32:$src)>;
   }

-  // With SSE 4.1, blendi is preferred to movsd, so match that too.
-  let Predicates = [UseSSE41] in {
-    // extracted scalar math op with insert via blend
-    def : Pat<(v4f32 (X86Blendi (v4f32 VR128:$dst), (v4f32 (scalar_to_vector
-          (Op (f32 (extractelt (v4f32 VR128:$dst), (iPTR 0))),
-          FR32:$src))), (i8 1))),
-      (!cast<I>(OpcPrefix#SSrr_Int) v4f32:$dst,
-          (COPY_TO_REGCLASS FR32:$src, VR128))>;
-
-    // vector math op with insert via blend
-    def : Pat<(v4f32 (X86Blendi (v4f32 VR128:$dst),
-          (Op (v4f32 VR128:$dst), (v4f32 VR128:$src)), (i8 1))),
-      (!cast<I>(OpcPrefix#SSrr_Int)v4f32:$dst, v4f32:$src)>;
-
-  }
-
   // Repeat everything for AVX.
   let Predicates = [UseAVX] in {
     // extracted scalar math op with insert via movss
     def : Pat<(v4f32 (X86Movss (v4f32 VR128:$dst), (v4f32 (scalar_to_vector
           (Op (f32 (extractelt (v4f32 VR128:$dst), (iPTR 0))),
           FR32:$src))))),
       (!cast<I>("V"#OpcPrefix#SSrr_Int) v4f32:$dst,
           (COPY_TO_REGCLASS FR32:$src, VR128))>;

-    // extracted scalar math op with insert via blend
-    def : Pat<(v4f32 (X86Blendi (v4f32 VR128:$dst), (v4f32 (scalar_to_vector
-          (Op (f32 (extractelt (v4f32 VR128:$dst), (iPTR 0))),
-          FR32:$src))), (i8 1))),
-      (!cast<I>("V"#OpcPrefix#SSrr_Int) v4f32:$dst,
-          (COPY_TO_REGCLASS FR32:$src, VR128))>;
-
     // vector math op with insert via movss
     def : Pat<(v4f32 (X86Movss (v4f32 VR128:$dst),
           (Op (v4f32 VR128:$dst), (v4f32 VR128:$src)))),
       (!cast<I>("V"#OpcPrefix#SSrr_Int) v4f32:$dst, v4f32:$src)>;
-
-    // vector math op with insert via blend
-    def : Pat<(v4f32 (X86Blendi (v4f32 VR128:$dst),
-          (Op (v4f32 VR128:$dst), (v4f32 VR128:$src)), (i8 1))),
-      (!cast<I>("V"#OpcPrefix#SSrr_Int) v4f32:$dst, v4f32:$src)>;
   }
 }

 defm : scalar_math_f32_patterns<fadd, "ADD">;
 defm : scalar_math_f32_patterns<fsub, "SUB">;
 defm : scalar_math_f32_patterns<fmul, "MUL">;
 defm : scalar_math_f32_patterns<fdiv, "DIV">;

 multiclass scalar_math_f64_patterns<SDNode Op, string OpcPrefix> {
   let Predicates = [UseSSE2] in {
     // extracted scalar math op with insert via movsd
     def : Pat<(v2f64 (X86Movsd (v2f64 VR128:$dst), (v2f64 (scalar_to_vector
           (Op (f64 (extractelt (v2f64 VR128:$dst), (iPTR 0))),
           FR64:$src))))),
       (!cast<I>(OpcPrefix#SDrr_Int) v2f64:$dst,
           (COPY_TO_REGCLASS FR64:$src, VR128))>;

     // vector math op with insert via movsd
     def : Pat<(v2f64 (X86Movsd (v2f64 VR128:$dst),
           (Op (v2f64 VR128:$dst), (v2f64 VR128:$src)))),
       (!cast<I>(OpcPrefix#SDrr_Int) v2f64:$dst, v2f64:$src)>;
   }

-  // With SSE 4.1, blendi is preferred to movsd, so match those too.
-  let Predicates = [UseSSE41] in {
-    // extracted scalar math op with insert via blend
-    def : Pat<(v2f64 (X86Blendi (v2f64 VR128:$dst), (v2f64 (scalar_to_vector
-          (Op (f64 (extractelt (v2f64 VR128:$dst), (iPTR 0))),
-          FR64:$src))), (i8 1))),
-      (!cast<I>(OpcPrefix#SDrr_Int) v2f64:$dst,
-          (COPY_TO_REGCLASS FR64:$src, VR128))>;
-
-    // vector math op with insert via blend
-    def : Pat<(v2f64 (X86Blendi (v2f64 VR128:$dst),
-          (Op (v2f64 VR128:$dst), (v2f64 VR128:$src)), (i8 1))),
-      (!cast<I>(OpcPrefix#SDrr_Int) v2f64:$dst, v2f64:$src)>;
-  }
-
   // Repeat everything for AVX.
   let Predicates = [UseAVX] in {
     // extracted scalar math op with insert via movsd
     def : Pat<(v2f64 (X86Movsd (v2f64 VR128:$dst), (v2f64 (scalar_to_vector
           (Op (f64 (extractelt (v2f64 VR128:$dst), (iPTR 0))),
           FR64:$src))))),
       (!cast<I>("V"#OpcPrefix#SDrr_Int) v2f64:$dst,
           (COPY_TO_REGCLASS FR64:$src, VR128))>;

-    // extracted scalar math op with insert via blend
-    def : Pat<(v2f64 (X86Blendi (v2f64 VR128:$dst), (v2f64 (scalar_to_vector
-          (Op (f64 (extractelt (v2f64 VR128:$dst), (iPTR 0))),
-          FR64:$src))), (i8 1))),
-      (!cast<I>("V"#OpcPrefix#SDrr_Int) v2f64:$dst,
-          (COPY_TO_REGCLASS FR64:$src, VR128))>;
-
     // vector math op with insert via movsd
     def : Pat<(v2f64 (X86Movsd (v2f64 VR128:$dst),
           (Op (v2f64 VR128:$dst), (v2f64 VR128:$src)))),
       (!cast<I>("V"#OpcPrefix#SDrr_Int) v2f64:$dst, v2f64:$src)>;
-
-    // vector math op with insert via blend
-    def : Pat<(v2f64 (X86Blendi (v2f64 VR128:$dst),
-          (Op (v2f64 VR128:$dst), (v2f64 VR128:$src)), (i8 1))),
-      (!cast<I>("V"#OpcPrefix#SDrr_Int) v2f64:$dst, v2f64:$src)>;
   }
 }

 defm : scalar_math_f64_patterns<fadd, "ADD">;
 defm : scalar_math_f64_patterns<fsub, "SUB">;
 defm : scalar_math_f64_patterns<fmul, "MUL">;
 defm : scalar_math_f64_patterns<fdiv, "DIV">;

@@ 270 lines not shown @@
 multiclass scalar_unary_math_patterns<Intrinsic Intr, string OpcPrefix,
                                       SDNode Move, ValueType VT,
                                       Predicate BasePredicate> {
   let Predicates = [BasePredicate] in {
     def : Pat<(VT (Move VT:$dst, (Intr VT:$src))),
               (!cast<I>(OpcPrefix#r_Int) VT:$dst, VT:$src)>;
   }

-  // With SSE 4.1, blendi is preferred to movs*, so match that too.
-  let Predicates = [UseSSE41] in {
-    def : Pat<(VT (X86Blendi VT:$dst, (Intr VT:$src), (i8 1))),
-              (!cast<I>(OpcPrefix#r_Int) VT:$dst, VT:$src)>;
-  }
-
   // Repeat for AVX versions of the instructions.
   let Predicates = [HasAVX] in {
     def : Pat<(VT (Move VT:$dst, (Intr VT:$src))),
               (!cast<I>("V"#OpcPrefix#r_Int) VT:$dst, VT:$src)>;
-
-    def : Pat<(VT (X86Blendi VT:$dst, (Intr VT:$src), (i8 1))),
-              (!cast<I>("V"#OpcPrefix#r_Int) VT:$dst, VT:$src)>;
   }
 }

 defm : scalar_unary_math_patterns<int_x86_sse_rcp_ss, "RCPSS", X86Movss,
                                   v4f32, UseSSE1>;
 defm : scalar_unary_math_patterns<int_x86_sse_rsqrt_ss, "RSQRTSS", X86Movss,
                                   v4f32, UseSSE1>;
 defm : scalar_unary_math_patterns<int_x86_sse_sqrt_ss, "SQRTSS", X86Movss,
@@ 5,108 lines not shown @@

llvm/trunk/test/CodeGen/X86/vector-shuffle-256-v8.ll

@@ 1,214 lines not shown @@ ; AVX512VL-NEXT: retq
   ret <8 x i32> %shuffle
 }

 define <8 x i32> @shuffle_v8i32_08991abb(<8 x i32> %a, <8 x i32> %b) {
 ; AVX1-LABEL: shuffle_v8i32_08991abb:
 ; AVX1: # BB#0:
 ; AVX1-NEXT: vshufps {{.*#+}} xmm2 = xmm0[0,0],xmm1[0,0]
 ; AVX1-NEXT: vshufps {{.*#+}} xmm2 = xmm2[0,2],xmm1[1,1]
-; AVX1-NEXT: vblendpd {{.*#+}} xmm0 = xmm0[0],xmm1[1]
-; AVX1-NEXT: vpermilps {{.*#+}} xmm0 = xmm0[1,2,3,3]
+; AVX1-NEXT: vpblendw {{.*#+}} xmm0 = xmm0[0,1,2,3],xmm1[4,5,6,7]
+; AVX1-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[1,2,3,3]
 ; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm2, %ymm0
 ; AVX1-NEXT: retq
 ;
 ; AVX2-LABEL: shuffle_v8i32_08991abb:
 ; AVX2: # BB#0:
 ; AVX2-NEXT: vmovdqa {{.*#+}} ymm2 = <u,0,1,1,u,2,3,3>
 ; AVX2-NEXT: vpermd %ymm1, %ymm2, %ymm1
 ; AVX2-NEXT: vpmovzxdq {{.*#+}} xmm0 = xmm0[0],zero,xmm0[1],zero
@@ 1,038 lines not shown @@