This is an archive of the discontinued LLVM Phabricator instance.

[X86] Remove isel patterns for MOVSS/MOVSD ISD opcodes with integer types.
ClosedPublic

Authored by craig.topper on Jul 12 2018, 10:08 PM.

Details

Summary

Ideally our ISD node types going into the isel table would have types consistent with their instruction domain. This prevents us having to duplicate patterns with different types for the same instruction.

Unfortunately, it seems our shuffle combining is currently relying on this a little to remove some bitcasts. This seems to enable some switching between shufps and pshufd. Hopefully there's some way we can address this in the combining.

Diff Detail

Repository
rL LLVM

Event Timeline

craig.topper created this revision.Jul 12 2018, 10:08 PM

The domain-switching in the tests is expected, correct? From the deleted comment I gather this is intentional. Do you have any performance measurement of this change? Not that I necessarily object, I'm just curious. I'm surprised only two tests changed from a use of movsd and no tests changed from a use of movss. That makes me a bit concerned about our test coverage.

greened added inline comments.Jul 13 2018, 8:07 AM
lib/Target/X86/X86ISelLowering.cpp
30736 ↗(On Diff #155321)

It would be worth replicating this comment somewhere, wherever this decision is made, I suppose. It's helpful to know when such things are intentional.

The domain switching isn't really intentional. The code I removed from X86ISelLowering was removing some bitcasts that made this optimize better.

What used to happen is this:
- Lowering selects an fp typed MOVSS/MOVSD for the first shuffle and a PSHUFD for the second shuffle. The MOVSS/SD has bitcasts int->fp before it and a bitcast fp->int after it.
- Combining sees the MOVSS/SD and the int->fp bitcasts and turns it into an integer MOVSS/MOVSD ISD node. This removes the bitcasts before and after it.
- Combining sees the PSHUFD producer has integer type and does nothing to change it.

What happens now is this:
- Lowering selects an fp typed MOVSS/MOVSD for the first shuffle and a PSHUFD for the second shuffle. The MOVSS/SD has bitcasts int->fp before it and a bitcast fp->int after it.
- Combining sees the MOVSS/SD and the int->fp bitcasts, but does nothing to change it.
- Combining sees the PSHUFD, looks through the fp->int bitcast, and finds the FP typed MOVSS/SD. It decides to rewrite the PSHUFD to SHUFPS to remove the fp->int bitcast, but ends up creating a new fp->int bitcast after the SHUFPS. The subsequent shuffles are on v8i16 type and can't be fixed, so the fp->int bitcast after the SHUFPS stays.

I don't have perf numbers. Based on the changes I think we're just moving the domain crossing one instruction later. And we picked up a move. The move is probably more costly on older CPUs that don't have move elimination.

I'm hoping there's something we can do in DAG combine to fix this. Maybe recognize the bitcast+shufps from fp->int and change it to pshufd+bitcast? We'd need to be careful to avoid an infinite loop with the combine that turns pshufd+bitcast into bitcast+shufps.

greened added a comment.EditedJul 13 2018, 2:24 PM

Ok, thanks for the explanation, that helps. It sounds like no additional domain switching issues have been introduced, so this seems fine to me as far as that goes.

The extra moves are a little concerning. Even with move elimination they still consume processor resources. I'm not sure that trade-off is worth it just to eliminate two really simple TableGen patterns.

EDIT: I realized the above is a bit unfair. It's getting rid of a fair amount of custom lowering code and eliminating the MOVS patterns in multiple files. It also shrinks the first test by a couple of instructions, but overall more instructions (in the form of movs) are introduced than eliminated.

I'm not really objecting to this, just wishing we had some hard numbers to look at.

It's 2 patterns repeated 3 times. And as of yesterday these patterns are all qualified with (optsize || !sse41), and we have new patterns to select blend under (sse41 && !optsize). That fixes a long-standing issue where we turned movss/sd into blend later in the pipeline through an accidental double call to commuteInstruction. So with that fixed it's now 12 patterns that can be removed. I need to rebase this patch.

I believe we're also missing a pattern to turn (v2f64 (movsd (v2f64), (loadv2f64))) into movlpd (probably really movlps, since that's 1 byte shorter and identical in behavior). That would also need to be repeated for v2i64 for SSE, AVX, and AVX512.

I think the blend patterns I added last night should also have load versions for consistency unless we want to just depend on the peephole pass.

I think if we were to fix those issues and not take this patch, we would have 21 extra patterns, which I guess is still not a huge number compared to the total number of patterns.

I really hope we can find a different way to fix the DAG combiner. Simon is the expert on that code and I'm hoping he has an idea.

There might be more we can do to encourage lowering/combines toward domain-swappable instruction patterns (shufd+punpck etc.) - plus I've raised https://bugs.llvm.org/show_bug.cgi?id=38157 to see what we can do to do more exotic domain swaps (SHUFPD/SHUFPS <-> PSHUFD etc.).

Glad to hear the blend commute has been fixed!

I believe we're also missing a pattern to turn (v2f64 (movsd (v2f64), (loadv2f64))) into movlpd (probably really movlps, since that's 1 byte shorter and identical in behavior).

But not necessarily identical in performance. Some microarchitectures don't like ps/pd domain shifting like that.

Which microarchitecture cares about switching PD/PS? To my knowledge, no Intel architecture cares. Do any of the AMD architectures care?

Which microarchitecture cares about switching PD/PS? To my knowledge, no Intel architecture cares. Do any of the AMD architectures care?

It tends to be only the 'weird mixture' PS/PD domain shifts that cause a stall: VADDPS then VMULPD, that kind of thing - shuffles and bitops tend to be more forgiving (and more easy to fix.)

Which microarchitecture cares about switching PD/PS? To my knowledge, no Intel architecture cares. Do any of the AMD architectures care?

It tends to be only the 'weird mixture' PS/PD domain shifts that cause a stall: VADDPS then VMULPD, that kind of thing - shuffles and bitops tend to be more forgiving (and more easy to fix.)

http://www.agner.org/optimize/microarchitecture.pdf, "21 AMD Bobcat and Jaguar pipeline", page 222:

21.9 Data delay between differently typed instructions
...
There is a penalty of 40 clock cycles when the output of a floating point calculation is input
to a floating point calculation with a different precision, for example if the output of a double
precision floating point addition is input to a single precision addition. This has hardly any
practical significance since such a sequence is most likely to be a programming error, but it
indicates that the processor stores extra information about floating point numbers beyond
the 128 bits in an XMM register.

Which microarchitecture cares about switching PD/PS? To my knowledge, no Intel architecture cares. Do any of the AMD architectures care?

It tends to be only the 'weird mixture' PS/PD domain shifts that cause a stall: VADDPS then VMULPD, that kind of thing - shuffles and bitops tend to be more forgiving (and more easy to fix.)

The AMD 17h Optimization Guide has this to say:

"Try to use consistent data types for instructions operating on the same data. For example, use
VANDPS, VMAXPS, and so on when consuming the output of MULPS."

The inclusion of VANDPS in that list makes me think all such domain crossings may incur a penalty.
I have not seen an explicit list anywhere of what is and is not bad. A VADDPD feeding a VANDPS
is probably not a programming error but it would incur a penalty.

Which microarchitecture cares about switching PD/PS? To my knowledge, no Intel architecture cares. Do any of the AMD architectures care?

It tends to be only the 'weird mixture' PS/PD domain shifts that cause a stall: VADDPS then VMULPD, that kind of thing - shuffles and bitops tend to be more forgiving (and more easy to fix.)

The AMD 17h Optimization Guide has this to say:

"Try to use consistent data types for instructions operating on the same data. For example, use
VANDPS, VMAXPS, and so on when consuming the output of MULPS."

The inclusion of VANDPS in that list makes me think all such domain crossings may incur a penalty.
I have not seen an explicit list anywhere of what is and is not bad. A VADDPD feeding a VANDPS
is probably not a programming error but it would incur a penalty.

We already try our hardest to keep to the same domain - like I said on PR38157, bitops and (some) unpacks/shuffles are already handled, but we can always do more (patches welcome!).

test/CodeGen/X86/oddshuffles.ll
1332 ↗(On Diff #155321)

TBH, this would be better if we stayed purely in the float domain.

Rebase again

RKSimon accepted this revision.Jul 20 2018, 3:24 AM

LGTM - this isn't introducing any more domain switches than what is already present, although as you mentioned there is still room for improvement.

This revision is now accepted and ready to land.Jul 20 2018, 3:24 AM

Ditto. Just wanted to clarify that I think this is fine. Thanks for a good discussion!

This revision was automatically updated to reflect the committed changes.