This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombiner] reduce insert+bitcast+extract vector ops to truncate (PR39016)
ClosedPublic

Authored by spatel on Oct 12 2018, 9:20 AM.


Details

Reviewers

RKSimon
craig.topper
lebedev.ri
efriedma
javed.absar

Commits

rGe439cc274532: [DAGCombiner] reduce insert+bitcast+extract vector ops to truncate (PR39016)
rL344872: [DAGCombiner] reduce insert+bitcast+extract vector ops to truncate (PR39016)

Summary

This is a late backend subset of the IR transform added with:
D52439

We can confirm that the conversion to a 'trunc' is correct by running:
$ opt -instcombine -data-layout="e"
(assuming the IR transforms are correct; change "e" to "E" for big-endian)
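As an informal cross-check outside of opt, the same equivalence can be modeled in plain C++. This is only a sketch, not LLVM code: the helper names are made up for illustration, and it assumes that a bitcast behaves like storing the value to memory and loading it back as the new type. It reinterprets an i64 as two i32 lanes on the host and reads the lane that holds the low bits for the host's byte order.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not part of LLVM): true if the host stores the least
// significant byte first.
inline bool hostIsLittleEndian() {
  const uint16_t Probe = 1;
  unsigned char FirstByte;
  std::memcpy(&FirstByte, &Probe, 1);
  return FirstByte == 1;
}

// Model of "extract_elt (v2i32 (bitcast i64:x)), BCTruncElt": copy the bytes
// of X into two 32-bit lanes, then read the lane that holds the low bits.
// That lane is index 0 on little-endian hosts and index 1 on big-endian hosts.
inline uint32_t extractLowLane(uint64_t X) {
  uint32_t Lanes[2];
  std::memcpy(Lanes, &X, sizeof(X));
  return Lanes[hostIsLittleEndian() ? 0 : 1];
}

// The claimed replacement: a plain truncate of the scalar.
inline uint32_t truncToI32(uint64_t X) { return static_cast<uint32_t>(X); }
```

Calling extractLowLane and truncToI32 on the same input should give the same value on either kind of host, which is the property the fold relies on.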

As discussed in PR39016:
https://bugs.llvm.org/show_bug.cgi?id=39016
...the pattern may emerge during legalization, so that's why I've opted to wait for an insertelement to become a scalar_to_vector in the pattern matching here.

The DAG allows for fun variations that are not possible in IR. Result types for extracts and scalar_to_vector don't necessarily match input types, so that means we have to be a bit more careful in the transform (see code comments).

The tests show that we don't handle cases that require a shift (as we did in the IR version). I've left that as a potential follow-up because I'm not sure if that's a real concern at this late stage.
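To illustrate what "require a shift" means here: extracting any lane other than the one holding the low bits is equivalent to a logical shift right followed by a truncate. The sketch below is a hypothetical model, not LLVM code; the function names are invented, and the 64-bit source with 16-bit lanes is chosen only to match the i64 to i16 tests.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: how far a scalar of XBits must be shifted right so that
// lane Idx (EltBits wide) ends up in the low bits. Lane 0 is the low lane on
// little-endian and the high lane on big-endian.
constexpr unsigned shiftForLane(bool IsLE, unsigned XBits, unsigned EltBits,
                                unsigned Idx) {
  return IsLE ? Idx * EltBits : (XBits / EltBits - 1 - Idx) * EltBits;
}

// Lane value expressed as "shift right, then truncate" for an i64 source
// viewed as four i16 lanes.
constexpr uint16_t laneAsShiftTrunc(uint64_t X, bool IsLE, unsigned Idx) {
  return static_cast<uint16_t>(X >> shiftForLane(IsLE, 64, 16, Idx));
}

// A shift of zero is the plain-truncate case that this patch handles.
static_assert(shiftForLane(true, 64, 16, 0) == 0, "LE: lane 0 needs no shift");
static_assert(shiftForLane(false, 64, 16, 3) == 0, "BE: last lane needs no shift");
// Any other lane needs a real shift, which is the unhandled case.
static_assert(shiftForLane(true, 64, 16, 3) == 48, "LE: lane 3 needs a shift");
```

In this model, the little-endian extract of lane 3 in trunc_i64_to_i16_be would need a shift by 48 bits, which is why that case is left alone in the LE test output.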

The bug report mentions an x86 regression test that isn't changed here, so I'm also not sure if this is enough to close the bug report.

Diff Detail

Repository: rL LLVM

Event Timeline

spatel created this revision. Oct 12 2018, 9:20 AM

Herald added a reviewer: javed.absar. Oct 12 2018, 9:20 AM

Herald added a subscriber: mcrosier.

craig.topper added inline comments. Oct 12 2018, 12:16 PM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
15526 ↗	(On Diff #169418)	Does this work correctly on big-endian targets when the element count or element size is not a power of 2, such that XBitWidth isn't evenly divisible by VecEltBitWidth?

spatel added inline comments. Oct 13 2018, 8:02 AM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
15526 ↗	(On Diff #169418)	I'll add a test with non-power-of-2 vector size, but I don't see a way to actually exercise that possibility given that this transform is only happening late (legal types are already in effect, but we can make that explicit). AFAICT, the scalar bitwidth must be a multiple of the vector element bitwidth, so I'll add an assert for that.

Patch updated:

Add "LegalTypes" as the first predicate for trying this transform to make it less likely that any weird types invalidate the later assumptions.
Add an assert that the scalar bitwidth is a multiple of the vector element bitwidth (scalar_to_vector size must be a multiple, and bitcast can't change size).
Add a test that at least starts with a weird type in IR.
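The index computation and the divisibility condition from the update above can be mirrored in a small standalone sketch. The function names are hypothetical; only the expressions `IsLE ? 0 : XBitWidth / VecEltBitWidth - 1`, `XBitWidth > VecEltBitWidth`, and `XBitWidth % VecEltBitWidth == 0` are taken from the patch. The example widths follow the new test, where an i64 is inserted into <3 x i64>, bitcast to <24 x i8>, and element 7 is extracted.

```cpp
#include <cassert>

// Hypothetical mirror of the patch's index computation: which vector element
// holds the low bits of a scalar that was placed in the first lanes.
constexpr unsigned truncLaneIndex(bool IsLE, unsigned XBitWidth,
                                  unsigned VecEltBitWidth) {
  return IsLE ? 0 : XBitWidth / VecEltBitWidth - 1;
}

// Condition that the patch asserts: the scalar width must be a whole multiple
// of the vector element width.
constexpr bool widthsAreCompatible(unsigned XBitWidth,
                                   unsigned VecEltBitWidth) {
  return VecEltBitWidth != 0 && XBitWidth % VecEltBitWidth == 0;
}

// The fold applies only when the extract index is the low-bits lane and the
// scalar is wider than one vector element.
constexpr bool isFoldCandidate(bool IsLE, unsigned XBitWidth,
                               unsigned VecEltBitWidth, unsigned ExtractIndex) {
  return ExtractIndex == truncLaneIndex(IsLE, XBitWidth, VecEltBitWidth) &&
         XBitWidth > VecEltBitWidth;
}

// i64 scalar viewed as i8 elements: the low byte is element 7 on big-endian.
static_assert(truncLaneIndex(false, 64, 8) == 7, "BE low byte of i64");
static_assert(widthsAreCompatible(64, 8), "64 is a multiple of 8");
static_assert(isFoldCandidate(false, 64, 8, 7), "BE extract of element 7 folds");
static_assert(!isFoldCandidate(true, 64, 8, 7), "LE extract of element 7 does not");
```

These results agree with the AArch64 checks for trunc_i64_to_i8_be: the big-endian output collapses to a register use, while the little-endian output still extracts byte 7.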

FYI, the post-legalize-types DAG combine doesn't run if all the types in the DAG were legal and nothing was changed by type legalization. So gating a combine on LegalTypes can really mean that the transform doesn't run until after vector op legalization (again, the DAG combine after that doesn't run if nothing changed) or LegalizeDAG.

In D53201#1264513, @craig.topper wrote:

FYI, the post-legalize-types DAG combine doesn't run if all the types in the DAG were legal and nothing was changed by type legalization. So gating a combine on LegalTypes can really mean that the transform doesn't run until after vector op legalization (again, the DAG combine after that doesn't run if nothing changed) or LegalizeDAG.

OK, didn't realize that limitation, although it still catches all of the cases that I've added here. Suggestions for a more appropriate predicate?
We could confirm that the type that we're casting to is legal:
TLI.isTypeLegal(NVT) ?

LGTM

This revision is now accepted and ready to land. Oct 15 2018, 10:05 PM

Closed by commit rL344872: [DAGCombiner] reduce insert+bitcast+extract vector ops to truncate (PR39016) (authored by spatel). Oct 21 2018, 1:16 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                   Size
llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp    33 lines
llvm/trunk/test/CodeGen/AArch64/extract-insert.ll      38 lines
llvm/trunk/test/CodeGen/X86/extract-insert.ll          9 lines
llvm/trunk/test/CodeGen/X86/mmx-coalescing.ll          9 lines

Diff 170350

llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp


[15,497 lines not shown; enclosing scope: if (ConstEltNo &&]
// Sometimes build_vector's scalar input types do not match result type.		// Sometimes build_vector's scalar input types do not match result type.
if (NVT == InEltVT)		if (NVT == InEltVT)
return Elt;		return Elt;

// TODO: It may be useful to truncate if free if the build_vector implicitly		// TODO: It may be useful to truncate if free if the build_vector implicitly
// converts.		// converts.
}		}

if (ConstEltNo && InVec.getOpcode() == ISD::BITCAST) {		// TODO: These transforms should not require the 'hasOneUse' restriction, but
		// there are regressions on multiple targets without it. We can end up with a
		// mess of scalar and vector code if we reduce only part of the DAG to scalar.
		if (ConstEltNo && InVec.getOpcode() == ISD::BITCAST && VT.isInteger() &&
		InVec.hasOneUse()) {
// The vector index of the LSBs of the source depend on the endian-ness.		// The vector index of the LSBs of the source depend on the endian-ness.
bool IsLE = DAG.getDataLayout().isLittleEndian();		bool IsLE = DAG.getDataLayout().isLittleEndian();
		unsigned ExtractIndex = ConstEltNo->getZExtValue();
// extract_elt (v2i32 (bitcast i64:x)), BCTruncElt -> i32 (trunc i64:x)		// extract_elt (v2i32 (bitcast i64:x)), BCTruncElt -> i32 (trunc i64:x)
unsigned BCTruncElt = IsLE ? 0 : VT.getVectorNumElements() - 1;		unsigned BCTruncElt = IsLE ? 0 : VT.getVectorNumElements() - 1;
SDValue BCSrc = InVec.getOperand(0);		SDValue BCSrc = InVec.getOperand(0);
if (InVec.hasOneUse() && ConstEltNo->getZExtValue() == BCTruncElt &&		if (ExtractIndex == BCTruncElt && BCSrc.getValueType().isScalarInteger())
VT.isInteger() && BCSrc.getValueType().isScalarInteger())
return DAG.getNode(ISD::TRUNCATE, SDLoc(N), NVT, BCSrc);		return DAG.getNode(ISD::TRUNCATE, SDLoc(N), NVT, BCSrc);

		if (LegalTypes && BCSrc.getValueType().isInteger() &&
		BCSrc.getOpcode() == ISD::SCALAR_TO_VECTOR) {
		// ext_elt (bitcast (scalar_to_vec i64 X to v2i64) to v4i32), TruncElt -->
		// trunc i64 X to i32
		SDValue X = BCSrc.getOperand(0);
		assert(X.getValueType().isScalarInteger() && NVT.isScalarInteger() &&
		"Extract element and scalar to vector can't change element type "
		"from FP to integer.");
		unsigned XBitWidth = X.getValueSizeInBits();
		unsigned VecEltBitWidth = VT.getScalarSizeInBits();
		BCTruncElt = IsLE ? 0 : XBitWidth / VecEltBitWidth - 1;

		// An extract element return value type can be wider than its vector
		// operand element type. In that case, the high bits are undefined, so
		// it's possible that we may need to extend rather than truncate.
		if (ExtractIndex == BCTruncElt && XBitWidth > VecEltBitWidth) {
		assert(XBitWidth % VecEltBitWidth == 0 &&
		"Scalar bitwidth must be a multiple of vector element bitwidth");
		return DAG.getAnyExtOrTrunc(X, SDLoc(N), NVT);
		}
		}
}		}

// extract_vector_elt (insert_vector_elt vec, val, idx), idx) -> val		// extract_vector_elt (insert_vector_elt vec, val, idx), idx) -> val
//		//
// This only really matters if the index is non-constant since other combines		// This only really matters if the index is non-constant since other combines
// on the constant elements already work.		// on the constant elements already work.
if (InVec.getOpcode() == ISD::INSERT_VECTOR_ELT &&		if (InVec.getOpcode() == ISD::INSERT_VECTOR_ELT &&
EltNo == InVec.getOperand(2)) {		EltNo == InVec.getOperand(2)) {
[3,502 lines not shown]

llvm/trunk/test/CodeGen/AArch64/extract-insert.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -mtriple=aarch64_be-- < %s \| FileCheck %s --check-prefix=BE			; RUN: llc -mtriple=aarch64_be-- < %s \| FileCheck %s --check-prefix=BE
	; RUN: llc -mtriple=aarch64-- < %s \| FileCheck %s --check-prefix=LE			; RUN: llc -mtriple=aarch64-- < %s \| FileCheck %s --check-prefix=LE

	define i32 @trunc_i64_to_i32_le(i64 %x) {			define i32 @trunc_i64_to_i32_le(i64 %x) {
	; BE-LABEL: trunc_i64_to_i32_le:			; BE-LABEL: trunc_i64_to_i32_le:
	; BE: // %bb.0:			; BE: // %bb.0:
	; BE-NEXT: fmov d0, x0			; BE-NEXT: fmov d0, x0
	; BE-NEXT: rev64 v0.4s, v0.4s			; BE-NEXT: rev64 v0.4s, v0.4s
	; BE-NEXT: fmov w0, s0			; BE-NEXT: fmov w0, s0
	; BE-NEXT: ret			; BE-NEXT: ret
	;			;
	; LE-LABEL: trunc_i64_to_i32_le:			; LE-LABEL: trunc_i64_to_i32_le:
	; LE: // %bb.0:			; LE: // %bb.0:
	; LE-NEXT: fmov d0, x0			; LE-NEXT: // kill: def $w0 killed $w0 killed $x0
	; LE-NEXT: fmov w0, s0
	; LE-NEXT: ret			; LE-NEXT: ret
	%ins = insertelement <2 x i64> undef, i64 %x, i32 0			%ins = insertelement <2 x i64> undef, i64 %x, i32 0
	%bc = bitcast <2 x i64> %ins to <4 x i32>			%bc = bitcast <2 x i64> %ins to <4 x i32>
	%ext = extractelement <4 x i32> %bc, i32 0			%ext = extractelement <4 x i32> %bc, i32 0
	ret i32 %ext			ret i32 %ext
	}			}

	define i32 @trunc_i64_to_i32_be(i64 %x) {			define i32 @trunc_i64_to_i32_be(i64 %x) {
	; BE-LABEL: trunc_i64_to_i32_be:			; BE-LABEL: trunc_i64_to_i32_be:
	; BE: // %bb.0:			; BE: // %bb.0:
	; BE-NEXT: fmov d0, x0			; BE-NEXT: // kill: def $w0 killed $w0 killed $x0
	; BE-NEXT: rev64 v0.4s, v0.4s
	; BE-NEXT: mov w0, v0.s[1]
	; BE-NEXT: ret			; BE-NEXT: ret
	;			;
	; LE-LABEL: trunc_i64_to_i32_be:			; LE-LABEL: trunc_i64_to_i32_be:
	; LE: // %bb.0:			; LE: // %bb.0:
	; LE-NEXT: fmov d0, x0			; LE-NEXT: fmov d0, x0
	; LE-NEXT: mov w0, v0.s[1]			; LE-NEXT: mov w0, v0.s[1]
	; LE-NEXT: ret			; LE-NEXT: ret
	%ins = insertelement <2 x i64> undef, i64 %x, i32 0			%ins = insertelement <2 x i64> undef, i64 %x, i32 0
	%bc = bitcast <2 x i64> %ins to <4 x i32>			%bc = bitcast <2 x i64> %ins to <4 x i32>
	%ext = extractelement <4 x i32> %bc, i32 1			%ext = extractelement <4 x i32> %bc, i32 1
	ret i32 %ext			ret i32 %ext
	}			}

	define i16 @trunc_i64_to_i16_le(i64 %x) {			define i16 @trunc_i64_to_i16_le(i64 %x) {
	; BE-LABEL: trunc_i64_to_i16_le:			; BE-LABEL: trunc_i64_to_i16_le:
	; BE: // %bb.0:			; BE: // %bb.0:
	; BE-NEXT: fmov d0, x0			; BE-NEXT: fmov d0, x0
	; BE-NEXT: rev64 v0.8h, v0.8h			; BE-NEXT: rev64 v0.8h, v0.8h
	; BE-NEXT: umov w0, v0.h[0]			; BE-NEXT: umov w0, v0.h[0]
	; BE-NEXT: ret			; BE-NEXT: ret
	;			;
	; LE-LABEL: trunc_i64_to_i16_le:			; LE-LABEL: trunc_i64_to_i16_le:
	; LE: // %bb.0:			; LE: // %bb.0:
	; LE-NEXT: fmov d0, x0			; LE-NEXT: // kill: def $w0 killed $w0 killed $x0
	; LE-NEXT: umov w0, v0.h[0]
	; LE-NEXT: ret			; LE-NEXT: ret
	%ins = insertelement <2 x i64> undef, i64 %x, i32 0			%ins = insertelement <2 x i64> undef, i64 %x, i32 0
	%bc = bitcast <2 x i64> %ins to <8 x i16>			%bc = bitcast <2 x i64> %ins to <8 x i16>
	%ext = extractelement <8 x i16> %bc, i32 0			%ext = extractelement <8 x i16> %bc, i32 0
	ret i16 %ext			ret i16 %ext
	}			}

	define i16 @trunc_i64_to_i16_be(i64 %x) {			define i16 @trunc_i64_to_i16_be(i64 %x) {
	; BE-LABEL: trunc_i64_to_i16_be:			; BE-LABEL: trunc_i64_to_i16_be:
	; BE: // %bb.0:			; BE: // %bb.0:
	; BE-NEXT: fmov d0, x0			; BE-NEXT: // kill: def $w0 killed $w0 killed $x0
	; BE-NEXT: rev64 v0.8h, v0.8h
	; BE-NEXT: umov w0, v0.h[3]
	; BE-NEXT: ret			; BE-NEXT: ret
	;			;
	; LE-LABEL: trunc_i64_to_i16_be:			; LE-LABEL: trunc_i64_to_i16_be:
	; LE: // %bb.0:			; LE: // %bb.0:
	; LE-NEXT: fmov d0, x0			; LE-NEXT: fmov d0, x0
	; LE-NEXT: umov w0, v0.h[3]			; LE-NEXT: umov w0, v0.h[3]
	; LE-NEXT: ret			; LE-NEXT: ret
	%ins = insertelement <2 x i64> undef, i64 %x, i32 0			%ins = insertelement <2 x i64> undef, i64 %x, i32 0
	%bc = bitcast <2 x i64> %ins to <8 x i16>			%bc = bitcast <2 x i64> %ins to <8 x i16>
	%ext = extractelement <8 x i16> %bc, i32 3			%ext = extractelement <8 x i16> %bc, i32 3
	ret i16 %ext			ret i16 %ext
	}			}

	define i8 @trunc_i32_to_i8_le(i32 %x) {			define i8 @trunc_i32_to_i8_le(i32 %x) {
	; BE-LABEL: trunc_i32_to_i8_le:			; BE-LABEL: trunc_i32_to_i8_le:
	; BE: // %bb.0:			; BE: // %bb.0:
	; BE-NEXT: fmov s0, w0			; BE-NEXT: fmov s0, w0
	; BE-NEXT: rev32 v0.16b, v0.16b			; BE-NEXT: rev32 v0.16b, v0.16b
	; BE-NEXT: umov w0, v0.b[0]			; BE-NEXT: umov w0, v0.b[0]
	; BE-NEXT: ret			; BE-NEXT: ret
	;			;
	; LE-LABEL: trunc_i32_to_i8_le:			; LE-LABEL: trunc_i32_to_i8_le:
	; LE: // %bb.0:			; LE: // %bb.0:
	; LE-NEXT: fmov s0, w0
	; LE-NEXT: umov w0, v0.b[0]
	; LE-NEXT: ret			; LE-NEXT: ret
	%ins = insertelement <4 x i32> undef, i32 %x, i32 0			%ins = insertelement <4 x i32> undef, i32 %x, i32 0
	%bc = bitcast <4 x i32> %ins to <16 x i8>			%bc = bitcast <4 x i32> %ins to <16 x i8>
	%ext = extractelement <16 x i8> %bc, i32 0			%ext = extractelement <16 x i8> %bc, i32 0
	ret i8 %ext			ret i8 %ext
	}			}

	define i8 @trunc_i32_to_i8_be(i32 %x) {			define i8 @trunc_i32_to_i8_be(i32 %x) {
	; BE-LABEL: trunc_i32_to_i8_be:			; BE-LABEL: trunc_i32_to_i8_be:
	; BE: // %bb.0:			; BE: // %bb.0:
	; BE-NEXT: fmov s0, w0
	; BE-NEXT: rev32 v0.16b, v0.16b
	; BE-NEXT: umov w0, v0.b[3]
	; BE-NEXT: ret			; BE-NEXT: ret
	;			;
	; LE-LABEL: trunc_i32_to_i8_be:			; LE-LABEL: trunc_i32_to_i8_be:
	; LE: // %bb.0:			; LE: // %bb.0:
	; LE-NEXT: fmov s0, w0			; LE-NEXT: fmov s0, w0
	; LE-NEXT: umov w0, v0.b[3]			; LE-NEXT: umov w0, v0.b[3]
	; LE-NEXT: ret			; LE-NEXT: ret
	%ins = insertelement <4 x i32> undef, i32 %x, i32 0			%ins = insertelement <4 x i32> undef, i32 %x, i32 0
	%bc = bitcast <4 x i32> %ins to <16 x i8>			%bc = bitcast <4 x i32> %ins to <16 x i8>
	%ext = extractelement <16 x i8> %bc, i32 3			%ext = extractelement <16 x i8> %bc, i32 3
	ret i8 %ext			ret i8 %ext
	}			}

				; Weird type (non-power-of-2 vector) is ok.

				define i8 @trunc_i64_to_i8_be(i64 %x) {
				; BE-LABEL: trunc_i64_to_i8_be:
				; BE: // %bb.0:
				; BE-NEXT: // kill: def $w0 killed $w0 killed $x0
				; BE-NEXT: ret
				;
				; LE-LABEL: trunc_i64_to_i8_be:
				; LE: // %bb.0:
				; LE-NEXT: fmov d0, x0
				; LE-NEXT: umov w0, v0.b[7]
				; LE-NEXT: ret
				%ins = insertelement <3 x i64> undef, i64 %x, i32 0
				%bc = bitcast <3 x i64> %ins to <24 x i8>
				%ext = extractelement <24 x i8> %bc, i32 7
				ret i8 %ext
				}

llvm/trunk/test/CodeGen/X86/extract-insert.ll

	[62 lines not shown]
	define i32 @trunc_i64_to_i32_le(i64 %x) {			define i32 @trunc_i64_to_i32_le(i64 %x) {
	; X86-LABEL: trunc_i64_to_i32_le:			; X86-LABEL: trunc_i64_to_i32_le:
	; X86: # %bb.0:			; X86: # %bb.0:
	; X86-NEXT: movl {{[0-9]+}}(%esp), %eax			; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X86-NEXT: retl			; X86-NEXT: retl
	;			;
	; X64-LABEL: trunc_i64_to_i32_le:			; X64-LABEL: trunc_i64_to_i32_le:
	; X64: # %bb.0:			; X64: # %bb.0:
	; X64-NEXT: movq %rdi, %xmm0			; X64-NEXT: movq %rdi, %rax
	; X64-NEXT: movd %xmm0, %eax			; X64-NEXT: # kill: def $eax killed $eax killed $rax
	; X64-NEXT: retq			; X64-NEXT: retq
	%ins = insertelement <2 x i64> undef, i64 %x, i32 0			%ins = insertelement <2 x i64> undef, i64 %x, i32 0
	%bc = bitcast <2 x i64> %ins to <4 x i32>			%bc = bitcast <2 x i64> %ins to <4 x i32>
	%ext = extractelement <4 x i32> %bc, i32 0			%ext = extractelement <4 x i32> %bc, i32 0
	ret i32 %ext			ret i32 %ext
	}			}

	define i16 @trunc_i64_to_i16_le(i64 %x) {			define i16 @trunc_i64_to_i16_le(i64 %x) {
	; X86-LABEL: trunc_i64_to_i16_le:			; X86-LABEL: trunc_i64_to_i16_le:
	; X86: # %bb.0:			; X86: # %bb.0:
	; X86-NEXT: movl {{[0-9]+}}(%esp), %eax			; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X86-NEXT: # kill: def $ax killed $ax killed $eax			; X86-NEXT: # kill: def $ax killed $ax killed $eax
	; X86-NEXT: retl			; X86-NEXT: retl
	;			;
	; X64-LABEL: trunc_i64_to_i16_le:			; X64-LABEL: trunc_i64_to_i16_le:
	; X64: # %bb.0:			; X64: # %bb.0:
	; X64-NEXT: movq %rdi, %xmm0			; X64-NEXT: movq %rdi, %rax
	; X64-NEXT: movd %xmm0, %eax			; X64-NEXT: # kill: def $ax killed $ax killed $rax
	; X64-NEXT: # kill: def $ax killed $ax killed $eax
	; X64-NEXT: retq			; X64-NEXT: retq
	%ins = insertelement <2 x i64> undef, i64 %x, i32 0			%ins = insertelement <2 x i64> undef, i64 %x, i32 0
	%bc = bitcast <2 x i64> %ins to <8 x i16>			%bc = bitcast <2 x i64> %ins to <8 x i16>
	%ext = extractelement <8 x i16> %bc, i32 0			%ext = extractelement <8 x i16> %bc, i32 0
	ret i16 %ext			ret i16 %ext
	}			}

	define i8 @trunc_i32_to_i8_le(i32 %x) {			define i8 @trunc_i32_to_i8_le(i32 %x) {
	[16 lines not shown]

llvm/trunk/test/CodeGen/X86/mmx-coalescing.ll

	[10 lines not shown]
	; CHECK: # %bb.0: # %entry			; CHECK: # %bb.0: # %entry
	; CHECK-NEXT: pshufw $238, (%rdi), %mm0 # mm0 = mem[2,3,2,3]			; CHECK-NEXT: pshufw $238, (%rdi), %mm0 # mm0 = mem[2,3,2,3]
	; CHECK-NEXT: movd %mm0, %eax			; CHECK-NEXT: movd %mm0, %eax
	; CHECK-NEXT: testl %eax, %eax			; CHECK-NEXT: testl %eax, %eax
	; CHECK-NEXT: je .LBB0_1			; CHECK-NEXT: je .LBB0_1
	; CHECK-NEXT: # %bb.2: # %if.B			; CHECK-NEXT: # %bb.2: # %if.B
	; CHECK-NEXT: pshufw $238, %mm0, %mm0 # mm0 = mm0[2,3,2,3]			; CHECK-NEXT: pshufw $238, %mm0, %mm0 # mm0 = mm0[2,3,2,3]
	; CHECK-NEXT: movq %mm0, %rax			; CHECK-NEXT: movq %mm0, %rax
	; CHECK-NEXT: jmp .LBB0_3			; CHECK-NEXT: testl %eax, %eax
				; CHECK-NEXT: jne .LBB0_4
	; CHECK-NEXT: .LBB0_1: # %if.A			; CHECK-NEXT: .LBB0_1: # %if.A
				; CHECK-NEXT: # =>This Inner Loop Header: Depth=1
	; CHECK-NEXT: movd %edx, %mm1			; CHECK-NEXT: movd %edx, %mm1
	; CHECK-NEXT: psllq %mm1, %mm0			; CHECK-NEXT: psllq %mm1, %mm0
	; CHECK-NEXT: movq %mm0, %rax			; CHECK-NEXT: movq %mm0, %rax
	; CHECK-NEXT: testq %rax, %rax			; CHECK-NEXT: testq %rax, %rax
	; CHECK-NEXT: jne .LBB0_4			; CHECK-NEXT: jne .LBB0_4
	; CHECK-NEXT: .LBB0_3: # %if.C			; CHECK-NEXT: # %bb.3: # %if.C
	; CHECK-NEXT: movq %rax, %xmm0			; CHECK-NEXT: # in Loop: Header=BB0_1 Depth=1
	; CHECK-NEXT: movd %xmm0, %eax
	; CHECK-NEXT: testl %eax, %eax			; CHECK-NEXT: testl %eax, %eax
	; CHECK-NEXT: je .LBB0_1			; CHECK-NEXT: je .LBB0_1
	; CHECK-NEXT: .LBB0_4: # %merge			; CHECK-NEXT: .LBB0_4: # %merge
	; CHECK-NEXT: pshufw $238, %mm0, %mm0 # mm0 = mm0[2,3,2,3]			; CHECK-NEXT: pshufw $238, %mm0, %mm0 # mm0 = mm0[2,3,2,3]
	; CHECK-NEXT: movd %mm0, %eax			; CHECK-NEXT: movd %mm0, %eax
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	entry:			entry:
	%shl = shl i32 1, %B			%shl = shl i32 1, %B
	[62 lines not shown]