
[X86][SSE] Add general 32-bit LOAD + VZEXT_MOVL support to EltsFromConsecutiveLoads
ClosedPublic

Authored by RKSimon on Jan 29 2016, 10:33 AM.

Details

Summary

This patch adds support for combining consecutive 32-bit loads (with undef elements permitted between them), followed by trailing undef/zero elements, into a single MOVSS load.

Follow up to D16217

Note: I've been looking into correcting the domain for both the MOVSS/MOVD and the MOVSD/MOVQ loads/stores, but am concerned about the number of test changes - is this something that people think is worthwhile? I'd probably have to change many of the tests to ensure that they keep to the intended domain.

Diff Detail

Repository
rL LLVM

Event Timeline

RKSimon updated this revision to Diff 46397.Jan 29 2016, 10:33 AM
RKSimon retitled this revision from to [X86][SSE] Add general 32-bit LOAD + VZEXT_MOVL support to EltsFromConsecutiveLoads.
RKSimon updated this object.
RKSimon added reviewers: qcolombet, spatel, mkuper.
RKSimon set the repository for this revision to rL LLVM.
RKSimon added a subscriber: llvm-commits.
RKSimon updated this revision to Diff 46645.Feb 2 2016, 6:09 AM
RKSimon added a reviewer: delena.

Add 512-bit vector support

delena added inline comments.Feb 2 2016, 12:10 PM
test/CodeGen/X86/merge-consecutive-loads-128.ll
335 ↗(On Diff #46645)

I think that on these architectures we pay an additional cycle for switching from the INT to the FP domain. Can we use movd instead?

test/CodeGen/X86/merge-consecutive-loads-256.ll
531 ↗(On Diff #46645)

This instruction (movss) reads 4 bytes from memory. Does it require 4-byte alignment?

RKSimon added inline comments.Feb 2 2016, 3:07 PM
test/CodeGen/X86/merge-consecutive-loads-128.ll
335 ↗(On Diff #46645)

This was something I mentioned in the summary - adding domain support for MOVSS/MOVD is straightforward, but it has a knock-on effect on a lot of tests: some would need modifying to keep to their original domain, while others we'd let switch. If you think it's worthwhile I'll start looking at this more seriously.

test/CodeGen/X86/merge-consecutive-loads-256.ll
531 ↗(On Diff #46645)

Not unless SSE/AVX alignment checks are enabled - AFAICT LLVM assumes they aren't. We use the alignment of the base pointer, so lowering of the consecutive load is driven by that.

RKSimon updated this revision to Diff 46790.Feb 3 2016, 8:13 AM

Converted the loads to the integer domain - as Elena said, it's the more sensible option for consecutive 32-bit loads.

delena accepted this revision.Feb 3 2016, 11:47 PM
delena edited edge metadata.

LGTM

lib/Target/X86/X86ISelLowering.cpp
5651 ↗(On Diff #46645)

What happens here in 32-bit mode, where i64 is illegal?

5674 ↗(On Diff #46645)

I don't think all these checks are necessary - (VT.getSizeInBits() >= 128) should be enough.

5678 ↗(On Diff #46645)

What happens if you don't hardcode MVT::f32, but instead choose between f32 and i32 according to VT?

lib/Target/X86/X86InstrAVX512.td
3054 ↗(On Diff #46645)

Why is v8i64 not handled in these patterns?

This revision is now accepted and ready to land.Feb 3 2016, 11:47 PM
This revision was automatically updated to reflect the committed changes.

Thanks Elena - I've addressed your comments in a couple of additional commits:

Added 32-bit target tests to make sure that the 64-bit integer loads still happen (this is the point of the VZEXT_LOAD op anyway).

Added float/integer domain handling and removed the over-zealous IsTypeLegal tests.

Note: the v8i64 VZEXT_LOAD patterns are already handled in another part of X86InstrAVX512.td