This is an archive of the discontinued LLVM Phabricator instance.

Paths

- llvm/trunk/lib/Target/ARM/ARMISelLowering.cpp
- llvm/trunk/lib/Target/ARM/ARMInstrMVE.td
- llvm/trunk/test/CodeGen/Thumb2/mve-ldst-offset.ll
- llvm/trunk/test/CodeGen/Thumb2/mve-ldst-postinc.ll
- llvm/trunk/test/CodeGen/Thumb2/mve-ldst-preinc.ll
- llvm/trunk/test/CodeGen/Thumb2/mve-widen-narrow.ll

Differential D65580

[ARM] Tighten up VLDRH.32 with low alignments
Closed · Public

Authored by dmgreen on Aug 1 2019, 7:46 AM.


Details

Reviewers

t.p.northover
samparker
SjoerdMeijer
simon_tatham
ostannard

Commits

rG1becefd3f796: [ARM] Tighten up VLDRH.32 with low alignments
rL368256: [ARM] Tighten up VLDRH.32 with low alignments

Summary

VLDRH needs an alignment of at least 2, including the widening/narrowing versions. This tightens up the ISel patterns for it and alters allowsMisalignedMemoryAccesses so that unaligned accesses are expanded through the stack. It also fixes some incorrect shift amounts in the t2addrmode_imm7 operands, which appear to have been given a multiple (the access size in bytes) rather than a shift.
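The effect of the shift fix shows up in the test updates, where offsets such as `#3` for byte loads and `#2` for halfword loads now fold into the instruction instead of needing a separate `adds`. As a rough sketch of why, the hypothetical C++ helper below models a scaled 7-bit immediate like the one `t2addrmode_imm7<Shift>` selects. The ±127 range of the scaled value is an assumption about the encoding, and the helper is illustrative only, not LLVM code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not LLVM's API: models a 7-bit scaled immediate
// offset like the one selected by t2addrmode_imm7<Shift>. We assume the
// encoded value is Offset >> Shift with a separate add/subtract bit, so the
// scaled magnitude must fit in 7 bits (0..127).
bool isEncodableImm7Offset(int32_t Offset, unsigned Shift) {
  assert(Shift <= 2 && "MVE element sizes are 1, 2 or 4 bytes");
  const int32_t Scale = 1 << Shift;
  if (Offset % Scale != 0)
    return false; // must be a multiple of the element size
  const int32_t Scaled = Offset / Scale;
  return Scaled >= -127 && Scaled <= 127;
}
```

With `Shift` 0, `isEncodableImm7Offset(3, 0)` holds, matching the new `vldrb.u32 q0, [r0, #3]`; with the old value of 1 it fails, which is consistent with the old `adds r2, r0, #3` sequence. The halfword case behaves the same way with shifts 1 (new) and 2 (old).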

Diff Detail

Repository: rL LLVM

Event Timeline

dmgreen created this revision. Aug 1 2019, 7:46 AM

Herald added a project: Restricted Project. Aug 1 2019, 7:46 AM

Herald added subscribers: hiraditya, kristof.beyls, javed.absar.

dmgreen mentioned this in D63840: [ARM] Add support for MVE pre and post inc loads and stores. Aug 5 2019, 10:28 AM

dmgreen added a child revision: D65583: [ARM] MVE big endian loads/stores.

samparker added inline comments. Aug 7 2019, 1:06 AM

llvm/test/CodeGen/Thumb2/mve-ldst-offset.ll
756 ↗	(On Diff #212793)	I am so confused by this, can you explain it for me please?

simon_tatham added inline comments. Aug 7 2019, 2:06 AM

llvm/test/CodeGen/Thumb2/mve-ldst-offset.ll
756 ↗	(On Diff #212793)	(Drive-by comment since this crossed my inbox) I think what's going on here is: `VLDRH.S32` means: load 8 bytes of memory, regard them as 4 16-bit halfwords (`H`), and sign-extend each one into a 32-bit lane (`S32`) of the output vector register. But it requires alignment of at least 2 on the memory it's loading from. So in order to apply it to 8 bytes starting at an odd address, the generated code is copying the 8 source bytes to an aligned 8-byte stack slot, and then pointing the `VLDRH.S32` at that instead. I assume this run of `llc` is in a mode where it assumes unaligned access support on the ordinary `LDR` instruction has been enabled in the hardware configuration. (If I remember, that's the default – to generate code compatible with a CPU that has that turned _off_ you have to say `-mno-unaligned-access` in clang, or whatever llc's equivalent option is.)
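To make the lane layout and the fallback concrete, here is a small C++ model of a sign-extending `<4 x i16>` to `<4 x i32>` load. Like the lowering described above, it copies the 8 bytes into an aligned slot when the source address is odd. It assumes little-endian byte order, and the function name is hypothetical. The byte-wise reads in the model do not themselves need alignment, so the branch only mirrors the shape of the generated code and does not avoid a real fault.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical model of a sign-extending <4 x i16> -> <4 x i32> load
// (VLDRH.S32-like), assuming little-endian byte order.
std::array<int32_t, 4> sextLoadV4I16(const uint8_t *Src) {
  assert(Src != nullptr);
  // Stands in for the 8-byte stack slot; suitably aligned for VLDRH.
  alignas(8) uint8_t Slot[8];
  const uint8_t *Aligned = Src;
  if (reinterpret_cast<std::uintptr_t>(Src) % 2 != 0) {
    // Odd address: copy the 8 bytes with unaligned-capable loads/stores
    // and point the halfword load at the aligned copy instead.
    std::memcpy(Slot, Src, sizeof(Slot));
    Aligned = Slot;
  }
  std::array<int32_t, 4> Lanes{};
  for (int I = 0; I < 4; ++I) {
    // Each 16-bit halfword becomes one 32-bit lane, sign-extended.
    uint16_t Half =
        static_cast<uint16_t>(Aligned[2 * I] | (Aligned[2 * I + 1] << 8));
    Lanes[I] = static_cast<int16_t>(Half);
  }
  return Lanes;
}
```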

samparker added inline comments. Aug 7 2019, 2:28 AM

llvm/test/CodeGen/Thumb2/mve-ldst-offset.ll
756 ↗	(On Diff #212793)	Bah, thanks! For some reason I wasn't thinking about the need to widen, all the loads really threw me.

With my confusion cleared, LGTM.

This revision is now accepted and ready to land. Aug 7 2019, 2:32 AM

Thanks!

llvm/test/CodeGen/Thumb2/mve-ldst-offset.ll
756 ↗	(On Diff #212793)	Yep, this is the default fallback of "align it via the stack and load it again". It's obviously not very efficient, but I don't believe it will often come up (it's only for unaligned 16-bit loads). If it does we may be able to do something better, perhaps by splitting out the extend.

Closed by commit rL368256: [ARM] Tighten up VLDRH.32 with low alignments (authored by dmgreen). Aug 7 2019, 11:21 PM

This revision was automatically updated to reflect the committed changes.

gchatelet mentioned this in D82876: [Alignment][NFC] Migrate TargetTransformInfo::allowsMisalignedMemoryAccesses to Align. Jul 1 2020, 12:53 AM

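Revision Contents (path: size)

- llvm/trunk/lib/Target/ARM/ARMISelLowering.cpp: 13 lines
- llvm/trunk/lib/Target/ARM/ARMInstrMVE.td: 38 lines
- llvm/trunk/test/CodeGen/Thumb2/mve-ldst-offset.ll: 43 lines
- llvm/trunk/test/CodeGen/Thumb2/mve-ldst-postinc.ll: 20 lines
- llvm/trunk/test/CodeGen/Thumb2/mve-ldst-preinc.ll: 45 lines
- llvm/trunk/test/CodeGen/Thumb2/mve-widen-narrow.ll: 73 lines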

Diff 214068

llvm/trunk/lib/Target/ARM/ARMISelLowering.cpp


… 14,060 lines not shown …	bool ARMTargetLowering::allowsMisalignedMemoryAccesses(EVT VT, unsigned,

// These are for predicates		// These are for predicates
if ((Ty == MVT::v16i1 || Ty == MVT::v8i1 || Ty == MVT::v4i1)) {		if ((Ty == MVT::v16i1 || Ty == MVT::v8i1 || Ty == MVT::v4i1)) {
if (Fast)		if (Fast)
*Fast = true;		*Fast = true;
return true;		return true;
}		}

		// These are for truncated stores/narrowing loads. They are fine so long as
		// the alignment is at least the size of the item being loaded
		if ((Ty == MVT::v4i8 || Ty == MVT::v8i8 || Ty == MVT::v4i16) &&
		Alignment >= VT.getScalarSizeInBits() / 8) {
		if (Fast)
		*Fast = true;
		return true;
		}

if (Ty != MVT::v16i8 && Ty != MVT::v8i16 && Ty != MVT::v8f16 &&		if (Ty != MVT::v16i8 && Ty != MVT::v8i16 && Ty != MVT::v8f16 &&
Ty != MVT::v4i32 && Ty != MVT::v4f32 && Ty != MVT::v2i64 &&		Ty != MVT::v4i32 && Ty != MVT::v4f32 && Ty != MVT::v2i64 &&
Ty != MVT::v2f64 &&		Ty != MVT::v2f64)
// These are for truncated stores
Ty != MVT::v4i8 && Ty != MVT::v8i8 && Ty != MVT::v4i16)
return false;		return false;

if (Subtarget->isLittle()) {		if (Subtarget->isLittle()) {
// In little-endian MVE, the store instructions VSTRB.U8,		// In little-endian MVE, the store instructions VSTRB.U8,
// VSTRH.U16 and VSTRW.U32 all store the vector register in		// VSTRH.U16 and VSTRW.U32 all store the vector register in
// exactly the same format, and differ only in the range of		// exactly the same format, and differ only in the range of
// their immediate offset field and the required alignment.		// their immediate offset field and the required alignment.
//		//
… 2,417 lines not shown …
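The new check added to `allowsMisalignedMemoryAccesses` can be restated in isolation. The sketch below uses a hypothetical enum in place of `MVT` and only covers the three narrowing types from the hunk. In the real hook a failed check falls through to the `return false` for these types, which is what sends a 1-byte-aligned `v4i16` access down the expand-through-the-stack path seen in the `*_align1` tests.

```cpp
#include <cassert>

// Hypothetical stand-in for the MVT values that the new check tests.
enum class NarrowTy { v4i8, v8i8, v4i16, Other };

// Element size in bits, playing the role of VT.getScalarSizeInBits().
unsigned scalarSizeInBits(NarrowTy Ty) {
  switch (Ty) {
  case NarrowTy::v4i8:
  case NarrowTy::v8i8:
    return 8;
  case NarrowTy::v4i16:
    return 16;
  case NarrowTy::Other:
    break;
  }
  return 0;
}

// Restates the added rule: a narrowing store or widening load of these
// types is fine when the alignment in bytes is at least one element.
// For these types, false means the access gets expanded instead.
bool allowsNarrowAccess(NarrowTy Ty, unsigned AlignmentBytes) {
  assert(AlignmentBytes >= 1 && "alignment is at least one byte");
  if (Ty != NarrowTy::v4i8 && Ty != NarrowTy::v8i8 && Ty != NarrowTy::v4i16)
    return false; // other types are not covered by this sketch
  return AlignmentBytes >= scalarSizeInBits(Ty) / 8;
}
```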

llvm/trunk/lib/Target/ARM/ARMInstrMVE.td

… 4,833 lines not shown …	let Predicates = [HasMVEInt, IsBE] in {
def : MVE_unpred_vector_load_typed<v8f16, MVE_VLDRHU16, alignedload16, 1>;		def : MVE_unpred_vector_load_typed<v8f16, MVE_VLDRHU16, alignedload16, 1>;
def : MVE_unpred_vector_load_typed<v4i32, MVE_VLDRWU32, alignedload32, 2>;		def : MVE_unpred_vector_load_typed<v4i32, MVE_VLDRWU32, alignedload32, 2>;
def : MVE_unpred_vector_load_typed<v4f32, MVE_VLDRWU32, alignedload32, 2>;		def : MVE_unpred_vector_load_typed<v4f32, MVE_VLDRWU32, alignedload32, 2>;
}		}


// Widening/Narrowing Loads/Stores		// Widening/Narrowing Loads/Stores

		let MinAlignment = 2 in {
		def truncstorevi16_align2 : PatFrag<(ops node:$val, node:$ptr),
		(truncstorevi16 node:$val, node:$ptr)>;
		}

let Predicates = [HasMVEInt] in {		let Predicates = [HasMVEInt] in {
def : Pat<(truncstorevi8 (v8i16 MQPR:$val), t2addrmode_imm7<1>:$addr),		def : Pat<(truncstorevi8 (v8i16 MQPR:$val), t2addrmode_imm7<0>:$addr),
(MVE_VSTRB16 MQPR:$val, t2addrmode_imm7<1>:$addr)>;		(MVE_VSTRB16 MQPR:$val, t2addrmode_imm7<0>:$addr)>;
def : Pat<(truncstorevi8 (v4i32 MQPR:$val), t2addrmode_imm7<1>:$addr),		def : Pat<(truncstorevi8 (v4i32 MQPR:$val), t2addrmode_imm7<0>:$addr),
(MVE_VSTRB32 MQPR:$val, t2addrmode_imm7<1>:$addr)>;		(MVE_VSTRB32 MQPR:$val, t2addrmode_imm7<0>:$addr)>;
def : Pat<(truncstorevi16 (v4i32 MQPR:$val), t2addrmode_imm7<2>:$addr),		def : Pat<(truncstorevi16_align2 (v4i32 MQPR:$val), t2addrmode_imm7<1>:$addr),
(MVE_VSTRH32 MQPR:$val, t2addrmode_imm7<2>:$addr)>;		(MVE_VSTRH32 MQPR:$val, t2addrmode_imm7<1>:$addr)>;
		}


		let MinAlignment = 2 in {
		def extloadvi16_align2 : PatFrag<(ops node:$ptr), (extloadvi16 node:$ptr)>;
		def sextloadvi16_align2 : PatFrag<(ops node:$ptr), (sextloadvi16 node:$ptr)>;
		def zextloadvi16_align2 : PatFrag<(ops node:$ptr), (zextloadvi16 node:$ptr)>;
}		}

multiclass MVEExtLoad<string DestLanes, string DestElemBits,		multiclass MVEExtLoad<string DestLanes, string DestElemBits,
string SrcElemBits, string SrcElemType,		string SrcElemBits, string SrcElemType,
Operand am> {		string Align, Operand am> {
def _Any : Pat<(!cast<ValueType>("v" # DestLanes # "i" # DestElemBits)		def _Any : Pat<(!cast<ValueType>("v" # DestLanes # "i" # DestElemBits)
(!cast<PatFrag>("extloadvi" # SrcElemBits) am:$addr)),		(!cast<PatFrag>("extloadvi" # SrcElemBits # Align) am:$addr)),
(!cast<Instruction>("MVE_VLDR" # SrcElemType # "U" # DestElemBits)		(!cast<Instruction>("MVE_VLDR" # SrcElemType # "U" # DestElemBits)
am:$addr)>;		am:$addr)>;
def _Z : Pat<(!cast<ValueType>("v" # DestLanes # "i" # DestElemBits)		def _Z : Pat<(!cast<ValueType>("v" # DestLanes # "i" # DestElemBits)
(!cast<PatFrag>("zextloadvi" # SrcElemBits) am:$addr)),		(!cast<PatFrag>("zextloadvi" # SrcElemBits # Align) am:$addr)),
(!cast<Instruction>("MVE_VLDR" # SrcElemType # "U" # DestElemBits)		(!cast<Instruction>("MVE_VLDR" # SrcElemType # "U" # DestElemBits)
am:$addr)>;		am:$addr)>;
def _S : Pat<(!cast<ValueType>("v" # DestLanes # "i" # DestElemBits)		def _S : Pat<(!cast<ValueType>("v" # DestLanes # "i" # DestElemBits)
(!cast<PatFrag>("sextloadvi" # SrcElemBits) am:$addr)),		(!cast<PatFrag>("sextloadvi" # SrcElemBits # Align) am:$addr)),
(!cast<Instruction>("MVE_VLDR" # SrcElemType # "S" # DestElemBits)		(!cast<Instruction>("MVE_VLDR" # SrcElemType # "S" # DestElemBits)
am:$addr)>;		am:$addr)>;
}		}

let Predicates = [HasMVEInt] in {		let Predicates = [HasMVEInt] in {
defm : MVEExtLoad<"4", "32", "8", "B", t2addrmode_imm7<1>>;		defm : MVEExtLoad<"4", "32", "8", "B", "", t2addrmode_imm7<0>>;
defm : MVEExtLoad<"8", "16", "8", "B", t2addrmode_imm7<1>>;		defm : MVEExtLoad<"8", "16", "8", "B", "", t2addrmode_imm7<0>>;
defm : MVEExtLoad<"4", "32", "16", "H", t2addrmode_imm7<2>>;		defm : MVEExtLoad<"4", "32", "16", "H", "_align2", t2addrmode_imm7<1>>;
}		}


// Bit convert patterns		// Bit convert patterns

let Predicates = [HasMVEInt] in {		let Predicates = [HasMVEInt] in {
def : Pat<(v2f64 (bitconvert (v2i64 QPR:$src))), (v2f64 QPR:$src)>;		def : Pat<(v2f64 (bitconvert (v2i64 QPR:$src))), (v2f64 QPR:$src)>;
def : Pat<(v2i64 (bitconvert (v2f64 QPR:$src))), (v2i64 QPR:$src)>;		def : Pat<(v2i64 (bitconvert (v2f64 QPR:$src))), (v2i64 QPR:$src)>;
… 97 lines not shown …
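The `MVEExtLoad` multiclass selects its PatFrag and instruction by pasting strings together with `#`, so the new `Align` parameter only changes the fragment name. The hypothetical C++ function below performs the same concatenation (leaving out `DestLanes`, which only builds the value type). This makes it easy to check that only the halfword instantiation picks up the `_align2` fragments while the instruction names are unchanged.

```cpp
#include <cassert>
#include <string>

// Names that one MVEExtLoad instantiation would paste together.
struct ExtLoadNames {
  std::string AnyFrag;  // "extloadvi"  # SrcElemBits # Align
  std::string ZExtFrag; // "zextloadvi" # SrcElemBits # Align
  std::string SExtFrag; // "sextloadvi" # SrcElemBits # Align
  std::string UInstr;   // "MVE_VLDR" # SrcElemType # "U" # DestElemBits
  std::string SInstr;   // "MVE_VLDR" # SrcElemType # "S" # DestElemBits
};

// Hypothetical C++ mirror of the string concatenation in MVEExtLoad.
ExtLoadNames mveExtLoadNames(const std::string &DestElemBits,
                             const std::string &SrcElemBits,
                             const std::string &SrcElemType,
                             const std::string &Align) {
  assert(SrcElemType == "B" || SrcElemType == "H");
  return {"extloadvi" + SrcElemBits + Align,
          "zextloadvi" + SrcElemBits + Align,
          "sextloadvi" + SrcElemBits + Align,
          "MVE_VLDR" + SrcElemType + "U" + DestElemBits,
          "MVE_VLDR" + SrcElemType + "S" + DestElemBits};
}
```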

llvm/trunk/test/CodeGen/Thumb2/mve-ldst-offset.ll

… 142 lines not shown …	entry:
%3 = bitcast i8* %y to <4 x i32>*		%3 = bitcast i8* %y to <4 x i32>*
store <4 x i32> %2, <4 x i32>* %3, align 4		store <4 x i32> %2, <4 x i32>* %3, align 4
ret i8* %x		ret i8* %x
}		}

define i8* @ldrhu32_2(i8* %x, i8* %y) {		define i8* @ldrhu32_2(i8* %x, i8* %y) {
; CHECK-LABEL: ldrhu32_2:		; CHECK-LABEL: ldrhu32_2:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: adds r2, r0, #2		; CHECK-NEXT: vldrh.u32 q0, [r0, #2]
; CHECK-NEXT: vldrh.u32 q0, [r2]
; CHECK-NEXT: vstrw.32 q0, [r1]		; CHECK-NEXT: vstrw.32 q0, [r1]
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%z = getelementptr inbounds i8, i8* %x, i32 2		%z = getelementptr inbounds i8, i8* %x, i32 2
%0 = bitcast i8* %z to <4 x i16>*		%0 = bitcast i8* %z to <4 x i16>*
%1 = load <4 x i16>, <4 x i16>* %0, align 2		%1 = load <4 x i16>, <4 x i16>* %0, align 2
%2 = zext <4 x i16> %1 to <4 x i32>		%2 = zext <4 x i16> %1 to <4 x i32>
%3 = bitcast i8* %y to <4 x i32>*		%3 = bitcast i8* %y to <4 x i32>*
… 67 lines not shown …	entry:
%3 = bitcast i8* %y to <4 x i32>*		%3 = bitcast i8* %y to <4 x i32>*
store <4 x i32> %2, <4 x i32>* %3, align 4		store <4 x i32> %2, <4 x i32>* %3, align 4
ret i8* %x		ret i8* %x
}		}

define i8* @ldrhs32_2(i8* %x, i8* %y) {		define i8* @ldrhs32_2(i8* %x, i8* %y) {
; CHECK-LABEL: ldrhs32_2:		; CHECK-LABEL: ldrhs32_2:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: adds r2, r0, #2		; CHECK-NEXT: vldrh.s32 q0, [r0, #2]
; CHECK-NEXT: vldrh.s32 q0, [r2]
; CHECK-NEXT: vstrw.32 q0, [r1]		; CHECK-NEXT: vstrw.32 q0, [r1]
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%z = getelementptr inbounds i8, i8* %x, i32 2		%z = getelementptr inbounds i8, i8* %x, i32 2
%0 = bitcast i8* %z to <4 x i16>*		%0 = bitcast i8* %z to <4 x i16>*
%1 = load <4 x i16>, <4 x i16>* %0, align 2		%1 = load <4 x i16>, <4 x i16>* %0, align 2
%2 = sext <4 x i16> %1 to <4 x i32>		%2 = sext <4 x i16> %1 to <4 x i32>
%3 = bitcast i8* %y to <4 x i32>*		%3 = bitcast i8* %y to <4 x i32>*
… 129 lines not shown …	entry:
%3 = bitcast i8* %y to <4 x i32>*		%3 = bitcast i8* %y to <4 x i32>*
store <4 x i32> %2, <4 x i32>* %3, align 4		store <4 x i32> %2, <4 x i32>* %3, align 4
ret i8* %x		ret i8* %x
}		}

define i8* @ldrbu32_3(i8* %x, i8* %y) {		define i8* @ldrbu32_3(i8* %x, i8* %y) {
; CHECK-LABEL: ldrbu32_3:		; CHECK-LABEL: ldrbu32_3:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: adds r2, r0, #3		; CHECK-NEXT: vldrb.u32 q0, [r0, #3]
; CHECK-NEXT: vldrb.u32 q0, [r2]
; CHECK-NEXT: vstrw.32 q0, [r1]		; CHECK-NEXT: vstrw.32 q0, [r1]
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%z = getelementptr inbounds i8, i8* %x, i32 3		%z = getelementptr inbounds i8, i8* %x, i32 3
%0 = bitcast i8* %z to <4 x i8>*		%0 = bitcast i8* %z to <4 x i8>*
%1 = load <4 x i8>, <4 x i8>* %0, align 1		%1 = load <4 x i8>, <4 x i8>* %0, align 1
%2 = zext <4 x i8> %1 to <4 x i32>		%2 = zext <4 x i8> %1 to <4 x i32>
%3 = bitcast i8* %y to <4 x i32>*		%3 = bitcast i8* %y to <4 x i32>*
… 50 lines not shown …	entry:
%3 = bitcast i8* %y to <4 x i32>*		%3 = bitcast i8* %y to <4 x i32>*
store <4 x i32> %2, <4 x i32>* %3, align 4		store <4 x i32> %2, <4 x i32>* %3, align 4
ret i8* %x		ret i8* %x
}		}

define i8* @ldrbs32_3(i8* %x, i8* %y) {		define i8* @ldrbs32_3(i8* %x, i8* %y) {
; CHECK-LABEL: ldrbs32_3:		; CHECK-LABEL: ldrbs32_3:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: adds r2, r0, #3		; CHECK-NEXT: vldrb.s32 q0, [r0, #3]
; CHECK-NEXT: vldrb.s32 q0, [r2]
; CHECK-NEXT: vstrw.32 q0, [r1]		; CHECK-NEXT: vstrw.32 q0, [r1]
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%z = getelementptr inbounds i8, i8* %x, i32 3		%z = getelementptr inbounds i8, i8* %x, i32 3
%0 = bitcast i8* %z to <4 x i8>*		%0 = bitcast i8* %z to <4 x i8>*
%1 = load <4 x i8>, <4 x i8>* %0, align 1		%1 = load <4 x i8>, <4 x i8>* %0, align 1
%2 = sext <4 x i8> %1 to <4 x i32>		%2 = sext <4 x i8> %1 to <4 x i32>
%3 = bitcast i8* %y to <4 x i32>*		%3 = bitcast i8* %y to <4 x i32>*
… 50 lines not shown …	entry:
%3 = bitcast i8* %y to <8 x i16>*		%3 = bitcast i8* %y to <8 x i16>*
store <8 x i16> %2, <8 x i16>* %3, align 2		store <8 x i16> %2, <8 x i16>* %3, align 2
ret i8* %x		ret i8* %x
}		}

define i8* @ldrbu16_3(i8* %x, i8* %y) {		define i8* @ldrbu16_3(i8* %x, i8* %y) {
; CHECK-LABEL: ldrbu16_3:		; CHECK-LABEL: ldrbu16_3:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: adds r2, r0, #3		; CHECK-NEXT: vldrb.u16 q0, [r0, #3]
; CHECK-NEXT: vldrb.u16 q0, [r2]
; CHECK-NEXT: vstrh.16 q0, [r1]		; CHECK-NEXT: vstrh.16 q0, [r1]
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%z = getelementptr inbounds i8, i8* %x, i32 3		%z = getelementptr inbounds i8, i8* %x, i32 3
%0 = bitcast i8* %z to <8 x i8>*		%0 = bitcast i8* %z to <8 x i8>*
%1 = load <8 x i8>, <8 x i8>* %0, align 1		%1 = load <8 x i8>, <8 x i8>* %0, align 1
%2 = zext <8 x i8> %1 to <8 x i16>		%2 = zext <8 x i8> %1 to <8 x i16>
%3 = bitcast i8* %y to <8 x i16>*		%3 = bitcast i8* %y to <8 x i16>*
… 50 lines not shown …	entry:
%3 = bitcast i8* %y to <8 x i16>*		%3 = bitcast i8* %y to <8 x i16>*
store <8 x i16> %2, <8 x i16>* %3, align 2		store <8 x i16> %2, <8 x i16>* %3, align 2
ret i8* %x		ret i8* %x
}		}

define i8* @ldrbs16_3(i8* %x, i8* %y) {		define i8* @ldrbs16_3(i8* %x, i8* %y) {
; CHECK-LABEL: ldrbs16_3:		; CHECK-LABEL: ldrbs16_3:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: adds r2, r0, #3		; CHECK-NEXT: vldrb.s16 q0, [r0, #3]
; CHECK-NEXT: vldrb.s16 q0, [r2]
; CHECK-NEXT: vstrh.16 q0, [r1]		; CHECK-NEXT: vstrh.16 q0, [r1]
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%z = getelementptr inbounds i8, i8* %x, i32 3		%z = getelementptr inbounds i8, i8* %x, i32 3
%0 = bitcast i8* %z to <8 x i8>*		%0 = bitcast i8* %z to <8 x i8>*
%1 = load <8 x i8>, <8 x i8>* %0, align 1		%1 = load <8 x i8>, <8 x i8>* %0, align 1
%2 = sext <8 x i8> %1 to <8 x i16>		%2 = sext <8 x i8> %1 to <8 x i16>
%3 = bitcast i8* %y to <8 x i16>*		%3 = bitcast i8* %y to <8 x i16>*
… 156 lines not shown …	entry:
%2 = bitcast i8* %y to <8 x i16>*		%2 = bitcast i8* %y to <8 x i16>*
store <8 x i16> %1, <8 x i16>* %2, align 2		store <8 x i16> %1, <8 x i16>* %2, align 2
ret i8* %x		ret i8* %x
}		}

define i8* @ldrhi32_align1(i8* %x, i8* %y) {		define i8* @ldrhi32_align1(i8* %x, i8* %y) {
; CHECK-LABEL: ldrhi32_align1:		; CHECK-LABEL: ldrhi32_align1:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: adds r2, r0, #3		; CHECK-NEXT: .pad #8
		; CHECK-NEXT: sub sp, #8
		; CHECK-NEXT: ldr.w r3, [r0, #7]
		; CHECK-NEXT: ldr.w r2, [r0, #3]
		; CHECK-NEXT: strd r2, r3, [sp]
		; CHECK-NEXT: mov r2, sp
; CHECK-NEXT: vldrh.s32 q0, [r2]		; CHECK-NEXT: vldrh.s32 q0, [r2]
; CHECK-NEXT: vstrw.32 q0, [r1]		; CHECK-NEXT: vstrw.32 q0, [r1]
		; CHECK-NEXT: add sp, #8
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%z = getelementptr inbounds i8, i8* %x, i32 3		%z = getelementptr inbounds i8, i8* %x, i32 3
%0 = bitcast i8* %z to <4 x i16>*		%0 = bitcast i8* %z to <4 x i16>*
%1 = load <4 x i16>, <4 x i16>* %0, align 1		%1 = load <4 x i16>, <4 x i16>* %0, align 1
%2 = bitcast i8* %y to <4 x i32>*		%2 = bitcast i8* %y to <4 x i32>*
%3 = sext <4 x i16> %1 to <4 x i32>		%3 = sext <4 x i16> %1 to <4 x i32>
store <4 x i32> %3, <4 x i32>* %2, align 4		store <4 x i32> %3, <4 x i32>* %2, align 4
… 175 lines not shown …	entry:
store <4 x i16> %1, <4 x i16>* %2, align 2		store <4 x i16> %1, <4 x i16>* %2, align 2
ret i8* %y		ret i8* %y
}		}

define i8* @strh32_2(i8* %y, i8* %x) {		define i8* @strh32_2(i8* %y, i8* %x) {
; CHECK-LABEL: strh32_2:		; CHECK-LABEL: strh32_2:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: vldrh.u32 q0, [r1]		; CHECK-NEXT: vldrh.u32 q0, [r1]
; CHECK-NEXT: adds r1, r0, #2		; CHECK-NEXT: vstrh.32 q0, [r0, #2]
; CHECK-NEXT: vstrh.32 q0, [r1]
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%z = getelementptr inbounds i8, i8* %y, i32 2		%z = getelementptr inbounds i8, i8* %y, i32 2
%0 = bitcast i8* %x to <4 x i16>*		%0 = bitcast i8* %x to <4 x i16>*
%1 = load <4 x i16>, <4 x i16>* %0, align 2		%1 = load <4 x i16>, <4 x i16>* %0, align 2
%2 = bitcast i8* %z to <4 x i16>*		%2 = bitcast i8* %z to <4 x i16>*
store <4 x i16> %1, <4 x i16>* %2, align 2		store <4 x i16> %1, <4 x i16>* %2, align 2
ret i8* %y		ret i8* %y
… 125 lines not shown …	entry:
store <4 x i8> %1, <4 x i8>* %2, align 1		store <4 x i8> %1, <4 x i8>* %2, align 1
ret i8* %y		ret i8* %y
}		}

define i8* @strb32_3(i8* %y, i8* %x) {		define i8* @strb32_3(i8* %y, i8* %x) {
; CHECK-LABEL: strb32_3:		; CHECK-LABEL: strb32_3:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: vldrb.u32 q0, [r1]		; CHECK-NEXT: vldrb.u32 q0, [r1]
; CHECK-NEXT: adds r1, r0, #3		; CHECK-NEXT: vstrb.32 q0, [r0, #3]
; CHECK-NEXT: vstrb.32 q0, [r1]
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%z = getelementptr inbounds i8, i8* %y, i32 3		%z = getelementptr inbounds i8, i8* %y, i32 3
%0 = bitcast i8* %x to <4 x i8>*		%0 = bitcast i8* %x to <4 x i8>*
%1 = load <4 x i8>, <4 x i8>* %0, align 1		%1 = load <4 x i8>, <4 x i8>* %0, align 1
%2 = bitcast i8* %z to <4 x i8>*		%2 = bitcast i8* %z to <4 x i8>*
store <4 x i8> %1, <4 x i8>* %2, align 1		store <4 x i8> %1, <4 x i8>* %2, align 1
ret i8* %y		ret i8* %y
… 46 lines not shown …	entry:
store <8 x i8> %1, <8 x i8>* %2, align 1		store <8 x i8> %1, <8 x i8>* %2, align 1
ret i8* %y		ret i8* %y
}		}

define i8* @strb16_3(i8* %y, i8* %x) {		define i8* @strb16_3(i8* %y, i8* %x) {
; CHECK-LABEL: strb16_3:		; CHECK-LABEL: strb16_3:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: vldrb.u16 q0, [r1]		; CHECK-NEXT: vldrb.u16 q0, [r1]
; CHECK-NEXT: adds r1, r0, #3		; CHECK-NEXT: vstrb.16 q0, [r0, #3]
; CHECK-NEXT: vstrb.16 q0, [r1]
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%z = getelementptr inbounds i8, i8* %y, i32 3		%z = getelementptr inbounds i8, i8* %y, i32 3
%0 = bitcast i8* %x to <8 x i8>*		%0 = bitcast i8* %x to <8 x i8>*
%1 = load <8 x i8>, <8 x i8>* %0, align 1		%1 = load <8 x i8>, <8 x i8>* %0, align 1
%2 = bitcast i8* %z to <8 x i8>*		%2 = bitcast i8* %z to <8 x i8>*
store <8 x i8> %1, <8 x i8>* %2, align 1		store <8 x i8> %1, <8 x i8>* %2, align 1
ret i8* %y		ret i8* %y
… 152 lines not shown …	entry:
%2 = bitcast i8* %z to <8 x i16>*		%2 = bitcast i8* %z to <8 x i16>*
store <8 x i16> %1, <8 x i16>* %2, align 1		store <8 x i16> %1, <8 x i16>* %2, align 1
ret i8* %y		ret i8* %y
}		}

define i8* @strhi32_align1(i8* %y, i8* %x) {		define i8* @strhi32_align1(i8* %y, i8* %x) {
; CHECK-LABEL: strhi32_align1:		; CHECK-LABEL: strhi32_align1:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
		; CHECK-NEXT: .pad #8
		; CHECK-NEXT: sub sp, #8
; CHECK-NEXT: vldrw.u32 q0, [r1]		; CHECK-NEXT: vldrw.u32 q0, [r1]
; CHECK-NEXT: adds r1, r0, #3		; CHECK-NEXT: mov r1, sp
; CHECK-NEXT: vstrh.32 q0, [r1]		; CHECK-NEXT: vstrh.32 q0, [r1]
		; CHECK-NEXT: ldrd r1, r2, [sp]
		; CHECK-NEXT: str.w r1, [r0, #3]
		; CHECK-NEXT: str.w r2, [r0, #7]
		; CHECK-NEXT: add sp, #8
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%z = getelementptr inbounds i8, i8* %y, i32 3		%z = getelementptr inbounds i8, i8* %y, i32 3
%0 = bitcast i8* %x to <4 x i32>*		%0 = bitcast i8* %x to <4 x i32>*
%1 = load <4 x i32>, <4 x i32>* %0, align 4		%1 = load <4 x i32>, <4 x i32>* %0, align 4
%2 = bitcast i8* %z to <4 x i16>*		%2 = bitcast i8* %z to <4 x i16>*
%3 = trunc <4 x i32> %1 to <4 x i16>		%3 = trunc <4 x i32> %1 to <4 x i16>
store <4 x i16> %3, <4 x i16>* %2, align 1		store <4 x i16> %3, <4 x i16>* %2, align 1
… 32 lines not shown …

llvm/trunk/test/CodeGen/Thumb2/mve-ldst-postinc.ll

… 768 lines not shown …	entry:
%2 = bitcast i8* %y to <8 x i16>*		%2 = bitcast i8* %y to <8 x i16>*
store <8 x i16> %1, <8 x i16>* %2, align 2		store <8 x i16> %1, <8 x i16>* %2, align 2
ret i8* %z		ret i8* %z
}		}

define i8* @ldrhi32_align1(i8* %x, i8* %y) {		define i8* @ldrhi32_align1(i8* %x, i8* %y) {
; CHECK-LABEL: ldrhi32_align1:		; CHECK-LABEL: ldrhi32_align1:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: vldrh.s32 q0, [r0]		; CHECK-NEXT: .pad #8
; CHECK-NEXT: adds r0, #3		; CHECK-NEXT: sub sp, #8
		; CHECK-NEXT: ldr r3, [r0, #4]
		; CHECK-NEXT: ldr r2, [r0]
		; CHECK-NEXT: adds r0, #3
		; CHECK-NEXT: strd r2, r3, [sp]
		; CHECK-NEXT: mov r2, sp
		; CHECK-NEXT: vldrh.s32 q0, [r2]
; CHECK-NEXT: vstrw.32 q0, [r1]		; CHECK-NEXT: vstrw.32 q0, [r1]
		; CHECK-NEXT: add sp, #8
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%z = getelementptr inbounds i8, i8* %x, i32 3		%z = getelementptr inbounds i8, i8* %x, i32 3
%0 = bitcast i8* %x to <4 x i16>*		%0 = bitcast i8* %x to <4 x i16>*
%1 = load <4 x i16>, <4 x i16>* %0, align 1		%1 = load <4 x i16>, <4 x i16>* %0, align 1
%2 = bitcast i8* %y to <4 x i32>*		%2 = bitcast i8* %y to <4 x i32>*
%3 = sext <4 x i16> %1 to <4 x i32>		%3 = sext <4 x i16> %1 to <4 x i32>
store <4 x i32> %3, <4 x i32>* %2, align 4		store <4 x i32> %3, <4 x i32>* %2, align 4
… 567 lines not shown …	entry:
%2 = bitcast i8* %y to <8 x i16>*		%2 = bitcast i8* %y to <8 x i16>*
store <8 x i16> %1, <8 x i16>* %2, align 1		store <8 x i16> %1, <8 x i16>* %2, align 1
ret i8* %z		ret i8* %z
}		}

define i8* @strhi32_align1(i8* %y, i8* %x) {		define i8* @strhi32_align1(i8* %y, i8* %x) {
; CHECK-LABEL: strhi32_align1:		; CHECK-LABEL: strhi32_align1:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
		; CHECK-NEXT: .pad #8
		; CHECK-NEXT: sub sp, #8
; CHECK-NEXT: vldrw.u32 q0, [r1]		; CHECK-NEXT: vldrw.u32 q0, [r1]
; CHECK-NEXT: vstrh.32 q0, [r0]		; CHECK-NEXT: mov r1, sp
		; CHECK-NEXT: vstrh.32 q0, [r1]
		; CHECK-NEXT: ldrd r1, r2, [sp]
		; CHECK-NEXT: str r1, [r0]
		; CHECK-NEXT: str r2, [r0, #4]
; CHECK-NEXT: adds r0, #3		; CHECK-NEXT: adds r0, #3
		; CHECK-NEXT: add sp, #8
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%z = getelementptr inbounds i8, i8* %y, i32 3		%z = getelementptr inbounds i8, i8* %y, i32 3
%0 = bitcast i8* %x to <4 x i32>*		%0 = bitcast i8* %x to <4 x i32>*
%1 = load <4 x i32>, <4 x i32>* %0, align 4		%1 = load <4 x i32>, <4 x i32>* %0, align 4
%2 = bitcast i8* %y to <4 x i16>*		%2 = bitcast i8* %y to <4 x i16>*
%3 = trunc <4 x i32> %1 to <4 x i16>		%3 = trunc <4 x i32> %1 to <4 x i16>
store <4 x i16> %3, <4 x i16>* %2, align 1		store <4 x i16> %3, <4 x i16>* %2, align 1
… 34 lines not shown …

llvm/trunk/test/CodeGen/Thumb2/mve-ldst-preinc.ll

… 145 lines not shown …	entry:
%3 = bitcast i8* %y to <4 x i32>*		%3 = bitcast i8* %y to <4 x i32>*
store <4 x i32> %2, <4 x i32>* %3, align 4		store <4 x i32> %2, <4 x i32>* %3, align 4
ret i8* %z		ret i8* %z
}		}

define i8* @ldrhu32_2(i8* %x, i8* %y) {		define i8* @ldrhu32_2(i8* %x, i8* %y) {
; CHECK-LABEL: ldrhu32_2:		; CHECK-LABEL: ldrhu32_2:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
		; CHECK-NEXT: vldrh.u32 q0, [r0, #2]
; CHECK-NEXT: adds r0, #2		; CHECK-NEXT: adds r0, #2
; CHECK-NEXT: vldrh.u32 q0, [r0]
; CHECK-NEXT: vstrw.32 q0, [r1]		; CHECK-NEXT: vstrw.32 q0, [r1]
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%z = getelementptr inbounds i8, i8* %x, i32 2		%z = getelementptr inbounds i8, i8* %x, i32 2
%0 = bitcast i8* %z to <4 x i16>*		%0 = bitcast i8* %z to <4 x i16>*
%1 = load <4 x i16>, <4 x i16>* %0, align 2		%1 = load <4 x i16>, <4 x i16>* %0, align 2
%2 = zext <4 x i16> %1 to <4 x i32>		%2 = zext <4 x i16> %1 to <4 x i32>
%3 = bitcast i8* %y to <4 x i32>*		%3 = bitcast i8* %y to <4 x i32>*
… 68 lines not shown …	entry:
%3 = bitcast i8* %y to <4 x i32>*		%3 = bitcast i8* %y to <4 x i32>*
store <4 x i32> %2, <4 x i32>* %3, align 4		store <4 x i32> %2, <4 x i32>* %3, align 4
ret i8* %z		ret i8* %z
}		}

define i8* @ldrhs32_2(i8* %x, i8* %y) {		define i8* @ldrhs32_2(i8* %x, i8* %y) {
; CHECK-LABEL: ldrhs32_2:		; CHECK-LABEL: ldrhs32_2:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
		; CHECK-NEXT: vldrh.s32 q0, [r0, #2]
; CHECK-NEXT: adds r0, #2		; CHECK-NEXT: adds r0, #2
; CHECK-NEXT: vldrh.s32 q0, [r0]
; CHECK-NEXT: vstrw.32 q0, [r1]		; CHECK-NEXT: vstrw.32 q0, [r1]
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%z = getelementptr inbounds i8, i8* %x, i32 2		%z = getelementptr inbounds i8, i8* %x, i32 2
%0 = bitcast i8* %z to <4 x i16>*		%0 = bitcast i8* %z to <4 x i16>*
%1 = load <4 x i16>, <4 x i16>* %0, align 2		%1 = load <4 x i16>, <4 x i16>* %0, align 2
%2 = sext <4 x i16> %1 to <4 x i32>		%2 = sext <4 x i16> %1 to <4 x i32>
%3 = bitcast i8* %y to <4 x i32>*		%3 = bitcast i8* %y to <4 x i32>*
… 132 lines not shown …	entry:
%3 = bitcast i8* %y to <4 x i32>*		%3 = bitcast i8* %y to <4 x i32>*
store <4 x i32> %2, <4 x i32>* %3, align 4		store <4 x i32> %2, <4 x i32>* %3, align 4
ret i8* %z		ret i8* %z
}		}

define i8* @ldrbu32_3(i8* %x, i8* %y) {		define i8* @ldrbu32_3(i8* %x, i8* %y) {
; CHECK-LABEL: ldrbu32_3:		; CHECK-LABEL: ldrbu32_3:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
		; CHECK-NEXT: vldrb.u32 q0, [r0, #3]
; CHECK-NEXT: adds r0, #3		; CHECK-NEXT: adds r0, #3
; CHECK-NEXT: vldrb.u32 q0, [r0]
; CHECK-NEXT: vstrw.32 q0, [r1]		; CHECK-NEXT: vstrw.32 q0, [r1]
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%z = getelementptr inbounds i8, i8* %x, i32 3		%z = getelementptr inbounds i8, i8* %x, i32 3
%0 = bitcast i8* %z to <4 x i8>*		%0 = bitcast i8* %z to <4 x i8>*
%1 = load <4 x i8>, <4 x i8>* %0, align 1		%1 = load <4 x i8>, <4 x i8>* %0, align 1
%2 = zext <4 x i8> %1 to <4 x i32>		%2 = zext <4 x i8> %1 to <4 x i32>
%3 = bitcast i8* %y to <4 x i32>*		%3 = bitcast i8* %y to <4 x i32>*
… 51 lines not shown …	entry:
%3 = bitcast i8* %y to <4 x i32>*		%3 = bitcast i8* %y to <4 x i32>*
store <4 x i32> %2, <4 x i32>* %3, align 4		store <4 x i32> %2, <4 x i32>* %3, align 4
ret i8* %z		ret i8* %z
}		}

define i8* @ldrbs32_3(i8* %x, i8* %y) {		define i8* @ldrbs32_3(i8* %x, i8* %y) {
; CHECK-LABEL: ldrbs32_3:		; CHECK-LABEL: ldrbs32_3:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
		; CHECK-NEXT: vldrb.s32 q0, [r0, #3]
; CHECK-NEXT: adds r0, #3		; CHECK-NEXT: adds r0, #3
; CHECK-NEXT: vldrb.s32 q0, [r0]
; CHECK-NEXT: vstrw.32 q0, [r1]		; CHECK-NEXT: vstrw.32 q0, [r1]
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%z = getelementptr inbounds i8, i8* %x, i32 3		%z = getelementptr inbounds i8, i8* %x, i32 3
%0 = bitcast i8* %z to <4 x i8>*		%0 = bitcast i8* %z to <4 x i8>*
%1 = load <4 x i8>, <4 x i8>* %0, align 1		%1 = load <4 x i8>, <4 x i8>* %0, align 1
%2 = sext <4 x i8> %1 to <4 x i32>		%2 = sext <4 x i8> %1 to <4 x i32>
%3 = bitcast i8* %y to <4 x i32>*		%3 = bitcast i8* %y to <4 x i32>*
… 51 lines not shown …	entry:
%3 = bitcast i8* %y to <8 x i16>*		%3 = bitcast i8* %y to <8 x i16>*
store <8 x i16> %2, <8 x i16>* %3, align 2		store <8 x i16> %2, <8 x i16>* %3, align 2
ret i8* %z		ret i8* %z
}		}

define i8* @ldrbu16_3(i8* %x, i8* %y) {		define i8* @ldrbu16_3(i8* %x, i8* %y) {
; CHECK-LABEL: ldrbu16_3:		; CHECK-LABEL: ldrbu16_3:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
		; CHECK-NEXT: vldrb.u16 q0, [r0, #3]
; CHECK-NEXT: adds r0, #3		; CHECK-NEXT: adds r0, #3
; CHECK-NEXT: vldrb.u16 q0, [r0]
; CHECK-NEXT: vstrh.16 q0, [r1]		; CHECK-NEXT: vstrh.16 q0, [r1]
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%z = getelementptr inbounds i8, i8* %x, i32 3		%z = getelementptr inbounds i8, i8* %x, i32 3
%0 = bitcast i8* %z to <8 x i8>*		%0 = bitcast i8* %z to <8 x i8>*
%1 = load <8 x i8>, <8 x i8>* %0, align 1		%1 = load <8 x i8>, <8 x i8>* %0, align 1
%2 = zext <8 x i8> %1 to <8 x i16>		%2 = zext <8 x i8> %1 to <8 x i16>
%3 = bitcast i8* %y to <8 x i16>*		%3 = bitcast i8* %y to <8 x i16>*
… 51 lines not shown …	entry:
%3 = bitcast i8* %y to <8 x i16>*		%3 = bitcast i8* %y to <8 x i16>*
store <8 x i16> %2, <8 x i16>* %3, align 2		store <8 x i16> %2, <8 x i16>* %3, align 2
ret i8* %z		ret i8* %z
}		}

define i8* @ldrbs16_3(i8* %x, i8* %y) {		define i8* @ldrbs16_3(i8* %x, i8* %y) {
; CHECK-LABEL: ldrbs16_3:		; CHECK-LABEL: ldrbs16_3:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
		; CHECK-NEXT: vldrb.s16 q0, [r0, #3]
; CHECK-NEXT: adds r0, #3		; CHECK-NEXT: adds r0, #3
; CHECK-NEXT: vldrb.s16 q0, [r0]
; CHECK-NEXT: vstrh.16 q0, [r1]		; CHECK-NEXT: vstrh.16 q0, [r1]
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%z = getelementptr inbounds i8, i8* %x, i32 3		%z = getelementptr inbounds i8, i8* %x, i32 3
%0 = bitcast i8* %z to <8 x i8>*		%0 = bitcast i8* %z to <8 x i8>*
%1 = load <8 x i8>, <8 x i8>* %0, align 1		%1 = load <8 x i8>, <8 x i8>* %0, align 1
%2 = sext <8 x i8> %1 to <8 x i16>		%2 = sext <8 x i8> %1 to <8 x i16>
%3 = bitcast i8* %y to <8 x i16>*		%3 = bitcast i8* %y to <8 x i16>*
… 162 lines not shown …	entry:
%2 = bitcast i8* %y to <8 x i16>*		%2 = bitcast i8* %y to <8 x i16>*
store <8 x i16> %1, <8 x i16>* %2, align 2		store <8 x i16> %1, <8 x i16>* %2, align 2
ret i8* %z		ret i8* %z
}		}

define i8* @ldrhi32_align1(i8* %x, i8* %y) {		define i8* @ldrhi32_align1(i8* %x, i8* %y) {
; CHECK-LABEL: ldrhi32_align1:		; CHECK-LABEL: ldrhi32_align1:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: adds r0, #3		; CHECK-NEXT: .pad #8
; CHECK-NEXT: vldrh.s32 q0, [r0]		; CHECK-NEXT: sub sp, #8
		; CHECK-NEXT: ldr r2, [r0, #3]!
		; CHECK-NEXT: str r2, [sp]
		; CHECK-NEXT: ldr r2, [r0, #4]
		; CHECK-NEXT: str r2, [sp, #4]
		; CHECK-NEXT: mov r2, sp
		; CHECK-NEXT: vldrh.s32 q0, [r2]
; CHECK-NEXT: vstrw.32 q0, [r1]		; CHECK-NEXT: vstrw.32 q0, [r1]
		; CHECK-NEXT: add sp, #8
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%z = getelementptr inbounds i8, i8* %x, i32 3		%z = getelementptr inbounds i8, i8* %x, i32 3
%0 = bitcast i8* %z to <4 x i16>*		%0 = bitcast i8* %z to <4 x i16>*
%1 = load <4 x i16>, <4 x i16>* %0, align 1		%1 = load <4 x i16>, <4 x i16>* %0, align 1
%2 = bitcast i8* %y to <4 x i32>*		%2 = bitcast i8* %y to <4 x i32>*
%3 = sext <4 x i16> %1 to <4 x i32>		%3 = sext <4 x i16> %1 to <4 x i32>
store <4 x i32> %3, <4 x i32>* %2, align 4		store <4 x i32> %3, <4 x i32>* %2, align 4
%2 = bitcast i8* %z to <4 x i16>*		%2 = bitcast i8* %z to <4 x i16>*
store <4 x i16> %1, <4 x i16>* %2, align 2		store <4 x i16> %1, <4 x i16>* %2, align 2
ret i8* %z		ret i8* %z
}		}

define i8* @strh32_2(i8* %y, i8* %x) {		define i8* @strh32_2(i8* %y, i8* %x) {
; CHECK-LABEL: strh32_2:		; CHECK-LABEL: strh32_2:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: adds r0, #2
; CHECK-NEXT: vldrh.u32 q0, [r1]		; CHECK-NEXT: vldrh.u32 q0, [r1]
; CHECK-NEXT: vstrh.32 q0, [r0]		; CHECK-NEXT: vstrh.32 q0, [r0, #2]
		; CHECK-NEXT: adds r0, #2
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%z = getelementptr inbounds i8, i8* %y, i32 2		%z = getelementptr inbounds i8, i8* %y, i32 2
%0 = bitcast i8* %x to <4 x i16>*		%0 = bitcast i8* %x to <4 x i16>*
%1 = load <4 x i16>, <4 x i16>* %0, align 2		%1 = load <4 x i16>, <4 x i16>* %0, align 2
%2 = bitcast i8* %z to <4 x i16>*		%2 = bitcast i8* %z to <4 x i16>*
store <4 x i16> %1, <4 x i16>* %2, align 2		store <4 x i16> %1, <4 x i16>* %2, align 2
ret i8* %z		ret i8* %z
%2 = bitcast i8* %z to <4 x i8>*		%2 = bitcast i8* %z to <4 x i8>*
store <4 x i8> %1, <4 x i8>* %2, align 1		store <4 x i8> %1, <4 x i8>* %2, align 1
ret i8* %z		ret i8* %z
}		}

define i8* @strb32_3(i8* %y, i8* %x) {		define i8* @strb32_3(i8* %y, i8* %x) {
; CHECK-LABEL: strb32_3:		; CHECK-LABEL: strb32_3:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: adds r0, #3
; CHECK-NEXT: vldrb.u32 q0, [r1]		; CHECK-NEXT: vldrb.u32 q0, [r1]
; CHECK-NEXT: vstrb.32 q0, [r0]		; CHECK-NEXT: vstrb.32 q0, [r0, #3]
		; CHECK-NEXT: adds r0, #3
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%z = getelementptr inbounds i8, i8* %y, i32 3		%z = getelementptr inbounds i8, i8* %y, i32 3
%0 = bitcast i8* %x to <4 x i8>*		%0 = bitcast i8* %x to <4 x i8>*
%1 = load <4 x i8>, <4 x i8>* %0, align 1		%1 = load <4 x i8>, <4 x i8>* %0, align 1
%2 = bitcast i8* %z to <4 x i8>*		%2 = bitcast i8* %z to <4 x i8>*
store <4 x i8> %1, <4 x i8>* %2, align 1		store <4 x i8> %1, <4 x i8>* %2, align 1
ret i8* %z		ret i8* %z
%2 = bitcast i8* %z to <8 x i8>*		%2 = bitcast i8* %z to <8 x i8>*
store <8 x i8> %1, <8 x i8>* %2, align 1		store <8 x i8> %1, <8 x i8>* %2, align 1
ret i8* %z		ret i8* %z
}		}

define i8* @strb16_3(i8* %y, i8* %x) {		define i8* @strb16_3(i8* %y, i8* %x) {
; CHECK-LABEL: strb16_3:		; CHECK-LABEL: strb16_3:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: adds r0, #3
; CHECK-NEXT: vldrb.u16 q0, [r1]		; CHECK-NEXT: vldrb.u16 q0, [r1]
; CHECK-NEXT: vstrb.16 q0, [r0]		; CHECK-NEXT: vstrb.16 q0, [r0, #3]
		; CHECK-NEXT: adds r0, #3
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%z = getelementptr inbounds i8, i8* %y, i32 3		%z = getelementptr inbounds i8, i8* %y, i32 3
%0 = bitcast i8* %x to <8 x i8>*		%0 = bitcast i8* %x to <8 x i8>*
%1 = load <8 x i8>, <8 x i8>* %0, align 1		%1 = load <8 x i8>, <8 x i8>* %0, align 1
%2 = bitcast i8* %z to <8 x i8>*		%2 = bitcast i8* %z to <8 x i8>*
store <8 x i8> %1, <8 x i8>* %2, align 1		store <8 x i8> %1, <8 x i8>* %2, align 1
ret i8* %z		ret i8* %z
%2 = bitcast i8* %z to <8 x i16>*		%2 = bitcast i8* %z to <8 x i16>*
store <8 x i16> %1, <8 x i16>* %2, align 1		store <8 x i16> %1, <8 x i16>* %2, align 1
ret i8* %z		ret i8* %z
}		}
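In strh32_2, strb32_3 and strb16_3 (and ldrbs16_3 for loads), the new output folds the constant into the access, e.g. `vstrh.32 q0, [r0, #2]`, where the old output needed a separate `adds`. That is consistent with the summary's note about shift amounts that passed a multiple rather than a shift. The C sketch below illustrates the idea under stated assumptions: the helper name `offset_fits_imm7` is hypothetical, and the signed 7-bit immediate scaled by `1 << shift` (range -127..127 after scaling) is an assumption about the MVE encoding, not code taken from ARMISelLowering.cpp.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of an MVE immediate-offset legality check.
 * Assumption: the offset must be a multiple of (1 << shift) and, once
 * divided by that scale, fit in a signed 7-bit magnitude (-127..127). */
static bool offset_fits_imm7(int32_t offset, unsigned shift) {
    int32_t scale = (int32_t)1 << shift;
    if (offset % scale != 0)
        return false;              /* not representable after scaling */
    int32_t scaled = offset / scale;
    return scaled >= -127 && scaled <= 127;
}
```

With the correct shift (1 for halfword accesses, 0 for byte accesses), offsets #2 and #3 pass. If the multiple (2 or 1) is passed as the shift instead, both fail the divisibility test, which would leave the offset to a separate `adds` as in the old CHECK lines.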

define i8* @strhi32_align1(i8* %y, i8* %x) {		define i8* @strhi32_align1(i8* %y, i8* %x) {
; CHECK-LABEL: strhi32_align1:		; CHECK-LABEL: strhi32_align1:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: adds r0, #3		; CHECK-NEXT: .pad #8
		; CHECK-NEXT: sub sp, #8
; CHECK-NEXT: vldrw.u32 q0, [r1]		; CHECK-NEXT: vldrw.u32 q0, [r1]
; CHECK-NEXT: vstrh.32 q0, [r0]		; CHECK-NEXT: mov r1, sp
		; CHECK-NEXT: vstrh.32 q0, [r1]
		; CHECK-NEXT: ldrd r1, r2, [sp]
		; CHECK-NEXT: str r1, [r0, #3]!
		; CHECK-NEXT: str r2, [r0, #4]
		; CHECK-NEXT: add sp, #8
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%z = getelementptr inbounds i8, i8* %y, i32 3		%z = getelementptr inbounds i8, i8* %y, i32 3
%0 = bitcast i8* %x to <4 x i32>*		%0 = bitcast i8* %x to <4 x i32>*
%1 = load <4 x i32>, <4 x i32>* %0, align 4		%1 = load <4 x i32>, <4 x i32>* %0, align 4
%2 = bitcast i8* %z to <4 x i16>*		%2 = bitcast i8* %z to <4 x i16>*
%3 = trunc <4 x i32> %1 to <4 x i16>		%3 = trunc <4 x i32> %1 to <4 x i16>
store <4 x i16> %3, <4 x i16>* %2, align 1		store <4 x i16> %3, <4 x i16>* %2, align 1
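ldrhi32_align1 shows the new expansion for a widening halfword load with only 1-byte alignment. Because VLDRH.S32 needs at least 2-byte alignment, the 8 source bytes are first copied into an 8-byte stack slot with ordinary LDR/STR (assumed here to tolerate unaligned addresses, as discussed in the review), and VLDRH.S32 then reads from the slot. The C sketch below models only those semantics; `vldrh_s32_model` and `load_sext_v4i16_align1` are hypothetical names, and the sketch does not describe how ISel actually performs the expansion.

```c
#include <stdint.h>
#include <string.h>

/* Simplified scalar model of VLDRH.S32: read four 16-bit halfwords from a
 * 2-byte-aligned address and sign-extend each into a 32-bit lane. */
static void vldrh_s32_model(int32_t out[4], const int16_t *src) {
    for (int i = 0; i < 4; i++)
        out[i] = src[i]; /* int16_t -> int32_t conversion sign-extends */
}

/* Mirrors the sequence in ldrhi32_align1: copy the 8 source bytes, which may
 * start at any address, into a naturally aligned slot, then perform the
 * widening load on the slot rather than on the original pointer. */
static void load_sext_v4i16_align1(int32_t out[4], const uint8_t *p) {
    int16_t slot[4];              /* stand-in for the 8-byte stack slot */
    memcpy(slot, p, sizeof slot); /* corresponds to the LDR/STR pairs */
    vldrh_s32_model(out, slot);
}
```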

llvm/trunk/test/CodeGen/Thumb2/mve-widen-narrow.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -mtriple=thumbv8.1m.main-arm-none-eabi -mattr=+mve %s -o - | FileCheck %s			; RUN: llc -mtriple=thumbv8.1m.main-arm-none-eabi -mattr=+mve -verify-machineinstrs %s -o - | FileCheck %s

	define void @foo_int8_int32(<4 x i8>* %dest, <4 x i32>* readonly %src, i32 %n) {			define void @foo_int8_int32(<4 x i8>* %dest, <4 x i32>* readonly %src, i32 %n) {
	; CHECK-LABEL: foo_int8_int32:			; CHECK-LABEL: foo_int8_int32:
	; CHECK: @ %bb.0: @ %entry			; CHECK: @ %bb.0: @ %entry
	; CHECK-NEXT: vldrw.u32 q0, [r1]			; CHECK-NEXT: vldrw.u32 q0, [r1]
	; CHECK-NEXT: vstrb.32 q0, [r0]			; CHECK-NEXT: vstrb.32 q0, [r0]
	; CHECK-NEXT: bx lr			; CHECK-NEXT: bx lr
	entry:			entry:
	%wide.load = load <4 x i32>, <4 x i32>* %src, align 4			%wide.load = load <4 x i32>, <4 x i32>* %src, align 4
	%0 = trunc <4 x i32> %wide.load to <4 x i8>			%0 = trunc <4 x i32> %wide.load to <4 x i8>
	store <4 x i8> %0, <4 x i8>* %dest, align 1			store <4 x i8> %0, <4 x i8>* %dest, align 1
	ret void			ret void
	}			}


	define void @foo_int16_int32(<4 x i16>* %dest, <4 x i32>* readonly %src, i32 %n) {			define void @foo_int16_int32(<4 x i16>* %dest, <4 x i32>* readonly %src, i32 %n) {
	; CHECK-LABEL: foo_int16_int32:			; CHECK-LABEL: foo_int16_int32:
	; CHECK: @ %bb.0: @ %entry			; CHECK: @ %bb.0: @ %entry
	; CHECK-NEXT: vldrw.u32 q0, [r1]			; CHECK-NEXT: vldrw.u32 q0, [r1]
	; CHECK-NEXT: vstrh.32 q0, [r0]			; CHECK-NEXT: vstrh.32 q0, [r0]
	; CHECK-NEXT: bx lr			; CHECK-NEXT: bx lr
	entry:			entry:
	%wide.load = load <4 x i32>, <4 x i32>* %src, align 4			%wide.load = load <4 x i32>, <4 x i32>* %src, align 4
	%0 = trunc <4 x i32> %wide.load to <4 x i16>			%0 = trunc <4 x i32> %wide.load to <4 x i16>
	store <4 x i16> %0, <4 x i16>* %dest, align 2			store <4 x i16> %0, <4 x i16>* %dest, align 2
	ret void			ret void
	}			}


	define void @foo_int8_int16(<8 x i8>* %dest, <8 x i16>* readonly %src, i32 %n) {			define void @foo_int8_int16(<8 x i8>* %dest, <8 x i16>* readonly %src, i32 %n) {
	; CHECK-LABEL: foo_int8_int16:			; CHECK-LABEL: foo_int8_int16:
	; CHECK: @ %bb.0: @ %entry			; CHECK: @ %bb.0: @ %entry
	; CHECK-NEXT: vldrh.u16 q0, [r1]			; CHECK-NEXT: vldrh.u16 q0, [r1]
	; CHECK-NEXT: vstrb.16 q0, [r0]			; CHECK-NEXT: vstrb.16 q0, [r0]
	; CHECK-NEXT: bx lr			; CHECK-NEXT: bx lr
	entry:			entry:
	%wide.load = load <8 x i16>, <8 x i16>* %src, align 2			%wide.load = load <8 x i16>, <8 x i16>* %src, align 2
	%0 = trunc <8 x i16> %wide.load to <8 x i8>			%0 = trunc <8 x i16> %wide.load to <8 x i8>
	store <8 x i8> %0, <8 x i8>* %dest, align 1			store <8 x i8> %0, <8 x i8>* %dest, align 1
	ret void			ret void
	}			}


	define void @foo_int32_int8(<4 x i32>* %dest, <4 x i8>* readonly %src, i32 %n) {			define void @foo_int32_int8(<4 x i32>* %dest, <4 x i8>* readonly %src, i32 %n) {
	; CHECK-LABEL: foo_int32_int8:			; CHECK-LABEL: foo_int32_int8:
	; CHECK: @ %bb.0: @ %entry			; CHECK: @ %bb.0: @ %entry
	; CHECK-NEXT: vldrb.s32 q0, [r1]			; CHECK-NEXT: vldrb.s32 q0, [r1]
	; CHECK-NEXT: vstrw.32 q0, [r0]			; CHECK-NEXT: vstrw.32 q0, [r0]
	; CHECK-NEXT: bx lr			; CHECK-NEXT: bx lr
	entry:			entry:
	%wide.load = load <4 x i8>, <4 x i8>* %src, align 1			%wide.load = load <4 x i8>, <4 x i8>* %src, align 1
	%0 = sext <4 x i8> %wide.load to <4 x i32>			%0 = sext <4 x i8> %wide.load to <4 x i32>
	store <4 x i32> %0, <4 x i32>* %dest, align 4			store <4 x i32> %0, <4 x i32>* %dest, align 4
	ret void			ret void
	}			}


	define void @foo_int16_int8(<8 x i16>* %dest, <8 x i8>* readonly %src, i32 %n) {			define void @foo_int16_int8(<8 x i16>* %dest, <8 x i8>* readonly %src, i32 %n) {
	; CHECK-LABEL: foo_int16_int8:			; CHECK-LABEL: foo_int16_int8:
	; CHECK: @ %bb.0: @ %entry			; CHECK: @ %bb.0: @ %entry
	; CHECK-NEXT: vldrb.s16 q0, [r1]			; CHECK-NEXT: vldrb.s16 q0, [r1]
	; CHECK-NEXT: vstrh.16 q0, [r0]			; CHECK-NEXT: vstrh.16 q0, [r0]
	; CHECK-NEXT: bx lr			; CHECK-NEXT: bx lr
	entry:			entry:
	%wide.load = load <8 x i8>, <8 x i8>* %src, align 1			%wide.load = load <8 x i8>, <8 x i8>* %src, align 1
	%0 = sext <8 x i8> %wide.load to <8 x i16>			%0 = sext <8 x i8> %wide.load to <8 x i16>
	store <8 x i16> %0, <8 x i16>* %dest, align 2			store <8 x i16> %0, <8 x i16>* %dest, align 2
	ret void			ret void
	}			}


	define void @foo_int32_int16(<4 x i32>* %dest, <4 x i16>* readonly %src, i32 %n) {			define void @foo_int32_int16(<4 x i32>* %dest, <4 x i16>* readonly %src, i32 %n) {
	; CHECK-LABEL: foo_int32_int16:			; CHECK-LABEL: foo_int32_int16:
	; CHECK: @ %bb.0: @ %entry			; CHECK: @ %bb.0: @ %entry
	; CHECK-NEXT: vldrh.s32 q0, [r1]			; CHECK-NEXT: vldrh.s32 q0, [r1]
	; CHECK-NEXT: vstrw.32 q0, [r0]			; CHECK-NEXT: vstrw.32 q0, [r0]
	; CHECK-NEXT: bx lr			; CHECK-NEXT: bx lr
	entry:			entry:
	%wide.load = load <4 x i16>, <4 x i16>* %src, align 2			%wide.load = load <4 x i16>, <4 x i16>* %src, align 2
	%0 = sext <4 x i16> %wide.load to <4 x i32>			%0 = sext <4 x i16> %wide.load to <4 x i32>
	store <4 x i32> %0, <4 x i32>* %dest, align 4			store <4 x i32> %0, <4 x i32>* %dest, align 4
	ret void			ret void
	}			}


	define void @foo_uint32_uint8(<4 x i32>* %dest, <4 x i8>* readonly %src, i32 %n) {			define void @foo_uint32_uint8(<4 x i32>* %dest, <4 x i8>* readonly %src, i32 %n) {
	; CHECK-LABEL: foo_uint32_uint8:			; CHECK-LABEL: foo_uint32_uint8:
	; CHECK: @ %bb.0: @ %entry			; CHECK: @ %bb.0: @ %entry
	; CHECK-NEXT: vldrb.u32 q0, [r1]			; CHECK-NEXT: vldrb.u32 q0, [r1]
	; CHECK-NEXT: vstrw.32 q0, [r0]			; CHECK-NEXT: vstrw.32 q0, [r0]
	; CHECK-NEXT: bx lr			; CHECK-NEXT: bx lr
	entry:			entry:
	%wide.load = load <4 x i8>, <4 x i8>* %src, align 1			%wide.load = load <4 x i8>, <4 x i8>* %src, align 1
	%0 = zext <4 x i8> %wide.load to <4 x i32>			%0 = zext <4 x i8> %wide.load to <4 x i32>
	store <4 x i32> %0, <4 x i32>* %dest, align 4			store <4 x i32> %0, <4 x i32>* %dest, align 4
	ret void			ret void
	}			}


	define void @foo_uint16_uint8(<8 x i16>* %dest, <8 x i8>* readonly %src, i32 %n) {			define void @foo_uint16_uint8(<8 x i16>* %dest, <8 x i8>* readonly %src, i32 %n) {
	; CHECK-LABEL: foo_uint16_uint8:			; CHECK-LABEL: foo_uint16_uint8:
	; CHECK: @ %bb.0: @ %entry			; CHECK: @ %bb.0: @ %entry
	; CHECK-NEXT: vldrb.u16 q0, [r1]			; CHECK-NEXT: vldrb.u16 q0, [r1]
	; CHECK-NEXT: vstrh.16 q0, [r0]			; CHECK-NEXT: vstrh.16 q0, [r0]
	; CHECK-NEXT: bx lr			; CHECK-NEXT: bx lr
	entry:			entry:
	%wide.load = load <8 x i8>, <8 x i8>* %src, align 1			%wide.load = load <8 x i8>, <8 x i8>* %src, align 1
	%0 = zext <8 x i8> %wide.load to <8 x i16>			%0 = zext <8 x i8> %wide.load to <8 x i16>
	store <8 x i16> %0, <8 x i16>* %dest, align 2			store <8 x i16> %0, <8 x i16>* %dest, align 2
	ret void			ret void
	}			}


	define void @foo_uint32_uint16(<4 x i32>* %dest, <4 x i16>* readonly %src, i32 %n) {			define void @foo_uint32_uint16(<4 x i32>* %dest, <4 x i16>* readonly %src, i32 %n) {
	; CHECK-LABEL: foo_uint32_uint16:			; CHECK-LABEL: foo_uint32_uint16:
	; CHECK: @ %bb.0: @ %entry			; CHECK: @ %bb.0: @ %entry
	; CHECK-NEXT: vldrh.u32 q0, [r1]			; CHECK-NEXT: vldrh.u32 q0, [r1]
	; CHECK-NEXT: vstrw.32 q0, [r0]			; CHECK-NEXT: vstrw.32 q0, [r0]
	; CHECK-NEXT: bx lr			; CHECK-NEXT: bx lr
	entry:			entry:
	%wide.load = load <4 x i16>, <4 x i16>* %src, align 2			%wide.load = load <4 x i16>, <4 x i16>* %src, align 2
	%0 = zext <4 x i16> %wide.load to <4 x i32>			%0 = zext <4 x i16> %wide.load to <4 x i32>
	store <4 x i32> %0, <4 x i32>* %dest, align 4			store <4 x i32> %0, <4 x i32>* %dest, align 4
	ret void			ret void
	}			}




				define void @foo_int16_int32_align1(<4 x i16>* %dest, <4 x i32>* readonly %src, i32 %n) {
				; CHECK-LABEL: foo_int16_int32_align1:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: .pad #8
				; CHECK-NEXT: sub sp, #8
				; CHECK-NEXT: vldrw.u32 q0, [r1]
				; CHECK-NEXT: mov r1, sp
				; CHECK-NEXT: vstrh.32 q0, [r1]
				; CHECK-NEXT: ldrd r1, r2, [sp]
				; CHECK-NEXT: str r1, [r0]
				; CHECK-NEXT: str r2, [r0, #4]
				; CHECK-NEXT: add sp, #8
				; CHECK-NEXT: bx lr
				entry:
				%wide.load = load <4 x i32>, <4 x i32>* %src, align 4
				%0 = trunc <4 x i32> %wide.load to <4 x i16>
				store <4 x i16> %0, <4 x i16>* %dest, align 1
				ret void
				}

				define void @foo_int32_int16_align1(<4 x i32>* %dest, <4 x i16>* readonly %src, i32 %n) {
				; CHECK-LABEL: foo_int32_int16_align1:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: .pad #8
				; CHECK-NEXT: sub sp, #8
				; CHECK-NEXT: ldr r2, [r1]
				; CHECK-NEXT: ldr r1, [r1, #4]
				; CHECK-NEXT: strd r2, r1, [sp]
				; CHECK-NEXT: mov r1, sp
				; CHECK-NEXT: vldrh.s32 q0, [r1]
				; CHECK-NEXT: vstrw.32 q0, [r0]
				; CHECK-NEXT: add sp, #8
				; CHECK-NEXT: bx lr
				entry:
				%wide.load = load <4 x i16>, <4 x i16>* %src, align 1
				%0 = sext <4 x i16> %wide.load to <4 x i32>
				store <4 x i32> %0, <4 x i32>* %dest, align 4
				ret void
				}

				define void @foo_uint32_uint16_align1(<4 x i32>* %dest, <4 x i16>* readonly %src, i32 %n) {
				; CHECK-LABEL: foo_uint32_uint16_align1:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: .pad #8
				; CHECK-NEXT: sub sp, #8
				; CHECK-NEXT: ldr r2, [r1]
				; CHECK-NEXT: ldr r1, [r1, #4]
				; CHECK-NEXT: strd r2, r1, [sp]
				; CHECK-NEXT: mov r1, sp
				; CHECK-NEXT: vldrh.u32 q0, [r1]
				; CHECK-NEXT: vstrw.32 q0, [r0]
				; CHECK-NEXT: add sp, #8
				; CHECK-NEXT: bx lr
				entry:
				%wide.load = load <4 x i16>, <4 x i16>* %src, align 1
				%0 = zext <4 x i16> %wide.load to <4 x i32>
				store <4 x i32> %0, <4 x i32>* %dest, align 4
				ret void
				}
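The new align1 tests also cover the narrowing direction. In foo_int16_int32_align1, VSTRH.32 writes into an aligned 8-byte stack slot, and the result is then copied to the 1-byte-aligned destination with LDRD and two word stores. A scalar C sketch of that sequence follows. The helper names are hypothetical, and it models semantics only, not the lowering code.

```c
#include <stdint.h>
#include <string.h>

/* Simplified scalar model of VSTRH.32: truncate each 32-bit lane to its low
 * 16 bits and store the four halfwords to a 2-byte-aligned address. */
static void vstrh_32_model(uint16_t *dst, const int32_t in[4]) {
    for (int i = 0; i < 4; i++)
        dst[i] = (uint16_t)(uint32_t)in[i]; /* keep the low 16 bits */
}

/* Mirrors foo_int16_int32_align1: narrow into an aligned slot first, then
 * copy the 8 bytes to a destination that may be only 1-byte aligned. */
static void store_trunc_v4i32_align1(uint8_t *p, const int32_t in[4]) {
    uint16_t slot[4];             /* stand-in for the 8-byte stack slot */
    vstrh_32_model(slot, in);
    memcpy(p, slot, sizeof slot); /* corresponds to LDRD + two STRs */
}
```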