This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Fix codegen for VLD3/VLD4/VST3/VST4 with WB
ClosedPublic

Authored by thopre on Feb 6 2018, 8:15 AM.


Details

Reviewers

rengolin
t.p.northover
samparker

Commits

rG9deef20b6c4f: [ARM] Fix codegen for VLD3/VLD4/VST3/VST4 with WB
rL326570: [ARM] Fix codegen for VLD3/VLD4/VST3/VST4 with WB

Summary

Code generation of VLD3, VLD4, VST3 and VST4 with register writeback is
broken due to 2 separate bugs:

VLD1d64TPseudoWB_register and VLD1d64QPseudoWB_register are missing rules to expand them to non-pseudo MIR. These are selected for ARMISD::VLD3_UPD/VLD4_UPD with v1i64 vectors in SelectVLD.

Selection of the right VLD/VST instruction is broken for loads and stores of 3 and 4 v1i64 vectors. SelectVLD and SelectVST are called with the MIR opcode for fixed writeback (i.e. the increment equals the access size) and call getVLDSTRegisterUpdateOpcode() to select an opcode with register writeback if the base register update is of a different size. Since getVLDSTRegisterUpdateOpcode() only knows about VLD1/VLD2/VST1/VST2, the call is currently conditional on the number of vectors (NumVecs <= 2).

However, VLD1/VST1 is selected by SelectVLD/SelectVST's caller for loads and stores of 3 or 4 v1i64 vectors. Therefore the opcode is not updated, which later leads to a fixed writeback instruction being constructed with an extra operand for the register writeback.
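
The second bug can be reproduced in a small model outside LLVM. The sketch below is a simplified, hypothetical rendering of the pre-patch operand logic in SelectVLD: the enum values, the string operand names and the helper functions are stand-ins invented for illustration, not the real ARM:: opcodes or SelectionDAG types.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for the MIR opcodes; not the real ARM:: enumerators.
enum class Opcode { Vld1TripleFixed, Vld1TripleRegister };

// Result of the modeled selection: chosen opcode plus operand names.
struct Selection {
  Opcode opc;
  std::vector<std::string> ops;
};

// Models isVLDfixed(): true for the fixed-writeback form.
bool isFixed(Opcode opc) { return opc == Opcode::Vld1TripleFixed; }

// Models getVLDSTRegisterUpdateOpcode(): fixed form -> register form.
Opcode toRegisterUpdate(Opcode opc) {
  return isFixed(opc) ? Opcode::Vld1TripleRegister : opc;
}

// Pre-patch logic: the opcode switch is guarded by numVecs <= 2.
Selection selectOld(Opcode opc, unsigned numVecs, bool isImmUpdate) {
  std::vector<std::string> ops{"MemAddr", "Align"};
  if (numVecs <= 2 && !isImmUpdate)
    opc = toRegisterUpdate(opc);
  if ((numVecs > 2 && !isFixed(opc)) || !isImmUpdate)
    ops.push_back(isImmUpdate ? "Reg0" : "Inc");
  return {opc, ops};
}

// True when a fixed-writeback opcode ends up with an increment operand.
bool hasStrayIncrement(const Selection &s) {
  return isFixed(s.opc) && !s.ops.empty() && s.ops.back() == "Inc";
}
```

In this model, selectOld(Opcode::Vld1TripleFixed, 3, false) keeps the fixed opcode yet appends Inc, which corresponds to the malformed operand list described above.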

This patch addresses the two issues as follows:

It adds the necessary mappings from VLD1d64TPseudoWB_register and VLD1d64QPseudoWB_register to VLD1d64Twb_register and VLD1d64Qwb_register respectively. As for the existing _fixed variants, the cost of these is bumped for unaligned access.
It changes the logic in SelectVLD and SelectVST to call isVLDfixed and isVSTfixed respectively to decide whether the opcode should be updated. It also reworks the logic and comments for pushing the writeback offset operand and the r0 operand to clarify the logic: the writeback offset needs to be pushed for a register writeback; otherwise r0 needs to be pushed, unless the instruction is a fixed-writeback VLD1/VLD2/VST1/VST2, which does not take it.
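
The reworked operand logic can be sketched as a small standalone model. This is an illustration under stated assumptions, not the patch itself: the Opcode enum, the string operand names and the helpers isFixed and toRegisterUpdate are hypothetical stand-ins for the real ARM:: opcodes, SDValue operands, isVLDfixed and getVLDSTRegisterUpdateOpcode.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for MIR opcodes; not the real ARM:: enumerators.
// Vld3Upd models a writeback opcode that always takes an offset operand.
enum class Opcode { Vld1Fixed, Vld1Register, Vld3Upd };

// Result of the modeled selection: chosen opcode plus operand names.
struct Selection {
  Opcode opc;
  std::vector<std::string> ops;
};

// Models isVLDfixed(): true only for the fixed-writeback form.
bool isFixed(Opcode opc) { return opc == Opcode::Vld1Fixed; }

// Models getVLDSTRegisterUpdateOpcode(): fixed form -> register form.
Opcode toRegisterUpdate(Opcode opc) {
  return isFixed(opc) ? Opcode::Vld1Register : opc;
}

// Post-patch logic: decisions depend on the opcode, not on a vector count.
Selection selectNew(Opcode opc, bool isImmUpdate) {
  std::vector<std::string> ops{"MemAddr", "Align"};
  if (!isImmUpdate) {
    // Register writeback: switch a fixed opcode to its register form
    // and pass the increment.
    if (isFixed(opc))
      opc = toRegisterUpdate(opc);
    ops.push_back("Inc");
  } else if (!isFixed(opc)) {
    // Immediate update on an opcode that still expects an offset operand.
    ops.push_back("Reg0");
  }
  return {opc, ops};
}
```

For example, selectNew(Opcode::Vld1Fixed, false) yields the register-form opcode with Inc appended, while selectNew(Opcode::Vld1Fixed, true) keeps the fixed opcode and pushes no extra operand.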

Diff Detail

Repository: rL LLVM

Event Timeline

thopre created this revision. Feb 6 2018, 8:15 AM

Herald added subscribers: llvm-commits, kristof.beyls, javed.absar, aemerson. Feb 6 2018, 8:15 AM

srhines added a subscriber: srhines. Feb 6 2018, 5:52 PM

samparker added a subscriber: samparker. Feb 7 2018, 1:56 AM

samparker added inline comments.

test/CodeGen/ARM/pr35157.ll
Line 7 (On Diff #133014): I can see that the bug originally caused a fault, but we should also be testing what code is generated.

Extend existing vld3.ll/vld4.ll/vst3.ll/vst4.ll testcases instead of creating a new one, following similar patterns as those already in there incl. check directives.

thopre marked an inline comment as done. Feb 8 2018, 7:04 AM

https://reviews.llvm.org/D42967 is blocking this, but it would be great to get this into our next toolchain update.

efriedma added a subscriber: efriedma. Feb 22 2018, 11:00 AM

Thanks for the changes, LGTM.

This revision is now accepted and ready to land. Feb 23 2018, 12:26 AM

Merge patch with the one in https://reviews.llvm.org/D42967 and remove redundant mir test.

LGTM, thanks.

https://reviews.llvm.org/D42967 has been merged into this patch. Can I get a review of the new diff? Thanks!

Is this waiting on anything else? It looks like the approval happened at the same time as your last message.

In D42970#1024383, @srhines wrote:

Is this waiting on anything else? It looks like the approval happened at the same time as your last message.

Oh indeed, I thought the previous revision had been approved. I'll go ask for commit access and commit it asap.

This revision was not accepted when it landed; it landed in state Needs Review. Mar 2 2018, 5:05 AM

Closed by commit rL326570: [ARM] Fix codegen for VLD3/VLD4/VST3/VST4 with WB (authored by fhahn).

This revision was automatically updated to reflect the committed changes.

LGTM. Thanks for fixing this, and thanks to samparker for reviewing and to fhahn for submitting it.

Revision Contents

Path: Size

llvm/trunk/lib/Target/ARM/ARMBaseInstrInfo.cpp: 2 lines
llvm/trunk/lib/Target/ARM/ARMExpandPseudoInsts.cpp: 4 lines
llvm/trunk/lib/Target/ARM/ARMISelDAGToDAG.cpp: 35 lines
llvm/trunk/test/CodeGen/ARM/ (four files, names not listed here): 13, 13, 12 and 12 lines

Diff 136720

llvm/trunk/lib/Target/ARM/ARMBaseInstrInfo.cpp

@@ if (DefAlign < 8 && Subtarget.checkVLDnAccessAlignment()) @@
 case ARM::VLD2q8PseudoWB_register:
 case ARM::VLD2q16PseudoWB_register:
 case ARM::VLD2q32PseudoWB_register:
 case ARM::VLD3d8Pseudo:
 case ARM::VLD3d16Pseudo:
 case ARM::VLD3d32Pseudo:
 case ARM::VLD1d64TPseudo:
 case ARM::VLD1d64TPseudoWB_fixed:
+case ARM::VLD1d64TPseudoWB_register:
 case ARM::VLD3d8Pseudo_UPD:
 case ARM::VLD3d16Pseudo_UPD:
 case ARM::VLD3d32Pseudo_UPD:
 case ARM::VLD3q8Pseudo_UPD:
 case ARM::VLD3q16Pseudo_UPD:
 case ARM::VLD3q32Pseudo_UPD:
 case ARM::VLD3q8oddPseudo:
 case ARM::VLD3q16oddPseudo:
 case ARM::VLD3q32oddPseudo:
 case ARM::VLD3q8oddPseudo_UPD:
 case ARM::VLD3q16oddPseudo_UPD:
 case ARM::VLD3q32oddPseudo_UPD:
 case ARM::VLD4d8Pseudo:
 case ARM::VLD4d16Pseudo:
 case ARM::VLD4d32Pseudo:
 case ARM::VLD1d64QPseudo:
 case ARM::VLD1d64QPseudoWB_fixed:
+case ARM::VLD1d64QPseudoWB_register:
 case ARM::VLD4d8Pseudo_UPD:
 case ARM::VLD4d16Pseudo_UPD:
 case ARM::VLD4d32Pseudo_UPD:
 case ARM::VLD4q8Pseudo_UPD:
 case ARM::VLD4q16Pseudo_UPD:
 case ARM::VLD4q32Pseudo_UPD:
 case ARM::VLD4q8oddPseudo:
 case ARM::VLD4q16oddPseudo:

llvm/trunk/lib/Target/ARM/ARMExpandPseudoInsts.cpp

 { ARM::VLD1LNq16Pseudo_UPD, ARM::VLD1LNd16_UPD, true, true, true, EvenDblSpc, 1, 4 ,true},
 { ARM::VLD1LNq32Pseudo, ARM::VLD1LNd32, true, false, false, EvenDblSpc, 1, 2 ,true},
 { ARM::VLD1LNq32Pseudo_UPD, ARM::VLD1LNd32_UPD, true, true, true, EvenDblSpc, 1, 2 ,true},
 { ARM::VLD1LNq8Pseudo, ARM::VLD1LNd8, true, false, false, EvenDblSpc, 1, 8 ,true},
 { ARM::VLD1LNq8Pseudo_UPD, ARM::VLD1LNd8_UPD, true, true, true, EvenDblSpc, 1, 8 ,true},

 { ARM::VLD1d64QPseudo, ARM::VLD1d64Q, true, false, false, SingleSpc, 4, 1 ,false},
 { ARM::VLD1d64QPseudoWB_fixed, ARM::VLD1d64Qwb_fixed, true, true, false, SingleSpc, 4, 1 ,false},
+{ ARM::VLD1d64QPseudoWB_register, ARM::VLD1d64Qwb_register, true, true, true, SingleSpc, 4, 1 ,false},
 { ARM::VLD1d64TPseudo, ARM::VLD1d64T, true, false, false, SingleSpc, 3, 1 ,false},
 { ARM::VLD1d64TPseudoWB_fixed, ARM::VLD1d64Twb_fixed, true, true, false, SingleSpc, 3, 1 ,false},
+{ ARM::VLD1d64TPseudoWB_register, ARM::VLD1d64Twb_register, true, true, true, SingleSpc, 3, 1 ,false},

 { ARM::VLD2LNd16Pseudo, ARM::VLD2LNd16, true, false, false, SingleSpc, 2, 4 ,true},
 { ARM::VLD2LNd16Pseudo_UPD, ARM::VLD2LNd16_UPD, true, true, true, SingleSpc, 2, 4 ,true},
 { ARM::VLD2LNd32Pseudo, ARM::VLD2LNd32, true, false, false, SingleSpc, 2, 2 ,true},
 { ARM::VLD2LNd32Pseudo_UPD, ARM::VLD2LNd32_UPD, true, true, true, SingleSpc, 2, 2 ,true},
 { ARM::VLD2LNd8Pseudo, ARM::VLD2LNd8, true, false, false, SingleSpc, 2, 8 ,true},
 { ARM::VLD2LNd8Pseudo_UPD, ARM::VLD2LNd8_UPD, true, true, true, SingleSpc, 2, 8 ,true},
 { ARM::VLD2LNq16Pseudo, ARM::VLD2LNq16, true, false, false, EvenDblSpc, 2, 4 ,true},
@@ switch (Opcode) { @@
 case ARM::VLD2q8PseudoWB_register:
 case ARM::VLD2q16PseudoWB_register:
 case ARM::VLD2q32PseudoWB_register:
 case ARM::VLD3d8Pseudo:
 case ARM::VLD3d16Pseudo:
 case ARM::VLD3d32Pseudo:
 case ARM::VLD1d64TPseudo:
 case ARM::VLD1d64TPseudoWB_fixed:
+case ARM::VLD1d64TPseudoWB_register:
 case ARM::VLD3d8Pseudo_UPD:
 case ARM::VLD3d16Pseudo_UPD:
 case ARM::VLD3d32Pseudo_UPD:
 case ARM::VLD3q8Pseudo_UPD:
 case ARM::VLD3q16Pseudo_UPD:
 case ARM::VLD3q32Pseudo_UPD:
 case ARM::VLD3q8oddPseudo:
 case ARM::VLD3q16oddPseudo:
 case ARM::VLD3q32oddPseudo:
 case ARM::VLD3q8oddPseudo_UPD:
 case ARM::VLD3q16oddPseudo_UPD:
 case ARM::VLD3q32oddPseudo_UPD:
 case ARM::VLD4d8Pseudo:
 case ARM::VLD4d16Pseudo:
 case ARM::VLD4d32Pseudo:
 case ARM::VLD1d64QPseudo:
 case ARM::VLD1d64QPseudoWB_fixed:
+case ARM::VLD1d64QPseudoWB_register:
 case ARM::VLD4d8Pseudo_UPD:
 case ARM::VLD4d16Pseudo_UPD:
 case ARM::VLD4d32Pseudo_UPD:
 case ARM::VLD4q8Pseudo_UPD:
 case ARM::VLD4q16Pseudo_UPD:
 case ARM::VLD4q32Pseudo_UPD:
 case ARM::VLD4q8oddPseudo:
 case ARM::VLD4q16oddPseudo:

llvm/trunk/lib/Target/ARM/ARMISelDAGToDAG.cpp

@@ void ARMDAGToDAGISel::SelectVLD(SDNode *N, bool isUpdating, unsigned NumVecs, @@
   // Double registers and VLD1/VLD2 quad registers are directly supported.
   if (is64BitVector || NumVecs <= 2) {
     unsigned Opc = (is64BitVector ? DOpcodes[OpcodeIndex] :
                     QOpcodes0[OpcodeIndex]);
     Ops.push_back(MemAddr);
     Ops.push_back(Align);
     if (isUpdating) {
       SDValue Inc = N->getOperand(AddrOpIdx + 1);
-      // FIXME: VLD1/VLD2 fixed increment doesn't need Reg0. Remove the reg0
-      // case entirely when the rest are updated to that form, too.
       bool IsImmUpdate = isPerfectIncrement(Inc, VT, NumVecs);
-      if ((NumVecs <= 2) && !IsImmUpdate)
-        Opc = getVLDSTRegisterUpdateOpcode(Opc);
-      // FIXME: We use a VLD1 for v1i64 even if the pseudo says vld2/3/4, so
-      // check for that explicitly too. Horribly hacky, but temporary.
-      if ((NumVecs > 2 && !isVLDfixed(Opc)) || !IsImmUpdate)
-        Ops.push_back(IsImmUpdate ? Reg0 : Inc);
+      if (!IsImmUpdate) {
+        // We use a VLD1 for v1i64 even if the pseudo says vld2/3/4, so
+        // check for the opcode rather than the number of vector elements.
+        if (isVLDfixed(Opc))
+          Opc = getVLDSTRegisterUpdateOpcode(Opc);
+        Ops.push_back(Inc);
+      // VLD1/VLD2 fixed increment does not need Reg0 so only include it in
+      // the operands if not such an opcode.
+      } else if (!isVLDfixed(Opc))
+        Ops.push_back(Reg0);
     }
     Ops.push_back(Pred);
     Ops.push_back(Reg0);
     Ops.push_back(Chain);
     VLd = CurDAG->getMachineNode(Opc, dl, ResTys, Ops);

   } else {
     // Otherwise, quad registers are loaded with two separate instructions,
@@ if (is64BitVector || NumVecs <= 2) { @@
     }

     unsigned Opc = (is64BitVector ? DOpcodes[OpcodeIndex] :
                     QOpcodes0[OpcodeIndex]);
     Ops.push_back(MemAddr);
     Ops.push_back(Align);
     if (isUpdating) {
       SDValue Inc = N->getOperand(AddrOpIdx + 1);
-      // FIXME: VST1/VST2 fixed increment doesn't need Reg0. Remove the reg0
-      // case entirely when the rest are updated to that form, too.
       bool IsImmUpdate = isPerfectIncrement(Inc, VT, NumVecs);
-      if (NumVecs <= 2 && !IsImmUpdate)
-        Opc = getVLDSTRegisterUpdateOpcode(Opc);
-      // FIXME: We use a VST1 for v1i64 even if the pseudo says vld2/3/4, so
-      // check for that explicitly too. Horribly hacky, but temporary.
-      if (!IsImmUpdate)
-        Ops.push_back(Inc);
-      else if (NumVecs > 2 && !isVSTfixed(Opc))
-        Ops.push_back(Reg0);
+      if (!IsImmUpdate) {
+        // We use a VST1 for v1i64 even if the pseudo says VST2/3/4, so
+        // check for the opcode rather than the number of vector elements.
+        if (isVSTfixed(Opc))
+          Opc = getVLDSTRegisterUpdateOpcode(Opc);
+        Ops.push_back(Inc);
+      }
+      // VST1/VST2 fixed increment does not need Reg0 so only include it in
+      // the operands if not such an opcode.
+      else if (!isVSTfixed(Opc))
+        Ops.push_back(Reg0);
     }
     Ops.push_back(SrcReg);
     Ops.push_back(Pred);
     Ops.push_back(Reg0);
     Ops.push_back(Chain);
     SDNode *VSt = CurDAG->getMachineNode(Opc, dl, ResTys, Ops);

llvm/trunk/test/CodeGen/ARM/vld3.ll

@@ ;CHECK: vld1.64 {d16, d17, d18}, [{{r[0-9]+|lr}}:64]! @@
 %tmp5 = getelementptr i64, i64* %A, i32 3
 store i64* %tmp5, i64** %ptr
 %tmp2 = extractvalue %struct.__neon_int64x1x3_t %tmp1, 0
 %tmp3 = extractvalue %struct.__neon_int64x1x3_t %tmp1, 2
 %tmp4 = add <1 x i64> %tmp2, %tmp3
 ret <1 x i64> %tmp4
 }

+define <1 x i64> @vld3i64_reg_update(i64** %ptr, i64* %A) nounwind {
+;CHECK-LABEL: vld3i64_reg_update:
+;CHECK: vld1.64 {d16, d17, d18}, [{{r[0-9]+|lr}}:64], {{r[0-9]+|lr}}
+%tmp0 = bitcast i64* %A to i8*
+%tmp1 = call %struct.__neon_int64x1x3_t @llvm.arm.neon.vld3.v1i64.p0i8(i8* %tmp0, i32 16)
+%tmp5 = getelementptr i64, i64* %A, i32 1
+store i64* %tmp5, i64** %ptr
+%tmp2 = extractvalue %struct.__neon_int64x1x3_t %tmp1, 0
+%tmp3 = extractvalue %struct.__neon_int64x1x3_t %tmp1, 2
+%tmp4 = add <1 x i64> %tmp2, %tmp3
+ret <1 x i64> %tmp4
+}
+
 define <16 x i8> @vld3Qi8(i8* %A) nounwind {
 ;CHECK-LABEL: vld3Qi8:
 ;Check the alignment value. Max for this instruction is 64 bits:
 ;CHECK: vld3.8 {d16, d18, d20}, [{{r[0-9]+|lr}}:64]!
 ;CHECK: vld3.8 {d17, d19, d21}, [{{r[0-9]+|lr}}:64]
 %tmp1 = call %struct.__neon_int8x16x3_t @llvm.arm.neon.vld3.v16i8.p0i8(i8* %A, i32 32)
 %tmp2 = extractvalue %struct.__neon_int8x16x3_t %tmp1, 0
 %tmp3 = extractvalue %struct.__neon_int8x16x3_t %tmp1, 2

llvm/trunk/test/CodeGen/ARM/vld4.ll

@@ ;CHECK: vld1.64 {d16, d17, d18, d19}, [{{r[0-9]+|lr}}:256]! @@
 %tmp5 = getelementptr i64, i64* %A, i32 4
 store i64* %tmp5, i64** %ptr
 %tmp2 = extractvalue %struct.__neon_int64x1x4_t %tmp1, 0
 %tmp3 = extractvalue %struct.__neon_int64x1x4_t %tmp1, 2
 %tmp4 = add <1 x i64> %tmp2, %tmp3
 ret <1 x i64> %tmp4
 }

+define <1 x i64> @vld4i64_reg_update(i64** %ptr, i64* %A) nounwind {
+;CHECK-LABEL: vld4i64_reg_update:
+;CHECK: vld1.64 {d16, d17, d18, d19}, [{{r[0-9]+|lr}}:256], {{r[0-9]+|lr}}
+%tmp0 = bitcast i64* %A to i8*
+%tmp1 = call %struct.__neon_int64x1x4_t @llvm.arm.neon.vld4.v1i64.p0i8(i8* %tmp0, i32 64)
+%tmp5 = getelementptr i64, i64* %A, i32 1
+store i64* %tmp5, i64** %ptr
+%tmp2 = extractvalue %struct.__neon_int64x1x4_t %tmp1, 0
+%tmp3 = extractvalue %struct.__neon_int64x1x4_t %tmp1, 2
+%tmp4 = add <1 x i64> %tmp2, %tmp3
+ret <1 x i64> %tmp4
+}
+
 define <16 x i8> @vld4Qi8(i8* %A) nounwind {
 ;CHECK-LABEL: vld4Qi8:
 ;Check the alignment value. Max for this instruction is 256 bits:
 ;CHECK: vld4.8 {d16, d18, d20, d22}, [{{r[0-9]+|lr}}:256]!
 ;CHECK: vld4.8 {d17, d19, d21, d23}, [{{r[0-9]+|lr}}:256]
 %tmp1 = call %struct.__neon_int8x16x4_t @llvm.arm.neon.vld4.v16i8.p0i8(i8* %A, i32 64)
 %tmp2 = extractvalue %struct.__neon_int8x16x4_t %tmp1, 0
 %tmp3 = extractvalue %struct.__neon_int8x16x4_t %tmp1, 2

llvm/trunk/test/CodeGen/ARM/vst3.ll

@@ ;CHECK: vst1.64 {d{{.}}, d{{.}}, d{{.}}}, [r{{.}}]! @@
 %tmp0 = bitcast i64* %A to i8*
 %tmp1 = load <1 x i64>, <1 x i64>* %B
 call void @llvm.arm.neon.vst3.p0i8.v1i64(i8* %tmp0, <1 x i64> %tmp1, <1 x i64> %tmp1, <1 x i64> %tmp1, i32 1)
 %tmp2 = getelementptr i64, i64* %A, i32 3
 store i64* %tmp2, i64** %ptr
 ret void
 }

+define void @vst3i64_reg_update(i64** %ptr, <1 x i64>* %B) nounwind {
+;CHECK-LABEL: vst3i64_reg_update
+;CHECK: vst1.64 {d{{.}}, d{{.}}, d{{.}}}, [r{{.}}], r{{.*}}
+%A = load i64*, i64** %ptr
+%tmp0 = bitcast i64* %A to i8*
+%tmp1 = load <1 x i64>, <1 x i64>* %B
+call void @llvm.arm.neon.vst3.p0i8.v1i64(i8* %tmp0, <1 x i64> %tmp1, <1 x i64> %tmp1, <1 x i64> %tmp1, i32 1)
+%tmp2 = getelementptr i64, i64* %A, i32 1
+store i64* %tmp2, i64** %ptr
+ret void
+}
+
 define void @vst3Qi8(i8* %A, <16 x i8>* %B) nounwind {
 ;CHECK-LABEL: vst3Qi8:
 ;Check the alignment value. Max for this instruction is 64 bits:
 ;This test runs at -O0 so do not check for specific register numbers.
 ;CHECK: vst3.8 {d{{.}}, d{{.}}, d{{.}}}, [r{{.}}:64]!
 ;CHECK: vst3.8 {d{{.}}, d{{.}}, d{{.}}}, [r{{.}}:64]
 %tmp1 = load <16 x i8>, <16 x i8>* %B
 call void @llvm.arm.neon.vst3.p0i8.v16i8(i8* %A, <16 x i8> %tmp1, <16 x i8> %tmp1, <16 x i8> %tmp1, i32 32)

llvm/trunk/test/CodeGen/ARM/vst4.ll

@@ ;CHECK: vst1.64 {d16, d17, d18, d19}, [r{{[0-9]+}}]! @@
 %tmp0 = bitcast i64* %A to i8*
 %tmp1 = load <1 x i64>, <1 x i64>* %B
 call void @llvm.arm.neon.vst4.p0i8.v1i64(i8* %tmp0, <1 x i64> %tmp1, <1 x i64> %tmp1, <1 x i64> %tmp1, <1 x i64> %tmp1, i32 1)
 %tmp2 = getelementptr i64, i64* %A, i32 4
 store i64* %tmp2, i64** %ptr
 ret void
 }

+define void @vst4i64_reg_update(i64** %ptr, <1 x i64>* %B) nounwind {
+;CHECK-LABEL: vst4i64_reg_update:
+;CHECK: vst1.64 {d16, d17, d18, d19}, [r{{[0-9]+}}], r{{[0-9]+}}
+%A = load i64*, i64** %ptr
+%tmp0 = bitcast i64* %A to i8*
+%tmp1 = load <1 x i64>, <1 x i64>* %B
+call void @llvm.arm.neon.vst4.p0i8.v1i64(i8* %tmp0, <1 x i64> %tmp1, <1 x i64> %tmp1, <1 x i64> %tmp1, <1 x i64> %tmp1, i32 1)
+%tmp2 = getelementptr i64, i64* %A, i32 1
+store i64* %tmp2, i64** %ptr
+ret void
+}
+
 define void @vst4Qi8(i8* %A, <16 x i8>* %B) nounwind {
 ;CHECK-LABEL: vst4Qi8:
 ;Check the alignment value. Max for this instruction is 256 bits:
 ;CHECK: vst4.8 {d16, d18, d20, d22}, [r0:256]!
 ;CHECK: vst4.8 {d17, d19, d21, d23}, [r0:256]
 %tmp1 = load <16 x i8>, <16 x i8>* %B
 call void @llvm.arm.neon.vst4.p0i8.v16i8(i8* %A, <16 x i8> %tmp1, <16 x i8> %tmp1, <16 x i8> %tmp1, <16 x i8> %tmp1, i32 64)
 ret void