This is an archive of the discontinued LLVM Phabricator instance.

[NEON] Support vldNq intrinsics in AArch32 (LLVM part)
ClosedPublic

Authored by kosarev on Jun 21 2018, 8:58 AM.


Details

Reviewers

SjoerdMeijer
jgreenhalgh
rengolin
javed.absar
dnsampaio

Commits

rG7231598fce4f: [NEON] Support vldNq intrinsics in AArch32 (LLVM part)
rL335733: [NEON] Support vldNq intrinsics in AArch32 (LLVM part)

Summary

This patch adds support for the q versions of the dup (load-to-all-lanes) NEON intrinsics, such as vld2q_dup_f16().

Currently, the non-q versions of the dup intrinsics are implemented in clang by generating IR that first loads the elements of the structure into the first lane with the lane (to-single-lane) intrinsics, and then propagates them to the other lanes. There are at least two problems with this approach. First, there are no double-spaced to-single-lane byte-element instructions. For example, there is no such instruction as 'vld2.8 { d0[0], d2[0] }, [r0]'. That means we cannot rely on the to-single-lane intrinsics and instructions to implement the q versions of the dup intrinsics. Note that the to-all-lanes instructions do support all sizes of data items, including bytes.

The second problem with the current approach is that we need a separate vdup instruction for each vector to propagate its element to all lanes. So for vld4q_dup_f16() we would need four vdup instructions in addition to the initial vld instruction.
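To make the required semantics concrete, here is a minimal scalar reference model of what vld4q_dup_f16() has to produce. It is a sketch only: half-precision values are represented by their raw 16-bit encodings, and the name `vld4q_dup_ref` is hypothetical, not part of arm_neon.h. Each of the four consecutive elements at `p` is broadcast to all eight lanes of its own 128-bit result vector, which is what the lane-load-plus-vdup sequence emulates with one vdup per vector.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Raw IEEE half-precision encodings stand in for float16_t in this sketch.
using F16Bits = std::uint16_t;

// One 128-bit Q register viewed as eight 16-bit lanes.
using QReg16 = std::array<F16Bits, 8>;

// Hypothetical reference model of vld4q_dup_f16: load one 4-element structure
// from p and replicate element v into every lane of result vector v.
std::array<QReg16, 4> vld4q_dup_ref(const F16Bits *p) {
  std::array<QReg16, 4> out{};
  for (std::size_t v = 0; v < out.size(); ++v)
    out[v].fill(p[v]);
  return out;
}
```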

This patch introduces dup LLVM intrinsics and reworks handling of the currently supported (non-q) NEON dup intrinsics to expand them into those LLVM intrinsics, thus eliminating the need for using to-single-lane intrinsics and instructions.

Additionally, this patch adds support for the u64 and s64 dup NEON intrinsics. These are marked as AArch64-only in the ARM NEON Reference, but there seems to be no reason not to support them in AArch32 mode. Please correct me if that is wrong.

This is what we generate with this patch applied:

vld2q_dup_f16:
  vld2.16 {d0[], d2[]}, [r0]
  vld2.16 {d1[], d3[]}, [r0]

vld3q_dup_f16:
  vld3.16 {d0[], d2[], d4[]}, [r0]
  vld3.16 {d1[], d3[], d5[]}, [r0]

vld4q_dup_f16:
  vld4.16 {d0[], d2[], d4[], d6[]}, [r0]
  vld4.16 {d1[], d3[], d5[], d7[]}, [r0]
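The two-instruction pattern above relies on register aliasing: on AArch32, Qn overlaps D(2n) (its low half) and D(2n+1) (its high half). The toy model below, with hypothetical names `RegFile` and `vldNdup16`, executes the two vld2.16 to-all-lanes instructions from the vld2q_dup_f16 listing so you can check that q0 and q1 end up fully populated: the first instruction fills the even (low) D halves, the second the odd (high) ones.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Toy model of the AArch32 NEON register file: 32 D registers, each viewed as
// four 16-bit lanes. Qn aliases D(2n) (low half) and D(2n+1) (high half).
struct RegFile {
  std::array<std::array<std::uint16_t, 4>, 32> d{};

  // Read Qn as eight lanes: lanes 0-3 from D(2n), lanes 4-7 from D(2n+1).
  std::array<std::uint16_t, 8> q(unsigned n) const {
    std::array<std::uint16_t, 8> r{};
    for (unsigned i = 0; i < 4; ++i) {
      r[i] = d[2 * n][i];
      r[i + 4] = d[2 * n + 1][i];
    }
    return r;
  }
};

// Models "vldN.16 {dA[], dB[], ...}, [p]": element k of the structure at p is
// replicated into every lane of the k-th D register in the list.
void vldNdup16(RegFile &rf, std::initializer_list<unsigned> dregs,
               const std::uint16_t *p) {
  unsigned k = 0;
  for (unsigned r : dregs)
    rf.d[r].fill(p[k++]);
}
```

For example, with memory holding {0x1111, 0x2222}, calling `vldNdup16(rf, {0, 2}, mem)` and then `vldNdup16(rf, {1, 3}, mem)` leaves every lane of `rf.q(0)` equal to 0x1111 and every lane of `rf.q(1)` equal to 0x2222. After only the first call, the high halves of q0 and q1 are still untouched.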

Diff Detail

Repository: rL LLVM

Event Timeline

kosarev created this revision. Jun 21 2018, 8:58 AM

Herald added a reviewer: javed.absar. Jun 21 2018, 8:58 AM

Herald added a subscriber: kristof.beyls.

kosarev mentioned this in D48440: [NEON] Support vldNq intrinsics in AArch32 (Clang part). Jun 21 2018, 9:04 AM

kosarev added a child revision: D48440: [NEON] Support vldNq intrinsics in AArch32 (Clang part).

Ping.

SjoerdMeijer added inline comments. Jun 27 2018, 1:11 AM

test/CodeGen/ARM/arm-vlddup.ll
Line 64 (on Diff #152313): Looks like "llvm.arm.neon.vld4dup.v2i64.p0i8" is not tested?

kosarev added inline comments. Jun 27 2018, 1:34 AM

test/CodeGen/ARM/arm-vlddup.ll
Line 64 (on Diff #152313): The initial plan was to support q versions of the 64-bit intrinsics, but then it became clear that would require some special code. Will remove these declarations on commit.

Looks OK to me

This revision is now accepted and ready to land. Jun 27 2018, 1:40 AM

Closed by commit rL335733: [NEON] Support vldNq intrinsics in AArch32 (LLVM part) (authored by kosarev). Jun 27 2018, 7:02 AM

This revision was automatically updated to reflect the committed changes.

kosarev mentioned this in rL335734: [NEON] Support vldNq intrinsics in AArch32 (Clang part).

kosarev mentioned this in rC335734: [NEON] Support vldNq intrinsics in AArch32 (Clang part).

dnsampaio added a subscriber: dnsampaio. Jul 3 2018, 4:32 AM

dnsampaio reopened this revision. Jul 3 2018, 4:44 AM

This comment was removed by dnsampaio.

This revision is now accepted and ready to land. Jul 3 2018, 4:44 AM

Please add the new intrinsics to the target-specific combine function for VLDDUP and NEON load/store intrinsics in ARMISelLowering.cpp, line 11477. As it stands, the switch there hits llvm_unreachable for them.

https://bugs.llvm.org/show_bug.cgi?id=38031

This revision now requires changes to proceed. Jul 3 2018, 4:48 AM

OK, I'm on it. Thanks.

D48920 resolves the issue.

All good.

This revision is now accepted and ready to land. Jul 4 2018, 8:33 AM

kosarev mentioned this in rL336325: [NEON] Fix combining of vldx_dup intrinsics with updating of base addresses. Jul 5 2018, 2:04 AM

D48920 is landed.

Revision Contents

llvm/trunk/include/llvm/IR/IntrinsicsARM.td (14 lines)
llvm/trunk/lib/Target/ARM/ARMBaseInstrInfo.cpp (18 lines)
llvm/trunk/lib/Target/ARM/ARMExpandPseudoInsts.cpp (83 lines)
llvm/trunk/lib/Target/ARM/ARMISelDAGToDAG.cpp (200 lines)
llvm/trunk/lib/Target/ARM/ARMISelLowering.cpp (8 lines)
llvm/trunk/lib/Target/ARM/ARMInstrNEON.td (23 lines)
llvm/trunk/test/CodeGen/ARM/arm-vlddup.ll (234 lines)

Diff 153075

llvm/trunk/include/llvm/IR/IntrinsicsARM.td

@@ (646 unchanged lines not shown) @@ def int_arm_neon_vld3lane : Intrinsic<[llvm_anyvector_ty, LLVMMatchType<0>,
 [IntrReadMem, IntrArgMemOnly]>;
 def int_arm_neon_vld4lane : Intrinsic<[llvm_anyvector_ty, LLVMMatchType<0>,
 LLVMMatchType<0>, LLVMMatchType<0>],
 [llvm_anyptr_ty, LLVMMatchType<0>,
 LLVMMatchType<0>, LLVMMatchType<0>,
 LLVMMatchType<0>, llvm_i32_ty,
 llvm_i32_ty], [IntrReadMem, IntrArgMemOnly]>;

+// Vector load N-element structure to all lanes.
+// Source operands are the address and alignment.
+def int_arm_neon_vld2dup : Intrinsic<[llvm_anyvector_ty, LLVMMatchType<0>],
+[llvm_anyptr_ty, llvm_i32_ty],
+[IntrReadMem, IntrArgMemOnly]>;
+def int_arm_neon_vld3dup : Intrinsic<[llvm_anyvector_ty, LLVMMatchType<0>,
+LLVMMatchType<0>],
+[llvm_anyptr_ty, llvm_i32_ty],
+[IntrReadMem, IntrArgMemOnly]>;
+def int_arm_neon_vld4dup : Intrinsic<[llvm_anyvector_ty, LLVMMatchType<0>,
+LLVMMatchType<0>, LLVMMatchType<0>],
+[llvm_anyptr_ty, llvm_i32_ty],
+[IntrReadMem, IntrArgMemOnly]>;
+
 // Interleaving vector stores from N-element structures.
 // Source operands are: the address, the N vectors, and the alignment.
 def int_arm_neon_vst1 : Intrinsic<[],
 [llvm_anyptr_ty, llvm_anyvector_ty,
 llvm_i32_ty], [IntrArgMemOnly]>;
 def int_arm_neon_vst2 : Intrinsic<[],
 [llvm_anyptr_ty, llvm_anyvector_ty,
 LLVMMatchType<1>, llvm_i32_ty],
@@ (94 unchanged lines not shown) @@

llvm/trunk/lib/Target/ARM/ARMBaseInstrInfo.cpp

@@ (4,304 unchanged lines not shown) @@ if (DefAlign < 8 && Subtarget.checkVLDnAccessAlignment())
 case ARM::VLD2DUPd16:
 case ARM::VLD2DUPd32:
 case ARM::VLD2DUPd8wb_fixed:
 case ARM::VLD2DUPd16wb_fixed:
 case ARM::VLD2DUPd32wb_fixed:
 case ARM::VLD2DUPd8wb_register:
 case ARM::VLD2DUPd16wb_register:
 case ARM::VLD2DUPd32wb_register:
+case ARM::VLD2DUPq8EvenPseudo:
+case ARM::VLD2DUPq8OddPseudo:
+case ARM::VLD2DUPq16EvenPseudo:
+case ARM::VLD2DUPq16OddPseudo:
+case ARM::VLD2DUPq32EvenPseudo:
+case ARM::VLD2DUPq32OddPseudo:
+case ARM::VLD3DUPq8EvenPseudo:
+case ARM::VLD3DUPq8OddPseudo:
+case ARM::VLD3DUPq16EvenPseudo:
+case ARM::VLD3DUPq16OddPseudo:
+case ARM::VLD3DUPq32EvenPseudo:
+case ARM::VLD3DUPq32OddPseudo:
 case ARM::VLD4DUPd8Pseudo:
 case ARM::VLD4DUPd16Pseudo:
 case ARM::VLD4DUPd32Pseudo:
 case ARM::VLD4DUPd8Pseudo_UPD:
 case ARM::VLD4DUPd16Pseudo_UPD:
 case ARM::VLD4DUPd32Pseudo_UPD:
+case ARM::VLD4DUPq8EvenPseudo:
+case ARM::VLD4DUPq8OddPseudo:
+case ARM::VLD4DUPq16EvenPseudo:
+case ARM::VLD4DUPq16OddPseudo:
+case ARM::VLD4DUPq32EvenPseudo:
+case ARM::VLD4DUPq32OddPseudo:
 case ARM::VLD1LNq8Pseudo:
 case ARM::VLD1LNq16Pseudo:
 case ARM::VLD1LNq32Pseudo:
 case ARM::VLD1LNq8Pseudo_UPD:
 case ARM::VLD1LNq16Pseudo_UPD:
 case ARM::VLD1LNq32Pseudo_UPD:
 case ARM::VLD2LNd8Pseudo:
 case ARM::VLD2LNd16Pseudo:
@@ (719 unchanged lines not shown) @@

llvm/trunk/lib/Target/ARM/ARMExpandPseudoInsts.cpp

@@ (180 unchanged lines not shown) @@
 { ARM::VLD1q64HighTPseudo, ARM::VLD1d64T, true, false, false, SingleHighTSpc, 3, 1 ,false},
 { ARM::VLD1q64LowQPseudo_UPD, ARM::VLD1d64Qwb_fixed, true, true, true, SingleLowSpc, 4, 1 ,false},
 { ARM::VLD1q64LowTPseudo_UPD, ARM::VLD1d64Twb_fixed, true, true, true, SingleLowSpc, 3, 1 ,false},
 { ARM::VLD1q8HighQPseudo, ARM::VLD1d8Q, true, false, false, SingleHighQSpc, 4, 8 ,false},
 { ARM::VLD1q8HighTPseudo, ARM::VLD1d8T, true, false, false, SingleHighTSpc, 3, 8 ,false},
 { ARM::VLD1q8LowQPseudo_UPD, ARM::VLD1d8Qwb_fixed, true, true, true, SingleLowSpc, 4, 8 ,false},
 { ARM::VLD1q8LowTPseudo_UPD, ARM::VLD1d8Twb_fixed, true, true, true, SingleLowSpc, 3, 8 ,false},

+{ ARM::VLD2DUPq16EvenPseudo, ARM::VLD2DUPd16x2, true, false, false, EvenDblSpc, 2, 4 ,false},
+{ ARM::VLD2DUPq16OddPseudo, ARM::VLD2DUPd16x2, true, false, false, OddDblSpc, 2, 4 ,false},
+{ ARM::VLD2DUPq32EvenPseudo, ARM::VLD2DUPd32x2, true, false, false, EvenDblSpc, 2, 2 ,false},
+{ ARM::VLD2DUPq32OddPseudo, ARM::VLD2DUPd32x2, true, false, false, OddDblSpc, 2, 2 ,false},
+{ ARM::VLD2DUPq8EvenPseudo, ARM::VLD2DUPd8x2, true, false, false, EvenDblSpc, 2, 8 ,false},
+{ ARM::VLD2DUPq8OddPseudo, ARM::VLD2DUPd8x2, true, false, false, OddDblSpc, 2, 8 ,false},
+
 { ARM::VLD2LNd16Pseudo, ARM::VLD2LNd16, true, false, false, SingleSpc, 2, 4 ,true},
 { ARM::VLD2LNd16Pseudo_UPD, ARM::VLD2LNd16_UPD, true, true, true, SingleSpc, 2, 4 ,true},
 { ARM::VLD2LNd32Pseudo, ARM::VLD2LNd32, true, false, false, SingleSpc, 2, 2 ,true},
 { ARM::VLD2LNd32Pseudo_UPD, ARM::VLD2LNd32_UPD, true, true, true, SingleSpc, 2, 2 ,true},
 { ARM::VLD2LNd8Pseudo, ARM::VLD2LNd8, true, false, false, SingleSpc, 2, 8 ,true},
 { ARM::VLD2LNd8Pseudo_UPD, ARM::VLD2LNd8_UPD, true, true, true, SingleSpc, 2, 8 ,true},
 { ARM::VLD2LNq16Pseudo, ARM::VLD2LNq16, true, false, false, EvenDblSpc, 2, 4 ,true},
 { ARM::VLD2LNq16Pseudo_UPD, ARM::VLD2LNq16_UPD, true, true, true, EvenDblSpc, 2, 4 ,true},
@@ (11 unchanged lines not shown) @@
 { ARM::VLD2q8PseudoWB_register, ARM::VLD2q8wb_register, true, true, true, SingleSpc, 4, 8 ,false},

 { ARM::VLD3DUPd16Pseudo, ARM::VLD3DUPd16, true, false, false, SingleSpc, 3, 4,true},
 { ARM::VLD3DUPd16Pseudo_UPD, ARM::VLD3DUPd16_UPD, true, true, true, SingleSpc, 3, 4,true},
 { ARM::VLD3DUPd32Pseudo, ARM::VLD3DUPd32, true, false, false, SingleSpc, 3, 2,true},
 { ARM::VLD3DUPd32Pseudo_UPD, ARM::VLD3DUPd32_UPD, true, true, true, SingleSpc, 3, 2,true},
 { ARM::VLD3DUPd8Pseudo, ARM::VLD3DUPd8, true, false, false, SingleSpc, 3, 8,true},
 { ARM::VLD3DUPd8Pseudo_UPD, ARM::VLD3DUPd8_UPD, true, true, true, SingleSpc, 3, 8,true},
+{ ARM::VLD3DUPq16EvenPseudo, ARM::VLD3DUPq16, true, false, false, EvenDblSpc, 3, 4 ,true},
+{ ARM::VLD3DUPq16OddPseudo, ARM::VLD3DUPq16, true, false, false, OddDblSpc, 3, 4 ,true},
+{ ARM::VLD3DUPq32EvenPseudo, ARM::VLD3DUPq32, true, false, false, EvenDblSpc, 3, 2 ,true},
+{ ARM::VLD3DUPq32OddPseudo, ARM::VLD3DUPq32, true, false, false, OddDblSpc, 3, 2 ,true},
+{ ARM::VLD3DUPq8EvenPseudo, ARM::VLD3DUPq8, true, false, false, EvenDblSpc, 3, 8 ,true},
+{ ARM::VLD3DUPq8OddPseudo, ARM::VLD3DUPq8, true, false, false, OddDblSpc, 3, 8 ,true},

 { ARM::VLD3LNd16Pseudo, ARM::VLD3LNd16, true, false, false, SingleSpc, 3, 4 ,true},
 { ARM::VLD3LNd16Pseudo_UPD, ARM::VLD3LNd16_UPD, true, true, true, SingleSpc, 3, 4 ,true},
 { ARM::VLD3LNd32Pseudo, ARM::VLD3LNd32, true, false, false, SingleSpc, 3, 2 ,true},
 { ARM::VLD3LNd32Pseudo_UPD, ARM::VLD3LNd32_UPD, true, true, true, SingleSpc, 3, 2 ,true},
 { ARM::VLD3LNd8Pseudo, ARM::VLD3LNd8, true, false, false, SingleSpc, 3, 8 ,true},
 { ARM::VLD3LNd8Pseudo_UPD, ARM::VLD3LNd8_UPD, true, true, true, SingleSpc, 3, 8 ,true},
 { ARM::VLD3LNq16Pseudo, ARM::VLD3LNq16, true, false, false, EvenDblSpc, 3, 4 ,true},
@@ (19 unchanged lines not shown) @@
 { ARM::VLD3q8oddPseudo_UPD, ARM::VLD3q8_UPD, true, true, true, OddDblSpc, 3, 8 ,true},

 { ARM::VLD4DUPd16Pseudo, ARM::VLD4DUPd16, true, false, false, SingleSpc, 4, 4,true},
 { ARM::VLD4DUPd16Pseudo_UPD, ARM::VLD4DUPd16_UPD, true, true, true, SingleSpc, 4, 4,true},
 { ARM::VLD4DUPd32Pseudo, ARM::VLD4DUPd32, true, false, false, SingleSpc, 4, 2,true},
 { ARM::VLD4DUPd32Pseudo_UPD, ARM::VLD4DUPd32_UPD, true, true, true, SingleSpc, 4, 2,true},
 { ARM::VLD4DUPd8Pseudo, ARM::VLD4DUPd8, true, false, false, SingleSpc, 4, 8,true},
 { ARM::VLD4DUPd8Pseudo_UPD, ARM::VLD4DUPd8_UPD, true, true, true, SingleSpc, 4, 8,true},
+{ ARM::VLD4DUPq16EvenPseudo, ARM::VLD4DUPq16, true, false, false, EvenDblSpc, 4, 4 ,true},
+{ ARM::VLD4DUPq16OddPseudo, ARM::VLD4DUPq16, true, false, false, OddDblSpc, 4, 4 ,true},
+{ ARM::VLD4DUPq32EvenPseudo, ARM::VLD4DUPq32, true, false, false, EvenDblSpc, 4, 2 ,true},
+{ ARM::VLD4DUPq32OddPseudo, ARM::VLD4DUPq32, true, false, false, OddDblSpc, 4, 2 ,true},
+{ ARM::VLD4DUPq8EvenPseudo, ARM::VLD4DUPq8, true, false, false, EvenDblSpc, 4, 8 ,true},
+{ ARM::VLD4DUPq8OddPseudo, ARM::VLD4DUPq8, true, false, false, OddDblSpc, 4, 8 ,true},

 { ARM::VLD4LNd16Pseudo, ARM::VLD4LNd16, true, false, false, SingleSpc, 4, 4 ,true},
 { ARM::VLD4LNd16Pseudo_UPD, ARM::VLD4LNd16_UPD, true, true, true, SingleSpc, 4, 4 ,true},
 { ARM::VLD4LNd32Pseudo, ARM::VLD4LNd32, true, false, false, SingleSpc, 4, 2 ,true},
 { ARM::VLD4LNd32Pseudo_UPD, ARM::VLD4LNd32_UPD, true, true, true, SingleSpc, 4, 2 ,true},
 { ARM::VLD4LNd8Pseudo, ARM::VLD4LNd8, true, false, false, SingleSpc, 4, 8 ,true},
 { ARM::VLD4LNd8Pseudo_UPD, ARM::VLD4LNd8_UPD, true, true, true, SingleSpc, 4, 8 ,true},
 { ARM::VLD4LNq16Pseudo, ARM::VLD4LNq16, true, false, false, EvenDblSpc, 4, 4 ,true},
@@ (199 unchanged lines not shown) @@ void ARMExpandPseudo::ExpandVLD(MachineBasicBlock::iterator &MBBI) {
 unsigned NumRegs = TableEntry->NumRegs;

 MachineInstrBuilder MIB = BuildMI(MBB, MBBI, MI.getDebugLoc(),
 TII->get(TableEntry->RealOpc));
 unsigned OpIdx = 0;

 bool DstIsDead = MI.getOperand(OpIdx).isDead();
 unsigned DstReg = MI.getOperand(OpIdx++).getReg();
+if(TableEntry->RealOpc == ARM::VLD2DUPd8x2 ||
+TableEntry->RealOpc == ARM::VLD2DUPd16x2 ||
+TableEntry->RealOpc == ARM::VLD2DUPd32x2) {
+unsigned SubRegIndex;
+if (RegSpc == EvenDblSpc) {
+SubRegIndex = ARM::dsub_0;
+} else {
+assert(RegSpc == OddDblSpc && "Unexpected spacing!");
+SubRegIndex = ARM::dsub_1;
+}
+unsigned SubReg = TRI->getSubReg(DstReg, SubRegIndex);
+unsigned DstRegPair = TRI->getMatchingSuperReg(SubReg, ARM::dsub_0,
+&ARM::DPairSpcRegClass);
+MIB.addReg(DstRegPair, RegState::Define | getDeadRegState(DstIsDead));
+} else {
 unsigned D0, D1, D2, D3;
 GetDSubRegs(DstReg, RegSpc, TRI, D0, D1, D2, D3);
 MIB.addReg(D0, RegState::Define | getDeadRegState(DstIsDead));
 if (NumRegs > 1 && TableEntry->copyAllListRegs)
 MIB.addReg(D1, RegState::Define | getDeadRegState(DstIsDead));
 if (NumRegs > 2 && TableEntry->copyAllListRegs)
 MIB.addReg(D2, RegState::Define | getDeadRegState(DstIsDead));
 if (NumRegs > 3 && TableEntry->copyAllListRegs)
 MIB.addReg(D3, RegState::Define | getDeadRegState(DstIsDead));
+}

 if (TableEntry->isUpdating)
 MIB.add(MI.getOperand(OpIdx++));

 // Copy the addrmode6 operands.
 MIB.add(MI.getOperand(OpIdx++));
 MIB.add(MI.getOperand(OpIdx++));

@@ (22 unchanged lines not shown) @@ if (TableEntry->RealOpc == ARM::VLD1d8Qwb_fixed ||
 MIB.add(AM6Offset);
 }
 }

 // For an instruction writing double-spaced subregs, the pseudo instruction
 // has an extra operand that is a use of the super-register. Record the
 // operand index and skip over it.
 unsigned SrcOpIdx = 0;
+if(TableEntry->RealOpc != ARM::VLD2DUPd8x2 &&
+TableEntry->RealOpc != ARM::VLD2DUPd16x2 &&
+TableEntry->RealOpc != ARM::VLD2DUPd32x2) {
 if (RegSpc == EvenDblSpc || RegSpc == OddDblSpc ||
 RegSpc == SingleLowSpc || RegSpc == SingleHighQSpc ||
 RegSpc == SingleHighTSpc)
 SrcOpIdx = OpIdx++;
+}

 // Copy the predicate operands.
 MIB.add(MI.getOperand(OpIdx++));
 MIB.add(MI.getOperand(OpIdx++));

 // Copy the super-register source operand used for double-spaced subregs over
 // to the new instruction as an implicit operand.
 if (SrcOpIdx != 0) {
@@ (1,144 unchanged lines not shown) @@ switch (Opcode) {
 case ARM::VLD3DUPd16Pseudo_UPD:
 case ARM::VLD3DUPd32Pseudo_UPD:
 case ARM::VLD4DUPd8Pseudo:
 case ARM::VLD4DUPd16Pseudo:
 case ARM::VLD4DUPd32Pseudo:
 case ARM::VLD4DUPd8Pseudo_UPD:
 case ARM::VLD4DUPd16Pseudo_UPD:
 case ARM::VLD4DUPd32Pseudo_UPD:
+case ARM::VLD2DUPq8EvenPseudo:
+case ARM::VLD2DUPq8OddPseudo:
+case ARM::VLD2DUPq16EvenPseudo:
+case ARM::VLD2DUPq16OddPseudo:
+case ARM::VLD2DUPq32EvenPseudo:
+case ARM::VLD2DUPq32OddPseudo:
+case ARM::VLD3DUPq8EvenPseudo:
+case ARM::VLD3DUPq8OddPseudo:
+case ARM::VLD3DUPq16EvenPseudo:
+case ARM::VLD3DUPq16OddPseudo:
+case ARM::VLD3DUPq32EvenPseudo:
+case ARM::VLD3DUPq32OddPseudo:
+case ARM::VLD4DUPq8EvenPseudo:
+case ARM::VLD4DUPq8OddPseudo:
+case ARM::VLD4DUPq16EvenPseudo:
+case ARM::VLD4DUPq16OddPseudo:
+case ARM::VLD4DUPq32EvenPseudo:
+case ARM::VLD4DUPq32OddPseudo:
 ExpandVLD(MBBI);
 return true;

 case ARM::VST2q8Pseudo:
 case ARM::VST2q16Pseudo:
 case ARM::VST2q32Pseudo:
 case ARM::VST2q8PseudoWB_fixed:
 case ARM::VST2q16PseudoWB_fixed:
@@ (203 unchanged lines not shown) @@
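In ExpandVLD, the VLD2DUPd*x2 case takes dsub_0 of the destination super-register for the even pseudo and dsub_1 for the odd one, then asks for the DPairSpc register that starts at that D register. The integer model below is a hypothetical sketch of that register arithmetic, assuming (as the register lists in the summary's vld2q_dup_f16 output suggest) that a double-spaced pair beginning at Dn is {Dn, Dn+2}. The numbers after the spacing field in the new table rows (for example `2, 4` in the VLD2DUPq16 rows) appear to be the register count and the elements per D register; the sketch also checks the latter against 64 divided by the element size.

```cpp
#include <cassert>
#include <utility>

// Toy model: D registers are identified by index (d0 = 0, d1 = 1, ...).
// A destination super-register covering two Q registers is identified by its
// first D register, `base`, so it spans D(base) .. D(base + 3).
enum class Spacing { EvenDbl, OddDbl };

// dsub_0 selects D(base), dsub_1 selects D(base + 1).
unsigned pickSubReg(unsigned base, Spacing s) {
  return s == Spacing::EvenDbl ? base : base + 1;
}

// Assumed layout of a double-spaced D pair whose first member is `first`.
std::pair<unsigned, unsigned> spacedPairFrom(unsigned first) {
  return {first, first + 2};
}

// D registers written by one VLD2DUPq*{Even,Odd}Pseudo after expansion.
std::pair<unsigned, unsigned> expandVld2DupQ(unsigned base, Spacing s) {
  return spacedPairFrom(pickSubReg(base, s));
}

// Elements per 64-bit D register for a given element size in bits.
constexpr unsigned regElts(unsigned elemBits) { return 64 / elemBits; }

static_assert(regElts(8) == 8 && regElts(16) == 4 && regElts(32) == 2,
              "matches the 8 / 4 / 2 values in the VLD2DUPq table rows");
```

For a destination starting at d0, the even and odd expansions yield {d0, d2} and {d1, d3}, the same register lists as the two vld2.16 instructions generated for vld2q_dup_f16.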

llvm/trunk/lib/Target/ARM/ARMISelDAGToDAG.cpp

@@ (197 unchanged lines not shown) @@ private:
 /// be 2, 3 or 4. The opcode arrays specify the instructions used for
 /// load/store of D registers and Q registers.
 void SelectVLDSTLane(SDNode *N, bool IsLoad, bool isUpdating,
 unsigned NumVecs, const uint16_t *DOpcodes,
 const uint16_t *QOpcodes);

 /// SelectVLDDup - Select NEON load-duplicate intrinsics. NumVecs
 /// should be 1, 2, 3 or 4. The opcode array specifies the instructions used
-/// for loading D registers. (Q registers are not supported.)
-void SelectVLDDup(SDNode *N, bool isUpdating, unsigned NumVecs,
-const uint16_t *DOpcodes,
-const uint16_t *QOpcodes = nullptr);
+/// for loading D registers.
+void SelectVLDDup(SDNode *N, bool IsIntrinsic, bool isUpdating,
+unsigned NumVecs, const uint16_t *DOpcodes,
+const uint16_t *QOpcodes0 = nullptr,
+const uint16_t *QOpcodes1 = nullptr);

 /// Try to select SBFX/UBFX instructions for ARM.
 bool tryV6T2BitfieldExtractOp(SDNode *N, bool isSigned);

 // Select special operations if node forms integer ABS pattern
 bool tryABSOp(SDNode *N);

 bool tryReadRegister(SDNode *N);
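The new declaration gives SelectVLDDup an explicit IsIntrinsic flag and two Q-opcode tables, mirroring SelectVLD. Its body, later in this diff, maps each vector type to a shared OpcodeIndex (index 3 covers the v1i64/v1f64 types behind the new u64/s64 support), sizes the super-register result as a vector of i64, and clamps the alignment operand. The sketch below re-expresses those three computations as free functions so they can be checked in isolation; the `VT` enum and all function names are hypothetical stand-ins, not LLVM API.

```cpp
#include <cassert>

// Hypothetical stand-ins for the simple value types SelectVLDDup switches on.
enum class VT { v8i8, v16i8, v4i16, v8i16, v2f32, v2i32, v4f32, v4i32, v1f64, v1i64 };

// D-form and Q-form types with the same element size share one table index;
// the selector then uses it to index, e.g., DOpcodes or QOpcodes0.
unsigned opcodeIndex(VT vt) {
  switch (vt) {
  case VT::v8i8:
  case VT::v16i8: return 0;
  case VT::v4i16:
  case VT::v8i16: return 1;
  case VT::v2f32:
  case VT::v2i32:
  case VT::v4f32:
  case VT::v4i32: return 2;
  case VT::v1f64:
  case VT::v1i64: return 3;
  }
  return ~0u; // not reached for the enumerators above
}

// Number of i64 elements in the super-register result type: a 3-vector load
// is rounded up to 4 D registers, and a Q-form result is twice as wide.
unsigned resTyElts(unsigned numVecs, bool is64BitVector) {
  unsigned elts = (numVecs == 3) ? 4 : numVecs;
  return is64BitVector ? elts : elts * 2;
}

// Same clamping steps as the SelectVLDDup code; 3-vector loads get 0.
unsigned clampDupAlignment(unsigned alignment, unsigned numVecs,
                           unsigned scalarBits) {
  if (numVecs == 3)
    return 0;
  unsigned numBytes = numVecs * scalarBits / 8;
  if (alignment > numBytes)
    alignment = numBytes;
  if (alignment < 8 && alignment < numBytes)
    alignment = 0;
  alignment &= ~alignment + 1u; // keep only the lowest set bit (a power of two)
  if (alignment == 1)
    alignment = 0;
  return alignment;
}
```

For instance, a vld2 dup of 16-bit elements with a requested alignment of 16 is clamped to the 4-byte structure size, while a request of 12 for a 16-byte structure is reduced to its lowest set bit, 4.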
@@ (1,524 unchanged lines not shown) @@
 void ARMDAGToDAGISel::SelectVLD(SDNode *N, bool isUpdating, unsigned NumVecs,
 const uint16_t *DOpcodes,
 const uint16_t *QOpcodes0,
 const uint16_t *QOpcodes1) {
 assert(NumVecs >= 1 && NumVecs <= 4 && "VLD NumVecs out-of-range");
 SDLoc dl(N);

 SDValue MemAddr, Align;
-unsigned AddrOpIdx = isUpdating ? 1 : 2;
+bool IsIntrinsic = !isUpdating; // By coincidence, all supported updating
+// nodes are not intrinsics.
+unsigned AddrOpIdx = IsIntrinsic ? 2 : 1;
 if (!SelectAddrMode6(N, N->getOperand(AddrOpIdx), MemAddr, Align))
 return;

 SDValue Chain = N->getOperand(0);
 EVT VT = N->getValueType(0);
 bool is64BitVector = VT.is64BitVector();
 Align = GetVLDSTAlign(Align, dl, NumVecs, is64BitVector);
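SelectVLD, SelectVST and SelectVLDSTLane can derive IsIntrinsic from !isUpdating because, as the new comment says, the updating nodes they handle are never intrinsics and the non-updating ones are. SelectVLDDup cannot take that shortcut: its old code read the address from operand 1 even for non-updating nodes, so it also receives target-specific dup nodes that are neither intrinsics nor updating, which is why it gets a separate IsIntrinsic parameter. The sketch below spells out the operand-index rule, assuming the usual SelectionDAG layout in which operand 0 of a memory node is the chain and an intrinsic node carries its intrinsic ID in operand 1. `NodeShape` and both functions are hypothetical.

```cpp
#include <cassert>

// Hypothetical description of a node reaching one of the VLD/VST selectors.
struct NodeShape {
  bool isIntrinsic; // target intrinsic node: intrinsic ID in operand 1
  bool isUpdating;  // post-increment (writeback) form
};

// Operand 0 is the chain; an intrinsic node also carries its ID in operand 1,
// which pushes the address to operand 2.
unsigned addrOpIdx(const NodeShape &n) {
  unsigned idx = 1;  // skip the chain
  if (n.isIntrinsic)
    ++idx;           // skip the intrinsic ID
  return idx;
}

// The shortcut used by SelectVLD, SelectVST and SelectVLDSTLane.
unsigned addrOpIdxFromUpdating(const NodeShape &n) {
  return n.isUpdating ? 1 : 2;
}
```

The shortcut agrees with the general rule for intrinsic loads and for updating target nodes, but for a non-updating target dup node it would point at operand 2 instead of 1.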

@@ (119 unchanged lines not shown) @@
 void ARMDAGToDAGISel::SelectVST(SDNode *N, bool isUpdating, unsigned NumVecs,
 const uint16_t *DOpcodes,
 const uint16_t *QOpcodes0,
 const uint16_t *QOpcodes1) {
 assert(NumVecs >= 1 && NumVecs <= 4 && "VST NumVecs out-of-range");
 SDLoc dl(N);

 SDValue MemAddr, Align;
-unsigned AddrOpIdx = isUpdating ? 1 : 2;
+bool IsIntrinsic = !isUpdating; // By coincidence, all supported updating
+// nodes are not intrinsics.
+unsigned AddrOpIdx = IsIntrinsic ? 2 : 1;
 unsigned Vec0Idx = 3; // AddrOpIdx + (isUpdating ? 2 : 1)
 if (!SelectAddrMode6(N, N->getOperand(AddrOpIdx), MemAddr, Align))
 return;

 MachineSDNode::mmo_iterator MemOp = MF->allocateMemRefsArray(1);
 MemOp[0] = cast<MemIntrinsicSDNode>(N)->getMemOperand();

 SDValue Chain = N->getOperand(0);
@@ (133 unchanged lines not shown) @@
 void ARMDAGToDAGISel::SelectVLDSTLane(SDNode *N, bool IsLoad, bool isUpdating,
 unsigned NumVecs,
 const uint16_t *DOpcodes,
 const uint16_t *QOpcodes) {
 assert(NumVecs >=2 && NumVecs <= 4 && "VLDSTLane NumVecs out-of-range");
 SDLoc dl(N);

 SDValue MemAddr, Align;
-unsigned AddrOpIdx = isUpdating ? 1 : 2;
+bool IsIntrinsic = !isUpdating; // By coincidence, all supported updating
+// nodes are not intrinsics.
+unsigned AddrOpIdx = IsIntrinsic ? 2 : 1;
 unsigned Vec0Idx = 3; // AddrOpIdx + (isUpdating ? 2 : 1)
 if (!SelectAddrMode6(N, N->getOperand(AddrOpIdx), MemAddr, Align))
 return;

 MachineSDNode::mmo_iterator MemOp = MF->allocateMemRefsArray(1);
 MemOp[0] = cast<MemIntrinsicSDNode>(N)->getMemOperand();

 SDValue Chain = N->getOperand(0);
▲ Show 20 Lines • Show All 99 Lines • ▼ Show 20 Lines	for (unsigned Vec = 0; Vec < NumVecs; ++Vec)
ReplaceUses(SDValue(N, Vec),		ReplaceUses(SDValue(N, Vec),
CurDAG->getTargetExtractSubreg(Sub0 + Vec, dl, VT, SuperReg));		CurDAG->getTargetExtractSubreg(Sub0 + Vec, dl, VT, SuperReg));
ReplaceUses(SDValue(N, NumVecs), SDValue(VLdLn, 1));		ReplaceUses(SDValue(N, NumVecs), SDValue(VLdLn, 1));
if (isUpdating)		if (isUpdating)
ReplaceUses(SDValue(N, NumVecs + 1), SDValue(VLdLn, 2));		ReplaceUses(SDValue(N, NumVecs + 1), SDValue(VLdLn, 2));
CurDAG->RemoveDeadNode(N);		CurDAG->RemoveDeadNode(N);
}		}

void ARMDAGToDAGISel::SelectVLDDup(SDNode *N, bool isUpdating, unsigned NumVecs,		void ARMDAGToDAGISel::SelectVLDDup(SDNode *N, bool IsIntrinsic,
		bool isUpdating, unsigned NumVecs,
const uint16_t *DOpcodes,		const uint16_t *DOpcodes,
const uint16_t *QOpcodes) {		const uint16_t *QOpcodes0,
		const uint16_t *QOpcodes1) {
assert(NumVecs >= 1 && NumVecs <= 4 && "VLDDup NumVecs out-of-range");		assert(NumVecs >= 1 && NumVecs <= 4 && "VLDDup NumVecs out-of-range");
SDLoc dl(N);		SDLoc dl(N);

SDValue MemAddr, Align;		SDValue MemAddr, Align;
if (!SelectAddrMode6(N, N->getOperand(1), MemAddr, Align))		unsigned AddrOpIdx = IsIntrinsic ? 2 : 1;
		if (!SelectAddrMode6(N, N->getOperand(AddrOpIdx), MemAddr, Align))
return;		return;

MachineSDNode::mmo_iterator MemOp = MF->allocateMemRefsArray(1);
MemOp[0] = cast<MemIntrinsicSDNode>(N)->getMemOperand();

SDValue Chain = N->getOperand(0);		SDValue Chain = N->getOperand(0);
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);
		bool is64BitVector = VT.is64BitVector();

unsigned Alignment = 0;		unsigned Alignment = 0;
if (NumVecs != 3) {		if (NumVecs != 3) {
Alignment = cast<ConstantSDNode>(Align)->getZExtValue();		Alignment = cast<ConstantSDNode>(Align)->getZExtValue();
unsigned NumBytes = NumVecs * VT.getScalarSizeInBits() / 8;		unsigned NumBytes = NumVecs * VT.getScalarSizeInBits() / 8;
if (Alignment > NumBytes)		if (Alignment > NumBytes)
Alignment = NumBytes;		Alignment = NumBytes;
if (Alignment < 8 && Alignment < NumBytes)		if (Alignment < 8 && Alignment < NumBytes)
Alignment = 0;		Alignment = 0;
// Alignment must be a power of two; make sure of that.		// Alignment must be a power of two; make sure of that.
Alignment = (Alignment & -Alignment);		Alignment = (Alignment & -Alignment);
if (Alignment == 1)		if (Alignment == 1)
Alignment = 0;		Alignment = 0;
}		}
Align = CurDAG->getTargetConstant(Alignment, dl, MVT::i32);		Align = CurDAG->getTargetConstant(Alignment, dl, MVT::i32);

unsigned Opc;		unsigned OpcodeIndex;
switch (VT.getSimpleVT().SimpleTy) {		switch (VT.getSimpleVT().SimpleTy) {
default: llvm_unreachable("unhandled vld-dup type");		default: llvm_unreachable("unhandled vld-dup type");
case MVT::v8i8: Opc = DOpcodes[0]; break;		case MVT::v8i8:
case MVT::v16i8: Opc = QOpcodes[0]; break;		case MVT::v16i8: OpcodeIndex = 0; break;
case MVT::v4i16: Opc = DOpcodes[1]; break;		case MVT::v4i16:
case MVT::v8i16: Opc = QOpcodes[1]; break;		case MVT::v8i16: OpcodeIndex = 1; break;
case MVT::v2f32:		case MVT::v2f32:
case MVT::v2i32: Opc = DOpcodes[2]; break;		case MVT::v2i32:
case MVT::v4f32:		case MVT::v4f32:
case MVT::v4i32: Opc = QOpcodes[2]; break;		case MVT::v4i32: OpcodeIndex = 2; break;
		case MVT::v1f64:
		case MVT::v1i64: OpcodeIndex = 3; break;
}		}

		unsigned ResTyElts = (NumVecs == 3) ? 4 : NumVecs;
		if (!is64BitVector)
		ResTyElts *= 2;
		EVT ResTy = EVT::getVectorVT(*CurDAG->getContext(), MVT::i64, ResTyElts);

		std::vector<EVT> ResTys;
		ResTys.push_back(ResTy);
		if (isUpdating)
		ResTys.push_back(MVT::i32);
		ResTys.push_back(MVT::Other);

SDValue Pred = getAL(CurDAG, dl);		SDValue Pred = getAL(CurDAG, dl);
SDValue Reg0 = CurDAG->getRegister(0, MVT::i32);		SDValue Reg0 = CurDAG->getRegister(0, MVT::i32);

		SDNode *VLdDup;
		if (is64BitVector || NumVecs == 1) {
SmallVector<SDValue, 6> Ops;		SmallVector<SDValue, 6> Ops;
Ops.push_back(MemAddr);		Ops.push_back(MemAddr);
Ops.push_back(Align);		Ops.push_back(Align);
		unsigned Opc = is64BitVector ? DOpcodes[OpcodeIndex] :
		QOpcodes0[OpcodeIndex];
if (isUpdating) {		if (isUpdating) {
// fixed-stride update instructions don't have an explicit writeback		// fixed-stride update instructions don't have an explicit writeback
// operand. It's implicit in the opcode itself.		// operand. It's implicit in the opcode itself.
SDValue Inc = N->getOperand(2);		SDValue Inc = N->getOperand(2);
bool IsImmUpdate =		bool IsImmUpdate =
isPerfectIncrement(Inc, VT.getVectorElementType(), NumVecs);		isPerfectIncrement(Inc, VT.getVectorElementType(), NumVecs);
if (NumVecs <= 2 && !IsImmUpdate)		if (NumVecs <= 2 && !IsImmUpdate)
Opc = getVLDSTRegisterUpdateOpcode(Opc);		Opc = getVLDSTRegisterUpdateOpcode(Opc);
if (!IsImmUpdate)		if (!IsImmUpdate)
Ops.push_back(Inc);		Ops.push_back(Inc);
// FIXME: VLD3 and VLD4 haven't been updated to that form yet.		// FIXME: VLD3 and VLD4 haven't been updated to that form yet.
else if (NumVecs > 2)		else if (NumVecs > 2)
Ops.push_back(Reg0);		Ops.push_back(Reg0);
}		}
Ops.push_back(Pred);		Ops.push_back(Pred);
Ops.push_back(Reg0);		Ops.push_back(Reg0);
Ops.push_back(Chain);		Ops.push_back(Chain);
		VLdDup = CurDAG->getMachineNode(Opc, dl, ResTys, Ops);
		} else if (NumVecs == 2) {
		const SDValue OpsA[] = { MemAddr, Align, Pred, Reg0, Chain };
		SDNode *VLdA = CurDAG->getMachineNode(QOpcodes0[OpcodeIndex],
		dl, ResTys, OpsA);

		Chain = SDValue(VLdA, 1);
		const SDValue OpsB[] = { MemAddr, Align, Pred, Reg0, Chain };
		VLdDup = CurDAG->getMachineNode(QOpcodes1[OpcodeIndex], dl, ResTys, OpsB);
		} else {
		SDValue ImplDef =
		SDValue(CurDAG->getMachineNode(TargetOpcode::IMPLICIT_DEF, dl, ResTy), 0);
		const SDValue OpsA[] = { MemAddr, Align, ImplDef, Pred, Reg0, Chain };
		SDNode *VLdA = CurDAG->getMachineNode(QOpcodes0[OpcodeIndex],
		dl, ResTys, OpsA);

		SDValue SuperReg = SDValue(VLdA, 0);
		Chain = SDValue(VLdA, 1);
		const SDValue OpsB[] = { MemAddr, Align, SuperReg, Pred, Reg0, Chain };
		VLdDup = CurDAG->getMachineNode(QOpcodes1[OpcodeIndex], dl, ResTys, OpsB);
		}

unsigned ResTyElts = (NumVecs == 3) ? 4 : NumVecs;		// Transfer memoperands.
std::vector<EVT> ResTys;		MachineSDNode::mmo_iterator MemOp = MF->allocateMemRefsArray(1);
ResTys.push_back(EVT::getVectorVT(*CurDAG->getContext(), MVT::i64,ResTyElts));		MemOp[0] = cast<MemIntrinsicSDNode>(N)->getMemOperand();
if (isUpdating)
ResTys.push_back(MVT::i32);
ResTys.push_back(MVT::Other);
SDNode *VLdDup = CurDAG->getMachineNode(Opc, dl, ResTys, Ops);
cast<MachineSDNode>(VLdDup)->setMemRefs(MemOp, MemOp + 1);		cast<MachineSDNode>(VLdDup)->setMemRefs(MemOp, MemOp + 1);

// Extract the subregisters.		// Extract the subregisters.
if (NumVecs == 1) {		if (NumVecs == 1) {
ReplaceUses(SDValue(N, 0), SDValue(VLdDup, 0));		ReplaceUses(SDValue(N, 0), SDValue(VLdDup, 0));
} else {		} else {
SDValue SuperReg = SDValue(VLdDup, 0);		SDValue SuperReg = SDValue(VLdDup, 0);
static_assert(ARM::dsub_7 == ARM::dsub_0 + 7, "Unexpected subreg numbering");		static_assert(ARM::dsub_7 == ARM::dsub_0 + 7, "Unexpected subreg numbering");
unsigned SubIdx = ARM::dsub_0;		unsigned SubIdx = is64BitVector ? ARM::dsub_0 : ARM::qsub_0;
for (unsigned Vec = 0; Vec < NumVecs; ++Vec)		for (unsigned Vec = 0; Vec != NumVecs; ++Vec) {
ReplaceUses(SDValue(N, Vec),		ReplaceUses(SDValue(N, Vec),
CurDAG->getTargetExtractSubreg(SubIdx+Vec, dl, VT, SuperReg));		CurDAG->getTargetExtractSubreg(SubIdx+Vec, dl, VT, SuperReg));
}		}
		}
ReplaceUses(SDValue(N, NumVecs), SDValue(VLdDup, 1));		ReplaceUses(SDValue(N, NumVecs), SDValue(VLdDup, 1));
if (isUpdating)		if (isUpdating)
ReplaceUses(SDValue(N, NumVecs + 1), SDValue(VLdDup, 2));		ReplaceUses(SDValue(N, NumVecs + 1), SDValue(VLdDup, 2));
CurDAG->RemoveDeadNode(N);		CurDAG->RemoveDeadNode(N);
}		}
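The alignment clamp at the top of the reworked SelectVLDDup can be checked in isolation. Below is a minimal standalone sketch that takes the same inputs (the intrinsic's alignment operand in bytes, N, and the element size in bits); the name clampVLDDupAlignment is hypothetical and not part of LLVM. With the arguments used in arm-vlddup.ll it agrees with the expected operands: an 8-byte alignment survives for the 64-bit vld2dup/vld4dup cases (printed in bits as `[r0:64]` / `[r1:64]`), while the 16-bit vld2dup case and every vld3dup case end up with no alignment qualifier.

```cpp
#include <cassert>

// Hypothetical standalone mirror of the alignment clamp in SelectVLDDup;
// not an LLVM API. `alignment` is the intrinsic's alignment operand in
// bytes; the result is the alignment (in bytes) the selected instruction
// carries, where 0 means "no alignment qualifier".
unsigned clampVLDDupAlignment(unsigned alignment, unsigned numVecs,
                              unsigned eltSizeInBits) {
  assert(numVecs >= 1 && numVecs <= 4 && "VLDDup NumVecs out-of-range");
  // The vld3 all-lanes forms are always selected without an alignment.
  if (numVecs == 3)
    return 0;
  // A dup load reads one element per vector.
  unsigned numBytes = numVecs * eltSizeInBits / 8;
  // Never claim more alignment than the size of the access.
  if (alignment > numBytes)
    alignment = numBytes;
  // Drop an alignment that is below both 8 bytes and the access size.
  if (alignment < 8 && alignment < numBytes)
    alignment = 0;
  // Keep only the lowest set bit so the result is a power of two.
  alignment &= -alignment;
  // A 1-byte alignment is the same as no alignment.
  if (alignment == 1)
    alignment = 0;
  return alignment;
}
```

For instance, an over-stated alignment of 12 bytes on a four-vector 32-bit dup is reduced to 4, its lowest set bit.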

bool ARMDAGToDAGISel::tryV6T2BitfieldExtractOp(SDNode *N, bool isSigned) {		bool ARMDAGToDAGISel::tryV6T2BitfieldExtractOp(SDNode *N, bool isSigned) {
if (!Subtarget->hasV6T2Ops())		if (!Subtarget->hasV6T2Ops())
…	case ARMISD::BUILD_VECTOR: {
return;		return;
}		}

case ARMISD::VLD1DUP: {		case ARMISD::VLD1DUP: {
static const uint16_t DOpcodes[] = { ARM::VLD1DUPd8, ARM::VLD1DUPd16,		static const uint16_t DOpcodes[] = { ARM::VLD1DUPd8, ARM::VLD1DUPd16,
ARM::VLD1DUPd32 };		ARM::VLD1DUPd32 };
static const uint16_t QOpcodes[] = { ARM::VLD1DUPq8, ARM::VLD1DUPq16,		static const uint16_t QOpcodes[] = { ARM::VLD1DUPq8, ARM::VLD1DUPq16,
ARM::VLD1DUPq32 };		ARM::VLD1DUPq32 };
SelectVLDDup(N, false, 1, DOpcodes, QOpcodes);		SelectVLDDup(N, /* IsIntrinsic= */ false, false, 1, DOpcodes, QOpcodes);
return;		return;
}		}

case ARMISD::VLD2DUP: {		case ARMISD::VLD2DUP: {
static const uint16_t Opcodes[] = { ARM::VLD2DUPd8, ARM::VLD2DUPd16,		static const uint16_t Opcodes[] = { ARM::VLD2DUPd8, ARM::VLD2DUPd16,
ARM::VLD2DUPd32 };		ARM::VLD2DUPd32 };
SelectVLDDup(N, false, 2, Opcodes);		SelectVLDDup(N, /* IsIntrinsic= */ false, false, 2, Opcodes);
return;		return;
}		}

case ARMISD::VLD3DUP: {		case ARMISD::VLD3DUP: {
static const uint16_t Opcodes[] = { ARM::VLD3DUPd8Pseudo,		static const uint16_t Opcodes[] = { ARM::VLD3DUPd8Pseudo,
ARM::VLD3DUPd16Pseudo,		ARM::VLD3DUPd16Pseudo,
ARM::VLD3DUPd32Pseudo };		ARM::VLD3DUPd32Pseudo };
SelectVLDDup(N, false, 3, Opcodes);		SelectVLDDup(N, /* IsIntrinsic= */ false, false, 3, Opcodes);
return;		return;
}		}

case ARMISD::VLD4DUP: {		case ARMISD::VLD4DUP: {
static const uint16_t Opcodes[] = { ARM::VLD4DUPd8Pseudo,		static const uint16_t Opcodes[] = { ARM::VLD4DUPd8Pseudo,
ARM::VLD4DUPd16Pseudo,		ARM::VLD4DUPd16Pseudo,
ARM::VLD4DUPd32Pseudo };		ARM::VLD4DUPd32Pseudo };
SelectVLDDup(N, false, 4, Opcodes);		SelectVLDDup(N, /* IsIntrinsic= */ false, false, 4, Opcodes);
return;		return;
}		}

case ARMISD::VLD1DUP_UPD: {		case ARMISD::VLD1DUP_UPD: {
static const uint16_t DOpcodes[] = { ARM::VLD1DUPd8wb_fixed,		static const uint16_t DOpcodes[] = { ARM::VLD1DUPd8wb_fixed,
ARM::VLD1DUPd16wb_fixed,		ARM::VLD1DUPd16wb_fixed,
ARM::VLD1DUPd32wb_fixed };		ARM::VLD1DUPd32wb_fixed };
static const uint16_t QOpcodes[] = { ARM::VLD1DUPq8wb_fixed,		static const uint16_t QOpcodes[] = { ARM::VLD1DUPq8wb_fixed,
ARM::VLD1DUPq16wb_fixed,		ARM::VLD1DUPq16wb_fixed,
ARM::VLD1DUPq32wb_fixed };		ARM::VLD1DUPq32wb_fixed };
SelectVLDDup(N, true, 1, DOpcodes, QOpcodes);		SelectVLDDup(N, /* IsIntrinsic= */ false, true, 1, DOpcodes, QOpcodes);
return;		return;
}		}

case ARMISD::VLD2DUP_UPD: {		case ARMISD::VLD2DUP_UPD: {
static const uint16_t Opcodes[] = { ARM::VLD2DUPd8wb_fixed,		static const uint16_t Opcodes[] = { ARM::VLD2DUPd8wb_fixed,
ARM::VLD2DUPd16wb_fixed,		ARM::VLD2DUPd16wb_fixed,
ARM::VLD2DUPd32wb_fixed };		ARM::VLD2DUPd32wb_fixed };
SelectVLDDup(N, true, 2, Opcodes);		SelectVLDDup(N, /* IsIntrinsic= */ false, true, 2, Opcodes);
return;		return;
}		}

case ARMISD::VLD3DUP_UPD: {		case ARMISD::VLD3DUP_UPD: {
static const uint16_t Opcodes[] = { ARM::VLD3DUPd8Pseudo_UPD,		static const uint16_t Opcodes[] = { ARM::VLD3DUPd8Pseudo_UPD,
ARM::VLD3DUPd16Pseudo_UPD,		ARM::VLD3DUPd16Pseudo_UPD,
ARM::VLD3DUPd32Pseudo_UPD };		ARM::VLD3DUPd32Pseudo_UPD };
SelectVLDDup(N, true, 3, Opcodes);		SelectVLDDup(N, /* IsIntrinsic= */ false, true, 3, Opcodes);
return;		return;
}		}

case ARMISD::VLD4DUP_UPD: {		case ARMISD::VLD4DUP_UPD: {
static const uint16_t Opcodes[] = { ARM::VLD4DUPd8Pseudo_UPD,		static const uint16_t Opcodes[] = { ARM::VLD4DUPd8Pseudo_UPD,
ARM::VLD4DUPd16Pseudo_UPD,		ARM::VLD4DUPd16Pseudo_UPD,
ARM::VLD4DUPd32Pseudo_UPD };		ARM::VLD4DUPd32Pseudo_UPD };
SelectVLDDup(N, true, 4, Opcodes);		SelectVLDDup(N, /* IsIntrinsic= */ false, true, 4, Opcodes);
return;		return;
}		}

case ARMISD::VLD1_UPD: {		case ARMISD::VLD1_UPD: {
static const uint16_t DOpcodes[] = { ARM::VLD1d8wb_fixed,		static const uint16_t DOpcodes[] = { ARM::VLD1d8wb_fixed,
ARM::VLD1d16wb_fixed,		ARM::VLD1d16wb_fixed,
ARM::VLD1d32wb_fixed,		ARM::VLD1d32wb_fixed,
ARM::VLD1d64wb_fixed };		ARM::VLD1d64wb_fixed };
…	case Intrinsic::arm_neon_vld4: {
ARM::VLD4q32Pseudo_UPD };		ARM::VLD4q32Pseudo_UPD };
static const uint16_t QOpcodes1[] = { ARM::VLD4q8oddPseudo,		static const uint16_t QOpcodes1[] = { ARM::VLD4q8oddPseudo,
ARM::VLD4q16oddPseudo,		ARM::VLD4q16oddPseudo,
ARM::VLD4q32oddPseudo };		ARM::VLD4q32oddPseudo };
SelectVLD(N, false, 4, DOpcodes, QOpcodes0, QOpcodes1);		SelectVLD(N, false, 4, DOpcodes, QOpcodes0, QOpcodes1);
return;		return;
}		}

		case Intrinsic::arm_neon_vld2dup: {
		static const uint16_t DOpcodes[] = { ARM::VLD2DUPd8, ARM::VLD2DUPd16,
		ARM::VLD2DUPd32, ARM::VLD1q64 };
		static const uint16_t QOpcodes0[] = { ARM::VLD2DUPq8EvenPseudo,
		ARM::VLD2DUPq16EvenPseudo,
		ARM::VLD2DUPq32EvenPseudo };
		static const uint16_t QOpcodes1[] = { ARM::VLD2DUPq8OddPseudo,
		ARM::VLD2DUPq16OddPseudo,
		ARM::VLD2DUPq32OddPseudo };
		SelectVLDDup(N, /* IsIntrinsic= */ true, false, 2,
		DOpcodes, QOpcodes0, QOpcodes1);
		return;
		}

		case Intrinsic::arm_neon_vld3dup: {
		static const uint16_t DOpcodes[] = { ARM::VLD3DUPd8Pseudo,
		ARM::VLD3DUPd16Pseudo,
		ARM::VLD3DUPd32Pseudo,
		ARM::VLD1d64TPseudo };
		static const uint16_t QOpcodes0[] = { ARM::VLD3DUPq8EvenPseudo,
		ARM::VLD3DUPq16EvenPseudo,
		ARM::VLD3DUPq32EvenPseudo };
		static const uint16_t QOpcodes1[] = { ARM::VLD3DUPq8OddPseudo,
		ARM::VLD3DUPq16OddPseudo,
		ARM::VLD3DUPq32OddPseudo };
		SelectVLDDup(N, /* IsIntrinsic= */ true, false, 3,
		DOpcodes, QOpcodes0, QOpcodes1);
		return;
		}

		case Intrinsic::arm_neon_vld4dup: {
		static const uint16_t DOpcodes[] = { ARM::VLD4DUPd8Pseudo,
		ARM::VLD4DUPd16Pseudo,
		ARM::VLD4DUPd32Pseudo,
		ARM::VLD1d64QPseudo };
		static const uint16_t QOpcodes0[] = { ARM::VLD4DUPq8EvenPseudo,
		ARM::VLD4DUPq16EvenPseudo,
		ARM::VLD4DUPq32EvenPseudo };
		static const uint16_t QOpcodes1[] = { ARM::VLD4DUPq8OddPseudo,
		ARM::VLD4DUPq16OddPseudo,
		ARM::VLD4DUPq32OddPseudo };
		SelectVLDDup(N, /* IsIntrinsic= */ true, false, 4,
		DOpcodes, QOpcodes0, QOpcodes1);
		return;
		}
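The three opcode tables passed for each dup intrinsic are indexed by element size, and SelectVLDDup then picks either one machine load or an even/odd pair. The sketch below summarizes those decisions under simplifying assumptions: it keys on the element size in bits rather than on the MVT, and VLDDupPlan/planVLDDup are hypothetical names, not LLVM APIs.

```cpp
#include <cassert>

// Hypothetical summary of the lowering decisions in SelectVLDDup; the names
// are illustrative and not LLVM APIs. It keys on the element size instead
// of the MVT (v2f32/v4f32 share row 2 with the i32 types in the real code).
struct VLDDupPlan {
  unsigned opcodeIndex;     // row into DOpcodes / QOpcodes0 / QOpcodes1
  unsigned numMachineLoads; // 1, or 2 for an even load plus an odd load
  unsigned resultI64Elts;   // size of the result super-register in i64s
};

VLDDupPlan planVLDDup(unsigned numVecs, unsigned eltSizeInBits,
                      bool is64BitVector) {
  assert(numVecs >= 1 && numVecs <= 4 && "VLDDup NumVecs out-of-range");
  VLDDupPlan plan{};
  switch (eltSizeInBits) {
  case 8:  plan.opcodeIndex = 0; break;
  case 16: plan.opcodeIndex = 1; break;
  case 32: plan.opcodeIndex = 2; break;
  case 64:
    // Only v1i64/v1f64 reach this row, which exists in DOpcodes only.
    assert(is64BitVector && "64-bit elements only handled as D vectors");
    plan.opcodeIndex = 3;
    break;
  default:
    assert(false && "unhandled vld-dup element size");
    break;
  }
  // D-register results and single-vector loads need one instruction;
  // Q-register results with N >= 2 need an even and an odd load.
  plan.numMachineLoads = (is64BitVector || numVecs == 1) ? 1 : 2;
  // Three vectors are padded to a four-register tuple.
  unsigned elts = (numVecs == 3) ? 4 : numVecs;
  // A Q register is two D registers, i.e. two i64s.
  if (!is64BitVector)
    elts *= 2;
  plan.resultI64Elts = elts;
  return plan;
}
```

For example, vld3q_dup_u16 maps to row 1, two loads (VLD3DUPq16EvenPseudo, then VLD3DUPq16OddPseudo) and an 8 x i64 super-register, which is consistent with the three-vector Q pseudos being defined as VLDQQQQPseudo, while the two-vector ones (4 x i64) are VLDQQPseudo.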

case Intrinsic::arm_neon_vld2lane: {		case Intrinsic::arm_neon_vld2lane: {
static const uint16_t DOpcodes[] = { ARM::VLD2LNd8Pseudo,		static const uint16_t DOpcodes[] = { ARM::VLD2LNd8Pseudo,
ARM::VLD2LNd16Pseudo,		ARM::VLD2LNd16Pseudo,
ARM::VLD2LNd32Pseudo };		ARM::VLD2LNd32Pseudo };
static const uint16_t QOpcodes[] = { ARM::VLD2LNq16Pseudo,		static const uint16_t QOpcodes[] = { ARM::VLD2LNq16Pseudo,
ARM::VLD2LNq32Pseudo };		ARM::VLD2LNq32Pseudo };
SelectVLDSTLane(N, true, false, 2, DOpcodes, QOpcodes);		SelectVLDSTLane(N, true, false, 2, DOpcodes, QOpcodes);
return;		return;
…

llvm/trunk/lib/Target/ARM/ARMISelLowering.cpp

…	case ISD::INTRINSIC_W_CHAIN:
case Intrinsic::arm_neon_vld1x3:		case Intrinsic::arm_neon_vld1x3:
case Intrinsic::arm_neon_vld1x4:		case Intrinsic::arm_neon_vld1x4:
case Intrinsic::arm_neon_vld2:		case Intrinsic::arm_neon_vld2:
case Intrinsic::arm_neon_vld3:		case Intrinsic::arm_neon_vld3:
case Intrinsic::arm_neon_vld4:		case Intrinsic::arm_neon_vld4:
case Intrinsic::arm_neon_vld2lane:		case Intrinsic::arm_neon_vld2lane:
case Intrinsic::arm_neon_vld3lane:		case Intrinsic::arm_neon_vld3lane:
case Intrinsic::arm_neon_vld4lane:		case Intrinsic::arm_neon_vld4lane:
		case Intrinsic::arm_neon_vld2dup:
		case Intrinsic::arm_neon_vld3dup:
		case Intrinsic::arm_neon_vld4dup:
case Intrinsic::arm_neon_vst1:		case Intrinsic::arm_neon_vst1:
case Intrinsic::arm_neon_vst1x2:		case Intrinsic::arm_neon_vst1x2:
case Intrinsic::arm_neon_vst1x3:		case Intrinsic::arm_neon_vst1x3:
case Intrinsic::arm_neon_vst1x4:		case Intrinsic::arm_neon_vst1x4:
case Intrinsic::arm_neon_vst2:		case Intrinsic::arm_neon_vst2:
case Intrinsic::arm_neon_vst3:		case Intrinsic::arm_neon_vst3:
case Intrinsic::arm_neon_vst4:		case Intrinsic::arm_neon_vst4:
case Intrinsic::arm_neon_vst2lane:		case Intrinsic::arm_neon_vst2lane:
…	bool ARMTargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info,
unsigned Intrinsic) const {		unsigned Intrinsic) const {
switch (Intrinsic) {		switch (Intrinsic) {
case Intrinsic::arm_neon_vld1:		case Intrinsic::arm_neon_vld1:
case Intrinsic::arm_neon_vld2:		case Intrinsic::arm_neon_vld2:
case Intrinsic::arm_neon_vld3:		case Intrinsic::arm_neon_vld3:
case Intrinsic::arm_neon_vld4:		case Intrinsic::arm_neon_vld4:
case Intrinsic::arm_neon_vld2lane:		case Intrinsic::arm_neon_vld2lane:
case Intrinsic::arm_neon_vld3lane:		case Intrinsic::arm_neon_vld3lane:
case Intrinsic::arm_neon_vld4lane: {		case Intrinsic::arm_neon_vld4lane:
		case Intrinsic::arm_neon_vld2dup:
		case Intrinsic::arm_neon_vld3dup:
		case Intrinsic::arm_neon_vld4dup: {
Info.opc = ISD::INTRINSIC_W_CHAIN;		Info.opc = ISD::INTRINSIC_W_CHAIN;
// Conservatively set memVT to the entire set of vectors loaded.		// Conservatively set memVT to the entire set of vectors loaded.
auto &DL = I.getCalledFunction()->getParent()->getDataLayout();		auto &DL = I.getCalledFunction()->getParent()->getDataLayout();
uint64_t NumElts = DL.getTypeSizeInBits(I.getType()) / 64;		uint64_t NumElts = DL.getTypeSizeInBits(I.getType()) / 64;
Info.memVT = EVT::getVectorVT(I.getType()->getContext(), MVT::i64, NumElts);		Info.memVT = EVT::getVectorVT(I.getType()->getContext(), MVT::i64, NumElts);
Info.ptrVal = I.getArgOperand(0);		Info.ptrVal = I.getArgOperand(0);
Info.offset = 0;		Info.offset = 0;
Value *AlignArg = I.getArgOperand(I.getNumArgOperands() - 1);		Value *AlignArg = I.getArgOperand(I.getNumArgOperands() - 1);
…
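getTgtMemIntrinsic now reports the dup intrinsics with the same conservative memVT as vld2/vld3/vld4: the size of the whole returned struct, in i64 units. A dup load, however, reads only one element per vector. A small sketch of both quantities, assuming an unpadded struct of N D or Q vectors; the helper names are hypothetical, not LLVM APIs.

```cpp
#include <cassert>

// Hypothetical helpers, not LLVM APIs. They assume the intrinsic returns an
// unpadded struct of `numVecs` NEON vectors, each `vectorBits` wide.

// Number of i64 elements in the memVT reported by getTgtMemIntrinsic:
// the size of the whole returned struct divided by 64.
unsigned memVTNumI64Elts(unsigned numVecs, unsigned vectorBits) {
  assert((vectorBits == 64 || vectorBits == 128) && "expected a D or Q vector");
  return numVecs * vectorBits / 64;
}

// Bytes that a dup (all-lanes) load actually reads: one element per vector.
unsigned dupBytesRead(unsigned numVecs, unsigned eltBits) {
  return numVecs * eltBits / 8;
}
```

For vld4q_dup_u8, for example, the memory operand describes 8 x i64 (64 bytes), while the even/odd instruction pair reads 4 bytes.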

llvm/trunk/lib/Target/ARM/ARMInstrNEON.td

…	def VecListDPairAllLanes : RegisterOperand<DPair,
let ParserMatchClass = VecListDPairAllLanesAsmOperand;		let ParserMatchClass = VecListDPairAllLanesAsmOperand;
}		}
// Register list of two D registers spaced by 2 (two sequential Q registers).		// Register list of two D registers spaced by 2 (two sequential Q registers).
def VecListDPairSpacedAllLanesAsmOperand : AsmOperandClass {		def VecListDPairSpacedAllLanesAsmOperand : AsmOperandClass {
let Name = "VecListDPairSpacedAllLanes";		let Name = "VecListDPairSpacedAllLanes";
let ParserMethod = "parseVectorList";		let ParserMethod = "parseVectorList";
let RenderMethod = "addVecListOperands";		let RenderMethod = "addVecListOperands";
}		}
def VecListDPairSpacedAllLanes : RegisterOperand<DPair,		def VecListDPairSpacedAllLanes : RegisterOperand<DPairSpc,
"printVectorListTwoSpacedAllLanes"> {		"printVectorListTwoSpacedAllLanes"> {
let ParserMatchClass = VecListDPairSpacedAllLanesAsmOperand;		let ParserMatchClass = VecListDPairSpacedAllLanesAsmOperand;
}		}
// Register list of three D registers, with "all lanes" subscripting.		// Register list of three D registers, with "all lanes" subscripting.
def VecListThreeDAllLanesAsmOperand : AsmOperandClass {		def VecListThreeDAllLanesAsmOperand : AsmOperandClass {
let Name = "VecListThreeDAllLanes";		let Name = "VecListThreeDAllLanes";
let ParserMethod = "parseVectorList";		let ParserMethod = "parseVectorList";
let RenderMethod = "addVecListOperands";		let RenderMethod = "addVecListOperands";
…
// ...with double-spaced registers		// ...with double-spaced registers
def VLD2DUPd8x2 : VLD2DUP<{0,0,1,?}, "8", VecListDPairSpacedAllLanes,		def VLD2DUPd8x2 : VLD2DUP<{0,0,1,?}, "8", VecListDPairSpacedAllLanes,
addrmode6dupalign16>;		addrmode6dupalign16>;
def VLD2DUPd16x2 : VLD2DUP<{0,1,1,?}, "16", VecListDPairSpacedAllLanes,		def VLD2DUPd16x2 : VLD2DUP<{0,1,1,?}, "16", VecListDPairSpacedAllLanes,
addrmode6dupalign32>;		addrmode6dupalign32>;
def VLD2DUPd32x2 : VLD2DUP<{1,0,1,?}, "32", VecListDPairSpacedAllLanes,		def VLD2DUPd32x2 : VLD2DUP<{1,0,1,?}, "32", VecListDPairSpacedAllLanes,
addrmode6dupalign64>;		addrmode6dupalign64>;

		def VLD2DUPq8EvenPseudo : VLDQQPseudo<IIC_VLD2dup>, Sched<[WriteVLD2]>;
		def VLD2DUPq8OddPseudo : VLDQQPseudo<IIC_VLD2dup>, Sched<[WriteVLD2]>;
		def VLD2DUPq16EvenPseudo : VLDQQPseudo<IIC_VLD2dup>, Sched<[WriteVLD2]>;
		def VLD2DUPq16OddPseudo : VLDQQPseudo<IIC_VLD2dup>, Sched<[WriteVLD2]>;
		def VLD2DUPq32EvenPseudo : VLDQQPseudo<IIC_VLD2dup>, Sched<[WriteVLD2]>;
		def VLD2DUPq32OddPseudo : VLDQQPseudo<IIC_VLD2dup>, Sched<[WriteVLD2]>;

// ...with address register writeback:		// ...with address register writeback:
multiclass VLD2DUPWB<bits<4> op7_4, string Dt, RegisterOperand VdTy,		multiclass VLD2DUPWB<bits<4> op7_4, string Dt, RegisterOperand VdTy,
Operand AddrMode> {		Operand AddrMode> {
def _fixed : NLdSt<1, 0b10, 0b1101, op7_4,		def _fixed : NLdSt<1, 0b10, 0b1101, op7_4,
(outs VdTy:$Vd, GPR:$wb),		(outs VdTy:$Vd, GPR:$wb),
(ins AddrMode:$Rn), IIC_VLD2dupu,		(ins AddrMode:$Rn), IIC_VLD2dupu,
"vld2", Dt, "$Vd, $Rn!",		"vld2", Dt, "$Vd, $Rn!",
"$Rn.addr = $wb", []>, Sched<[WriteVLD1]> {		"$Rn.addr = $wb", []>, Sched<[WriteVLD1]> {
…
def VLD3DUPd16Pseudo : VLDQQPseudo<IIC_VLD3dup>, Sched<[WriteVLD2]>;		def VLD3DUPd16Pseudo : VLDQQPseudo<IIC_VLD3dup>, Sched<[WriteVLD2]>;
def VLD3DUPd32Pseudo : VLDQQPseudo<IIC_VLD3dup>, Sched<[WriteVLD2]>;		def VLD3DUPd32Pseudo : VLDQQPseudo<IIC_VLD3dup>, Sched<[WriteVLD2]>;

// ...with double-spaced registers (not used for codegen):		// ...with double-spaced registers (not used for codegen):
def VLD3DUPq8 : VLD3DUP<{0,0,1,?}, "8">;		def VLD3DUPq8 : VLD3DUP<{0,0,1,?}, "8">;
def VLD3DUPq16 : VLD3DUP<{0,1,1,?}, "16">;		def VLD3DUPq16 : VLD3DUP<{0,1,1,?}, "16">;
def VLD3DUPq32 : VLD3DUP<{1,0,1,?}, "32">;		def VLD3DUPq32 : VLD3DUP<{1,0,1,?}, "32">;

		def VLD3DUPq8EvenPseudo : VLDQQQQPseudo<IIC_VLD3dup>, Sched<[WriteVLD2]>;
		def VLD3DUPq8OddPseudo : VLDQQQQPseudo<IIC_VLD3dup>, Sched<[WriteVLD2]>;
		def VLD3DUPq16EvenPseudo : VLDQQQQPseudo<IIC_VLD3dup>, Sched<[WriteVLD2]>;
		def VLD3DUPq16OddPseudo : VLDQQQQPseudo<IIC_VLD3dup>, Sched<[WriteVLD2]>;
		def VLD3DUPq32EvenPseudo : VLDQQQQPseudo<IIC_VLD3dup>, Sched<[WriteVLD2]>;
		def VLD3DUPq32OddPseudo : VLDQQQQPseudo<IIC_VLD3dup>, Sched<[WriteVLD2]>;

// ...with address register writeback:		// ...with address register writeback:
class VLD3DUPWB<bits<4> op7_4, string Dt, Operand AddrMode>		class VLD3DUPWB<bits<4> op7_4, string Dt, Operand AddrMode>
: NLdSt<1, 0b10, 0b1110, op7_4, (outs DPR:$Vd, DPR:$dst2, DPR:$dst3, GPR:$wb),		: NLdSt<1, 0b10, 0b1110, op7_4, (outs DPR:$Vd, DPR:$dst2, DPR:$dst3, GPR:$wb),
(ins AddrMode:$Rn, am6offset:$Rm), IIC_VLD3dupu,		(ins AddrMode:$Rn, am6offset:$Rm), IIC_VLD3dupu,
"vld3", Dt, "\\{$Vd[], $dst2[], $dst3[]\\}, $Rn$Rm",		"vld3", Dt, "\\{$Vd[], $dst2[], $dst3[]\\}, $Rn$Rm",
"$Rn.addr = $wb", []>, Sched<[WriteVLD2]> {		"$Rn.addr = $wb", []>, Sched<[WriteVLD2]> {
let Inst{4} = 0;		let Inst{4} = 0;
let DecoderMethod = "DecodeVLD3DupInstruction";		let DecoderMethod = "DecodeVLD3DupInstruction";
…
def VLD4DUPd16Pseudo : VLDQQPseudo<IIC_VLD4dup>, Sched<[WriteVLD2]>;		def VLD4DUPd16Pseudo : VLDQQPseudo<IIC_VLD4dup>, Sched<[WriteVLD2]>;
def VLD4DUPd32Pseudo : VLDQQPseudo<IIC_VLD4dup>, Sched<[WriteVLD2]>;		def VLD4DUPd32Pseudo : VLDQQPseudo<IIC_VLD4dup>, Sched<[WriteVLD2]>;

// ...with double-spaced registers (not used for codegen):		// ...with double-spaced registers (not used for codegen):
def VLD4DUPq8 : VLD4DUP<{0,0,1,?}, "8">;		def VLD4DUPq8 : VLD4DUP<{0,0,1,?}, "8">;
def VLD4DUPq16 : VLD4DUP<{0,1,1,?}, "16">;		def VLD4DUPq16 : VLD4DUP<{0,1,1,?}, "16">;
def VLD4DUPq32 : VLD4DUP<{1,?,1,?}, "32"> { let Inst{6} = Rn{5}; }		def VLD4DUPq32 : VLD4DUP<{1,?,1,?}, "32"> { let Inst{6} = Rn{5}; }

		def VLD4DUPq8EvenPseudo : VLDQQQQPseudo<IIC_VLD4dup>, Sched<[WriteVLD2]>;
		def VLD4DUPq8OddPseudo : VLDQQQQPseudo<IIC_VLD4dup>, Sched<[WriteVLD2]>;
		def VLD4DUPq16EvenPseudo : VLDQQQQPseudo<IIC_VLD4dup>, Sched<[WriteVLD2]>;
		def VLD4DUPq16OddPseudo : VLDQQQQPseudo<IIC_VLD4dup>, Sched<[WriteVLD2]>;
		def VLD4DUPq32EvenPseudo : VLDQQQQPseudo<IIC_VLD4dup>, Sched<[WriteVLD2]>;
		def VLD4DUPq32OddPseudo : VLDQQQQPseudo<IIC_VLD4dup>, Sched<[WriteVLD2]>;

// ...with address register writeback:		// ...with address register writeback:
class VLD4DUPWB<bits<4> op7_4, string Dt>		class VLD4DUPWB<bits<4> op7_4, string Dt>
: NLdSt<1, 0b10, 0b1111, op7_4,		: NLdSt<1, 0b10, 0b1111, op7_4,
(outs DPR:$Vd, DPR:$dst2, DPR:$dst3, DPR:$dst4, GPR:$wb),		(outs DPR:$Vd, DPR:$dst2, DPR:$dst3, DPR:$dst4, GPR:$wb),
(ins addrmode6dup:$Rn, am6offset:$Rm), IIC_VLD4dupu,		(ins addrmode6dup:$Rn, am6offset:$Rm), IIC_VLD4dupu,
"vld4", Dt, "\\{$Vd[], $dst2[], $dst3[], $dst4[]\\}, $Rn$Rm",		"vld4", Dt, "\\{$Vd[], $dst2[], $dst3[], $dst4[]\\}, $Rn$Rm",
"$Rn.addr = $wb", []>, Sched<[WriteVLD2]> {		"$Rn.addr = $wb", []>, Sched<[WriteVLD2]> {
let Inst{4} = Rn{4};		let Inst{4} = Rn{4};
…

llvm/trunk/test/CodeGen/ARM/arm-vlddup.ll

				; RUN: llc < %s -mtriple=armv8-linux-gnueabi -verify-machineinstrs \
				; RUN: -asm-verbose=false | FileCheck %s

				%struct.uint16x4x2_t = type { <4 x i16>, <4 x i16> }
				%struct.uint16x4x3_t = type { <4 x i16>, <4 x i16>, <4 x i16> }
				%struct.uint16x4x4_t = type { <4 x i16>, <4 x i16>, <4 x i16>, <4 x i16> }

				%struct.uint32x2x2_t = type { <2 x i32>, <2 x i32> }
				%struct.uint32x2x3_t = type { <2 x i32>, <2 x i32>, <2 x i32> }
				%struct.uint32x2x4_t = type { <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32> }

				%struct.uint64x1x2_t = type { <1 x i64>, <1 x i64> }
				%struct.uint64x1x3_t = type { <1 x i64>, <1 x i64>, <1 x i64> }
				%struct.uint64x1x4_t = type { <1 x i64>, <1 x i64>, <1 x i64>, <1 x i64> }

				%struct.uint8x8x2_t = type { <8 x i8>, <8 x i8> }
				%struct.uint8x8x3_t = type { <8 x i8>, <8 x i8>, <8 x i8> }
				%struct.uint8x8x4_t = type { <8 x i8>, <8 x i8>, <8 x i8>, <8 x i8> }

				%struct.uint16x8x2_t = type { <8 x i16>, <8 x i16> }
				%struct.uint16x8x3_t = type { <8 x i16>, <8 x i16>, <8 x i16> }
				%struct.uint16x8x4_t = type { <8 x i16>, <8 x i16>, <8 x i16>, <8 x i16> }

				%struct.uint32x4x2_t = type { <4 x i32>, <4 x i32> }
				%struct.uint32x4x3_t = type { <4 x i32>, <4 x i32>, <4 x i32> }
				%struct.uint32x4x4_t = type { <4 x i32>, <4 x i32>, <4 x i32>, <4 x i32> }

				%struct.uint8x16x2_t = type { <16 x i8>, <16 x i8> }
				%struct.uint8x16x3_t = type { <16 x i8>, <16 x i8>, <16 x i8> }
				%struct.uint8x16x4_t = type { <16 x i8>, <16 x i8>, <16 x i8>, <16 x i8> }

				declare %struct.uint8x8x2_t @llvm.arm.neon.vld2dup.v8i8.p0i8(i8*, i32)
				declare %struct.uint16x4x2_t @llvm.arm.neon.vld2dup.v4i16.p0i8(i8*, i32)
				declare %struct.uint32x2x2_t @llvm.arm.neon.vld2dup.v2i32.p0i8(i8*, i32)
				declare %struct.uint64x1x2_t @llvm.arm.neon.vld2dup.v1i64.p0i8(i8*, i32)

				declare %struct.uint8x8x3_t @llvm.arm.neon.vld3dup.v8i8.p0i8(i8*, i32)
				declare %struct.uint16x4x3_t @llvm.arm.neon.vld3dup.v4i16.p0i8(i8*, i32)
				declare %struct.uint32x2x3_t @llvm.arm.neon.vld3dup.v2i32.p0i8(i8*, i32)
				declare %struct.uint64x1x3_t @llvm.arm.neon.vld3dup.v1i64.p0i8(i8*, i32)

				declare %struct.uint8x8x4_t @llvm.arm.neon.vld4dup.v8i8.p0i8(i8*, i32)
				declare %struct.uint16x4x4_t @llvm.arm.neon.vld4dup.v4i16.p0i8(i8*, i32)
				declare %struct.uint32x2x4_t @llvm.arm.neon.vld4dup.v2i32.p0i8(i8*, i32)
				declare %struct.uint64x1x4_t @llvm.arm.neon.vld4dup.v1i64.p0i8(i8*, i32)

				declare %struct.uint8x16x2_t @llvm.arm.neon.vld2dup.v16i8.p0i8(i8*, i32)
				declare %struct.uint16x8x2_t @llvm.arm.neon.vld2dup.v8i16.p0i8(i8*, i32)
				declare %struct.uint32x4x2_t @llvm.arm.neon.vld2dup.v4i32.p0i8(i8*, i32)

				declare %struct.uint8x16x3_t @llvm.arm.neon.vld3dup.v16i8.p0i8(i8*, i32)
				declare %struct.uint16x8x3_t @llvm.arm.neon.vld3dup.v8i16.p0i8(i8*, i32)
				declare %struct.uint32x4x3_t @llvm.arm.neon.vld3dup.v4i32.p0i8(i8*, i32)

				declare %struct.uint8x16x4_t @llvm.arm.neon.vld4dup.v16i8.p0i8(i8*, i32)
				declare %struct.uint16x8x4_t @llvm.arm.neon.vld4dup.v8i16.p0i8(i8*, i32)
				declare %struct.uint32x4x4_t @llvm.arm.neon.vld4dup.v4i32.p0i8(i8*, i32)

				; CHECK-LABEL: test_vld2_dup_u16
				; CHECK: vld2.16 {d16[], d17[]}, [r0]
				define %struct.uint16x4x2_t @test_vld2_dup_u16(i8* %src) {
				entry:
				%tmp = tail call %struct.uint16x4x2_t @llvm.arm.neon.vld2dup.v4i16.p0i8(i8* %src, i32 2)
				ret %struct.uint16x4x2_t %tmp
				}

				; CHECK-LABEL: test_vld2_dup_u32
				; CHECK: vld2.32 {d16[], d17[]}, [r0]
				define %struct.uint32x2x2_t @test_vld2_dup_u32(i8* %src) {
				entry:
				%tmp = tail call %struct.uint32x2x2_t @llvm.arm.neon.vld2dup.v2i32.p0i8(i8* %src, i32 4)
				ret %struct.uint32x2x2_t %tmp
				}

				; CHECK-LABEL: test_vld2_dup_u64
				; CHECK: vld1.64 {d16, d17}, [r0:64]
				define %struct.uint64x1x2_t @test_vld2_dup_u64(i8* %src) {
				entry:
				%tmp = tail call %struct.uint64x1x2_t @llvm.arm.neon.vld2dup.v1i64.p0i8(i8* %src, i32 8)
				ret %struct.uint64x1x2_t %tmp
				}

				; CHECK-LABEL: test_vld2_dup_u8
				; CHECK: vld2.8 {d16[], d17[]}, [r0]
				define %struct.uint8x8x2_t @test_vld2_dup_u8(i8* %src) {
				entry:
				%tmp = tail call %struct.uint8x8x2_t @llvm.arm.neon.vld2dup.v8i8.p0i8(i8* %src, i32 1)
				ret %struct.uint8x8x2_t %tmp
				}

				; CHECK-LABEL: test_vld3_dup_u16
				; CHECK: vld3.16 {d16[], d17[], d18[]}, [r1]
				define %struct.uint16x4x3_t @test_vld3_dup_u16(i8* %src) {
				entry:
				%tmp = tail call %struct.uint16x4x3_t @llvm.arm.neon.vld3dup.v4i16.p0i8(i8* %src, i32 2)
				ret %struct.uint16x4x3_t %tmp
				}

				; CHECK-LABEL: test_vld3_dup_u32
				; CHECK: vld3.32 {d16[], d17[], d18[]}, [r1]
				define %struct.uint32x2x3_t @test_vld3_dup_u32(i8* %src) {
				entry:
				%tmp = tail call %struct.uint32x2x3_t @llvm.arm.neon.vld3dup.v2i32.p0i8(i8* %src, i32 4)
				ret %struct.uint32x2x3_t %tmp
				}

				; CHECK-LABEL: test_vld3_dup_u64
				; CHECK: vld1.64 {d16, d17, d18}, [r1]
				define %struct.uint64x1x3_t @test_vld3_dup_u64(i8* %src) {
				entry:
				%tmp = tail call %struct.uint64x1x3_t @llvm.arm.neon.vld3dup.v1i64.p0i8(i8* %src, i32 8)
				ret %struct.uint64x1x3_t %tmp
				}

				; CHECK-LABEL: test_vld3_dup_u8
				; CHECK: vld3.8 {d16[], d17[], d18[]}, [r1]
				define %struct.uint8x8x3_t @test_vld3_dup_u8(i8* %src) {
				entry:
				%tmp = tail call %struct.uint8x8x3_t @llvm.arm.neon.vld3dup.v8i8.p0i8(i8* %src, i32 1)
				ret %struct.uint8x8x3_t %tmp
				}

				; CHECK-LABEL: test_vld4_dup_u16
				; CHECK: vld4.16 {d16[], d17[], d18[], d19[]}, [r1]
				define %struct.uint16x4x4_t @test_vld4_dup_u16(i8* %src) {
				entry:
				%tmp = tail call %struct.uint16x4x4_t @llvm.arm.neon.vld4dup.v4i16.p0i8(i8* %src, i32 2)
				ret %struct.uint16x4x4_t %tmp
				}

				; CHECK-LABEL: test_vld4_dup_u32
				; CHECK: vld4.32 {d16[], d17[], d18[], d19[]}, [r1]
				define %struct.uint32x2x4_t @test_vld4_dup_u32(i8* %src) {
				entry:
				%tmp = tail call %struct.uint32x2x4_t @llvm.arm.neon.vld4dup.v2i32.p0i8(i8* %src, i32 4)
				ret %struct.uint32x2x4_t %tmp
				}

				; CHECK-LABEL: test_vld4_dup_u64
				; CHECK: vld1.64 {d16, d17, d18, d19}, [r1:64]
				define %struct.uint64x1x4_t @test_vld4_dup_u64(i8* %src) {
				entry:
				%tmp = tail call %struct.uint64x1x4_t @llvm.arm.neon.vld4dup.v1i64.p0i8(i8* %src, i32 8)
				ret %struct.uint64x1x4_t %tmp
				}

				; CHECK-LABEL: test_vld4_dup_u8
				; CHECK: vld4.8 {d16[], d17[], d18[], d19[]}, [r1]
				define %struct.uint8x8x4_t @test_vld4_dup_u8(i8* %src) {
				entry:
				%tmp = tail call %struct.uint8x8x4_t @llvm.arm.neon.vld4dup.v8i8.p0i8(i8* %src, i32 1)
				ret %struct.uint8x8x4_t %tmp
				}

				; CHECK-LABEL: test_vld2q_dup_u16
				; CHECK: vld2.16 {d16[], d18[]}, [r1]
				; CHECK: vld2.16 {d17[], d19[]}, [r1]
				define %struct.uint16x8x2_t @test_vld2q_dup_u16(i8* %src) {
				entry:
				%tmp = tail call %struct.uint16x8x2_t @llvm.arm.neon.vld2dup.v8i16.p0i8(i8* %src, i32 2)
				ret %struct.uint16x8x2_t %tmp
				}

				; CHECK-LABEL: test_vld2q_dup_u32
				; CHECK: vld2.32 {d16[], d18[]}, [r1]
				; CHECK: vld2.32 {d17[], d19[]}, [r1]
				define %struct.uint32x4x2_t @test_vld2q_dup_u32(i8* %src) {
				entry:
				%tmp = tail call %struct.uint32x4x2_t @llvm.arm.neon.vld2dup.v4i32.p0i8(i8* %src, i32 4)
				ret %struct.uint32x4x2_t %tmp
				}

				; CHECK-LABEL: test_vld2q_dup_u8
				; CHECK: vld2.8 {d16[], d18[]}, [r1]
				; CHECK: vld2.8 {d17[], d19[]}, [r1]
				define %struct.uint8x16x2_t @test_vld2q_dup_u8(i8* %src) {
				entry:
				%tmp = tail call %struct.uint8x16x2_t @llvm.arm.neon.vld2dup.v16i8.p0i8(i8* %src, i32 1)
				ret %struct.uint8x16x2_t %tmp
				}

				; CHECK-LABEL: test_vld3q_dup_u16
				; CHECK: vld3.16 {d16[], d18[], d20[]}, [r1]
				; CHECK: vld3.16 {d17[], d19[], d21[]}, [r1]
				define %struct.uint16x8x3_t @test_vld3q_dup_u16(i8* %src) {
				entry:
				%tmp = tail call %struct.uint16x8x3_t @llvm.arm.neon.vld3dup.v8i16.p0i8(i8* %src, i32 2)
				ret %struct.uint16x8x3_t %tmp
				}

				; CHECK-LABEL: test_vld3q_dup_u32
				; CHECK: vld3.32 {d16[], d18[], d20[]}, [r1]
				; CHECK: vld3.32 {d17[], d19[], d21[]}, [r1]
				define %struct.uint32x4x3_t @test_vld3q_dup_u32(i8* %src) {
				entry:
				%tmp = tail call %struct.uint32x4x3_t @llvm.arm.neon.vld3dup.v4i32.p0i8(i8* %src, i32 4)
				ret %struct.uint32x4x3_t %tmp
				}

				; CHECK-LABEL: test_vld3q_dup_u8
				; CHECK: vld3.8 {d16[], d18[], d20[]}, [r1]
				; CHECK: vld3.8 {d17[], d19[], d21[]}, [r1]
				define %struct.uint8x16x3_t @test_vld3q_dup_u8(i8* %src) {
				entry:
				%tmp = tail call %struct.uint8x16x3_t @llvm.arm.neon.vld3dup.v16i8.p0i8(i8* %src, i32 1)
				ret %struct.uint8x16x3_t %tmp
				}

				; CHECK-LABEL: test_vld4q_dup_u16
				; CHECK: vld4.16 {d16[], d18[], d20[], d22[]}, [r1]
				; CHECK: vld4.16 {d17[], d19[], d21[], d23[]}, [r1]
				define %struct.uint16x8x4_t @test_vld4q_dup_u16(i8* %src) {
				entry:
				%tmp = tail call %struct.uint16x8x4_t @llvm.arm.neon.vld4dup.v8i16.p0i8(i8* %src, i32 2)
				ret %struct.uint16x8x4_t %tmp
				}

				; CHECK-LABEL: test_vld4q_dup_u32
				; CHECK: vld4.32 {d16[], d18[], d20[], d22[]}, [r1]
				; CHECK: vld4.32 {d17[], d19[], d21[], d23[]}, [r1]
				define %struct.uint32x4x4_t @test_vld4q_dup_u32(i8* %src) {
				entry:
				%tmp = tail call %struct.uint32x4x4_t @llvm.arm.neon.vld4dup.v4i32.p0i8(i8* %src, i32 4)
				ret %struct.uint32x4x4_t %tmp
				}

				; CHECK-LABEL: test_vld4q_dup_u8
				; CHECK: vld4.8 {d16[], d18[], d20[], d22[]}, [r1]
				; CHECK: vld4.8 {d17[], d19[], d21[], d23[]}, [r1]
				define %struct.uint8x16x4_t @test_vld4q_dup_u8(i8* %src) {
				entry:
				%tmp = tail call %struct.uint8x16x4_t @llvm.arm.neon.vld4dup.v16i8.p0i8(i8* %src, i32 1)
				ret %struct.uint8x16x4_t %tmp
				}
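Every q test above expects the same register pattern: the Even pseudo, emitted first, fills the low D half of each Q register (d16, d18, ...), and the Odd pseudo fills the high half (d17, d19, ...), so together the two loads give each of the N Q registers the duplicated element in all lanes. A hypothetical helper (not part of LLVM or FileCheck) that renders those register lists, assuming the result starts at d16 as in the CHECK lines:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of LLVM: renders the register list of one
// half of a vldNq_dup lowering, starting from D register `base`.
// half == 0 is the "even" load (d16, d18, ...), half == 1 the "odd" one.
std::string dupHalfRegList(unsigned numVecs, unsigned base, unsigned half) {
  assert(numVecs >= 2 && numVecs <= 4 && half <= 1);
  std::string s = "{";
  for (unsigned v = 0; v < numVecs; ++v) {
    if (v != 0)
      s += ", ";
    // Q register v of the result is d(base + 2v) : d(base + 2v + 1).
    s += "d" + std::to_string(base + 2 * v + half) + "[]";
  }
  s += "}";
  return s;
}
```

With `base = 16`, `dupHalfRegList(3, 16, 0)` and `dupHalfRegList(3, 16, 1)` give the two operand lists checked in test_vld3q_dup_u16.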