This is an archive of the discontinued LLVM Phabricator instance.

Differential D80716

[AArch64]: BFloat Load/Store Intrinsics&CodeGen
ClosedPublic

Authored by LukeGeeson on May 28 2020, 5:56 AM.

Details

Reviewers

fpetrogalli
SjoerdMeijer
sdesmalen
t.p.northover
stuij

Commits

rG508a4764c0ed: [AArch64]: BFloat Load/Store Intrinsics&CodeGen

Summary

This patch upstreams support for the ld/st variants of the BFloat (__bf16)
intrinsics for AArch64. This includes IR intrinsics. Unit tests are
provided as needed.
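
As a rough illustration of how the new load/store intrinsics might be used, here is a minimal sketch. `vld1_bf16` and `vst1_bf16` are the intrinsics this patch declares in arm_neon.td; the `copy4_bf16` helper, the `bf16_elem` alias, and the `memcpy` fallback for targets without BF16 support are assumptions made only so that the example compiles everywhere.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

#if defined(__ARM_FEATURE_BF16_VECTOR_ARITHMETIC)
#include <arm_neon.h>
using bf16_elem = bfloat16_t;
#else
// Assumption for this sketch only: without BF16 support, treat each
// element as raw 16-bit storage so the example still compiles.
using bf16_elem = std::uint16_t;
#endif

// Hypothetical helper: copies four consecutive bf16 elements from Src to Dst.
inline void copy4_bf16(const bf16_elem *Src, bf16_elem *Dst) {
  assert(Src != nullptr && Dst != nullptr);
#if defined(__ARM_FEATURE_BF16_VECTOR_ARITHMETIC)
  bfloat16x4_t V = vld1_bf16(Src); // load 4 x bf16 into a 64-bit vector
  vst1_bf16(Dst, V);               // store the vector back to memory
#else
  std::memcpy(Dst, Src, 4 * sizeof(bf16_elem)); // portable fallback
#endif
}
```

The intrinsic path is only compiled when the target defines `__ARM_FEATURE_BF16_VECTOR_ARITHMETIC`, which is the same guard the patch uses in arm_neon.td.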

This patch is part of a series implementing the Bfloat16 extension of
the Armv8.6-a architecture, as detailed here:

https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a

The bfloat type and its properties are specified in the Arm
Architecture Reference Manual:

https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile

The following people contributed to this patch:

Luke Geeson
Momchil Velikov
Luke Cheeseman

Event Timeline

LukeGeeson created this revision. May 28 2020, 5:56 AM

Herald added projects: Restricted Project, Restricted Project. May 28 2020, 5:56 AM

Herald added subscribers: llvm-commits, cfe-commits, danielkiss and 2 others.

LukeGeeson added a parent revision: D79869: [clang][BFloat] Add reinterpret cast intrinsics. May 28 2020, 5:56 AM

LukeGeeson added reviewers: sdesmalen, t.p.northover. May 28 2020, 6:03 AM

labrinea added a subscriber: labrinea. May 28 2020, 6:20 AM

labrinea added inline comments.

clang/test/CodeGen/aarch64-bf16-ldst-intrinsics.c
37	CHECK-NEXT or CHECK-DAG are preferable for sequences.
181	where are the check lines?

We need testing for the backend code.

This revision now requires changes to proceed. May 28 2020, 6:42 AM

simon_tatham added a subscriber: simon_tatham. May 28 2020, 6:51 AM

simon_tatham added inline comments.

clang/lib/CodeGen/CGBuiltin.cpp
10368	What effect is this change of strategy having on the alignment computation, for the already-supported instances of this builtin? It looks to me as if `__builtin_neon_vld1_v` with (say) a `uint8_t *` pointer argument will now compute `Alignment=1` (the natural alignment of the pointee type), whereas it would previously have computed `Alignment=8` (the size of the whole vector being loaded or stored). Is that intentional? Or accidental? Or have I completely misunderstood what's going on? (Whichever of the three, some discussion in the commit message would be a good idea, explaining why this change does or does not make a difference, as appropriate.)
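
The difference between the two alignment strategies discussed in this comment can be modelled outside of Clang. The sketch below is not the CGBuiltin.cpp code: `vectorSizeAlignment`, `naturalPointeeAlignment` and `isAlignedTo` are hypothetical helpers, and `alignof` stands in for the natural alignment of the pointee type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Old strategy, as described above: the alignment equals the size of the
// whole vector, 8 bytes for vld1 and 16 bytes for vld1q.
constexpr std::size_t vectorSizeAlignment(bool IsQuad) {
  return IsQuad ? 16 : 8;
}

// New strategy, modelled with alignof: the natural alignment of the
// type the pointer argument points to.
template <typename Pointee>
constexpr std::size_t naturalPointeeAlignment(const Pointee *) {
  return alignof(Pointee);
}

// Returns true if the address held in P is a multiple of Alignment bytes.
inline bool isAlignedTo(const void *P, std::size_t Alignment) {
  assert(Alignment != 0 && "alignment must be non-zero");
  return reinterpret_cast<std::uintptr_t>(P) % Alignment == 0;
}
```

For a `uint8_t *` that points one byte into a 16-byte-aligned buffer, the pointee-based alignment (1) holds, while the vector-size assumption (8) does not. That is the kind of pointer for which an over-stated alignment on the load would be wrong.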

Harbormaster failed remote builds in B58207: Diff 266829! May 28 2020, 7:02 AM

LukeGeeson added a child revision: D80752: [AArch64]: BFloat MatMul Intrinsics&CodeGen. May 28 2020, 12:18 PM

LukeGeeson updated this revision to Diff 267896. Jun 2 2020, 8:30 AM

LukeGeeson marked 4 inline comments as done.

LukeGeeson added a subscriber: pratlucas.

LukeGeeson added inline comments.

clang/lib/CodeGen/CGBuiltin.cpp
10368	Clang was incorrectly assuming that all the pointers from which loads were being generated for vld1 intrinsics were aligned according to the intrinsic's result type. This causes alignment faults in the code generated by the backend. This change fixes the issue so that alignment is based on the type of the pointer provided as input to the intrinsic. @pratlucas has done some work on this in parallel (https://reviews.llvm.org/D79721), which has been approved and may overrule this particular line of code. I shall add a note to the commit message, and tentatively mark this as fixed, given it's liable to adopt the work of Lucas.
clang/test/CodeGen/aarch64-bf16-ldst-intrinsics.c
37	Added to be consistent with the rest of the file (ie no CHECK-NEXT, but CHECK32/64)
181	Added to be consistent with the rest of the file

In D80716#2059977, @stuij wrote:

We need testing for the backend code.

@stuij I have added aarch64-bf16-ldst-intrinsics.ll to test the backend. Please let me know if this is ok :)

pratlucas added inline comments. Jun 2 2020, 9:38 AM

clang/lib/CodeGen/CGBuiltin.cpp
10368	Hi @LukeGeeson, just as a heads up, some changes to this implementation were requested on D79721. The usage of `CGM.getNaturalPointeeTypeAlignment` and `IgnoreParenCasts()` was causing problems on certain argument types, so the alignment is now captured from the expression itself when it is emitted.

Accidentally added dotprod tests here rather than the child commit - just removed them

LukeGeeson marked an inline comment as done. Jun 2 2020, 9:45 AM

LukeGeeson added inline comments.

clang/lib/CodeGen/CGBuiltin.cpp
10368	Thanks @pratlucas yeah I plan to rebase my code onto upstream when you have made those changes (and fix whatever still breaks) before I push :)

Besides from rebasing to get @pratlucas changes upstream.

@stuij please could you confirm if you are happy with this, so I can merge

In D80716#2073251, @LukeGeeson wrote:

> Besides from rebasing to get @pratlucas changes upstream.
>
> @stuij please could you confirm if you are happy with this, so I can merge

Hi Luke,

For the backend tests it would be good if you would use CHECK-NEXT from label to ret, like I believe you did in the other patch, using -asm-verbose=0 to get rid of the cruft.

In D80716#2074883, @stuij wrote:

> In D80716#2073251, @LukeGeeson wrote:
>
> > Besides from rebasing to get @pratlucas changes upstream.
> >
> > @stuij please could you confirm if you are happy with this, so I can merge
>
> Hi Luke,
>
> For the backend tests it would be good if you would use CHECK-NEXT from label to ret, like I believe you did in the other patch, using -asm-verbose=0 to get rid of the cruft.

Similar to my other comment in the [[ https://reviews.llvm.org/D80752 | other ]] patch:

This isn't how to get rid of kill statements. In particular, if you pass -asm-verbose=0 to llc in the RUN statement then no CHECKs are generated, let alone kill statements.

Instead, to get this desired result you run llc without that argument, and then manually remove these unnecessary kill lines. This is what I have done and this should fix this. Patch incoming

In D80716#2082356, @LukeGeeson wrote:

> In D80716#2074883, @stuij wrote:
>
> > In D80716#2073251, @LukeGeeson wrote:
> >
> > > Besides from rebasing to get @pratlucas changes upstream.
> > >
> > > @stuij please could you confirm if you are happy with this, so I can merge
> >
> > Hi Luke,
> >
> > For the backend tests it would be good if you would use CHECK-NEXT from label to ret, like I believe you did in the other patch, using -asm-verbose=0 to get rid of the cruft.
>
> Similar to my other comment in the [[ https://reviews.llvm.org/D80752 | other ]] patch:
>
> This isn't how to get rid of kill statements. In particular, if you pass -asm-verbose=0 to llc in the RUN statement then no CHECKs are generated, let alone kill statements.
>
> Instead, to get this desired result you run llc without that argument, and then manually remove these unnecessary kill lines. This is what I have done and this should fix this. Patch incoming

Further, you cannot use CHECK-NEXT if your test function contains such a kill statement, unless you manually remove it and use CHECK in its place: FileCheck still sees the kill line in the output, so a CHECK-NEXT that does not expect it fails. This is just something we must balance if we want clear tests and a direct 1-1 correspondence with the result. I've used CHECK-NEXT where I can, but CHECK where I must.
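
The CHECK versus CHECK-NEXT trade-off described above can be sketched with a deliberately simplified model of the matching rules. This is not FileCheck's implementation. It assumes plain substring matching, at most one match per output line, and only two directive kinds, and `Kind`, `Directive` and `matches` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class Kind { Check, CheckNext };

struct Directive {
  Kind K;
  std::string Pattern;
};

// Simplified model: CHECK may skip any number of output lines before it
// matches; CHECK-NEXT must match on the line directly after the previous match.
bool matches(const std::vector<std::string> &Output,
             const std::vector<Directive> &Directives) {
  std::size_t Pos = 0;   // index of the first output line not yet consumed
  bool HavePrev = false; // whether an earlier directive has matched
  for (const Directive &D : Directives) {
    if (D.K == Kind::CheckNext) {
      // No skipping allowed: the very next line must contain the pattern.
      if (!HavePrev || Pos >= Output.size() ||
          Output[Pos].find(D.Pattern) == std::string::npos)
        return false;
      ++Pos;
    } else {
      // Skip unrelated lines (for example a kill comment) until a match.
      while (Pos < Output.size() &&
             Output[Pos].find(D.Pattern) == std::string::npos)
        ++Pos;
      if (Pos == Output.size())
        return false;
      ++Pos;
      HavePrev = true;
    }
    assert(Pos <= Output.size());
  }
  return true;
}
```

In this model, given output consisting of a label, a kill comment, a load and a `ret`, the sequence CHECK(label), CHECK-NEXT(load), CHECK-NEXT(ret) is rejected because the kill comment sits between the label and the load. Relaxing the second directive to CHECK lets it skip the kill line, and the trailing CHECK-NEXT for `ret` still holds.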

rebased my patch off of upstream llvm master
removed my code in favour of more recent code by @pratlucas
used update CC to generate tests without kill statements
used CHECK-NEXT where possible, unless kill statements have been removed, in this case CHECK is used

LukeGeeson removed a parent revision: D79869: [clang][BFloat] Add reinterpret cast intrinsics. Jun 9 2020, 9:24 AM

updated check32 -> check-next32, same for check64

miyuki added a child revision: D81486: [ARM][BFloat] Implement lowering of bf16 load/store intrinsics. Jun 9 2020, 10:38 AM

miyuki removed a child revision: D81486: [ARM][BFloat] Implement lowering of bf16 load/store intrinsics. Jun 10 2020, 3:17 AM

fixed check typo

arsenm added a subscriber: arsenm. Jun 10 2020, 6:57 AM

arsenm added inline comments.

llvm/test/CodeGen/AArch64/aarch64-bf16-ldst-intrinsics.ll
265	Why is the IR type name bfloat and not bfloat16?

LukeGeeson marked 2 inline comments as done. Jun 10 2020, 7:08 AM

LukeGeeson added inline comments.

llvm/test/CodeGen/AArch64/aarch64-bf16-ldst-intrinsics.ll
265	The naming for the IR type was agreed upon here after quite a big discussion. https://reviews.llvm.org/D78190

SjoerdMeijer added inline comments. Jun 10 2020, 8:09 AM

llvm/test/CodeGen/AArch64/aarch64-bf16-ldst-intrinsics.ll
265	I regret very much that I didn't notice this earlier... I.e., I noticed this in D76077 and wrote that I am relatively unhappy about this (I think I mentioned this on another ticket too). Because, like @arsenm, I would expect the IR type name to be bfloat16. Correct me if I am wrong, but I don't see a big discussion about this in D78190. I only see 1 or 2 comments about `BFloat` vs `Bfloat`.

LukeGeeson marked 2 inline comments as done. Jun 10 2020, 8:33 AM

LukeGeeson added inline comments.

llvm/test/CodeGen/AArch64/aarch64-bf16-ldst-intrinsics.ll
265	I cannot see a discussion about the IR type name per se, but I can see you were both involved in the discussion more generally. I am concerned that this patch is the wrong place to discuss such issues, and that we should bring this up in a more appropriate place, as you mention, so that this patch isn't held back.

chill added a subscriber: chill. Jun 10 2020, 8:41 AM

chill added inline comments.

llvm/test/CodeGen/AArch64/aarch64-bf16-ldst-intrinsics.ll
265	I don't see a compelling reason for the name to be `bfloat16` or `bfloat3`, etc. Like other floating-point types (`float`, `double`, and `half`), the name denotes a specific externally defined format, unlike `iN`.

SjoerdMeijer added inline comments. Jun 10 2020, 9:05 AM

llvm/test/CodeGen/AArch64/aarch64-bf16-ldst-intrinsics.ll
265	"Like other floating-point types (float, double, and half), the name denotes a specific externally defined format": is the defined format not called bfloat16?

chill added inline comments. Jun 10 2020, 9:25 AM

llvm/test/CodeGen/AArch64/aarch64-bf16-ldst-intrinsics.ll
265	Indeed, people use the name "bfloat16". But then the `half`, `float`, and `double` also differ from the official `binary16`, `binary32`, and `binary64`. IMHO `bfloat` fits better in the LLVM IR naming convention.

SjoerdMeijer added inline comments. Jun 10 2020, 10:53 AM

llvm/test/CodeGen/AArch64/aarch64-bf16-ldst-intrinsics.ll
265	yeah, so that's exactly why I don't follow your logic. If there's any logic in the names here, the mapping from source-language type to IR type seems the most plausible one. And I just don't see the benefit of dropping the 16, and how that would fit better in some naming scheme or how that makes things clearer here.

chill added inline comments. Jun 10 2020, 11:08 AM

llvm/test/CodeGen/AArch64/aarch64-bf16-ldst-intrinsics.ll
265	What source language? That said, I'm resigning from the bikeshedding here.

chill removed a subscriber: momchil.velikov. Jun 10 2020, 11:09 AM

chill removed a subscriber: chill.

stuij added inline comments. Jun 11 2020, 4:29 AM

llvm/test/CodeGen/AArch64/aarch64-bf16-ldst-intrinsics.ll
265	Just as a house-keeping note: if we were to change the naming, I think we can all agree that this ticket itself shouldn't be the place where we want to do this. I'm happy for the conversation to carry on here, but I think we can move the ticket forward at the same time.
917	You should be able to do without all these big blocks of attributes, which I guess were generated from the C -> IR conversion. Just remove them and the `#x`s after the function declarations (maybe replace them with `nounwind`).

removed unnecessary contents of test

LGTM. Thanks!

This revision is now accepted and ready to land. Jun 12 2020, 7:15 AM

Closed by commit rG508a4764c0ed: [AArch64]: BFloat Load/Store Intrinsics&CodeGen (authored by LukeGeeson). Jun 16 2020, 7:43 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
clang/include/clang/Basic/arm_neon.td	34 lines
clang/lib/CodeGen/CGBuiltin.cpp	12 lines
clang/test/CodeGen/aarch64-bf16-ldst-intrinsics.c	415 lines
clang/test/Sema/aarch64-bf16-ldst-intrinsics.c	102 lines
llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp	162 lines
llvm/lib/Target/AArch64/AArch64InstrInfo.td	30 lines
llvm/test/CodeGen/AArch64/aarch64-bf16-ldst-intrinsics.ll	826 lines

Diff 267908

clang/include/clang/Basic/arm_neon.td

 let ArchGuard = "defined(__ARM_FEATURE_BF16_VECTOR_ARITHMETIC)" in {
   def VGET_HIGH_BF : NoTestOpInst<"vget_high", ".Q", "b", OP_HI>;
   def VGET_LOW_BF : NoTestOpInst<"vget_low", ".Q", "b", OP_LO>;

   def VGET_LANE_BF : IInst<"vget_lane", "1.I", "bQb">;
   def VSET_LANE_BF : IInst<"vset_lane", ".1.I", "bQb">;

   def SCALAR_VDUP_LANE_BF : IInst<"vdup_lane", "1.I", "Sb">;
   def SCALAR_VDUP_LANEQ_BF : IInst<"vdup_laneq", "1QI", "Sb">;
+
+  def VLD1_BF : WInst<"vld1", ".(c*!)", "bQb">;
+  def VLD2_BF : WInst<"vld2", "2(c*!)", "bQb">;
+  def VLD3_BF : WInst<"vld3", "3(c*!)", "bQb">;
+  def VLD4_BF : WInst<"vld4", "4(c*!)", "bQb">;
+
+  def VST1_BF : WInst<"vst1", "v*(.!)", "bQb">;
+  def VST2_BF : WInst<"vst2", "v*(2!)", "bQb">;
+  def VST3_BF : WInst<"vst3", "v*(3!)", "bQb">;
+  def VST4_BF : WInst<"vst4", "v*(4!)", "bQb">;
+
+  def VLD1_X2_BF : WInst<"vld1_x2", "2(c*!)", "bQb">;
+  def VLD1_X3_BF : WInst<"vld1_x3", "3(c*!)", "bQb">;
+  def VLD1_X4_BF : WInst<"vld1_x4", "4(c*!)", "bQb">;
+
+  def VST1_X2_BF : WInst<"vst1_x2", "v*(2!)", "bQb">;
+  def VST1_X3_BF : WInst<"vst1_x3", "v*(3!)", "bQb">;
+  def VST1_X4_BF : WInst<"vst1_x4", "v*(4!)", "bQb">;
+
+  def VLD1_LANE_BF : WInst<"vld1_lane", ".(c*!).I", "bQb">;
+  def VLD2_LANE_BF : WInst<"vld2_lane", "2(c*!)2I", "bQb">;
+  def VLD3_LANE_BF : WInst<"vld3_lane", "3(c*!)3I", "bQb">;
+  def VLD4_LANE_BF : WInst<"vld4_lane", "4(c*!)4I", "bQb">;
+  def VST1_LANE_BF : WInst<"vst1_lane", "v*(.!)I", "bQb">;
+  def VST2_LANE_BF : WInst<"vst2_lane", "v*(2!)I", "bQb">;
+  def VST3_LANE_BF : WInst<"vst3_lane", "v*(3!)I", "bQb">;
+  def VST4_LANE_BF : WInst<"vst4_lane", "v*(4!)I", "bQb">;
+
+  def VLD1_DUP_BF : WInst<"vld1_dup", ".(c*!)", "bQb">;
+  def VLD2_DUP_BF : WInst<"vld2_dup", "2(c*!)", "bQb">;
+  def VLD3_DUP_BF : WInst<"vld3_dup", "3(c*!)", "bQb">;
+  def VLD4_DUP_BF : WInst<"vld4_dup", "4(c*!)", "bQb">;
+
+
 }

 let ArchGuard = "defined(__ARM_FEATURE_BF16) && !defined(__aarch64__)" in {
   let BigEndianSafe = 1 in {
     defm VREINTERPRET_BF : REINTERPRET_CROSS_TYPES<
         "csilUcUsUiUlhfPcPsPlQcQsQiQlQUcQUsQUiQUlQhQfQPcQPsQPl", "bQb">;
   }
 }

 let ArchGuard = "defined(__ARM_FEATURE_BF16) && defined(__aarch64__)" in {
   let BigEndianSafe = 1 in {
     defm VVREINTERPRET_BF : REINTERPRET_CROSS_TYPES<
         "csilUcUsUiUlhfdPcPsPlQcQsQiQlQUcQUsQUiQUlQhQfQdQPcQPsQPlQPk", "bQb">;
   }
 }

clang/lib/CodeGen/CGBuiltin.cpp

   case NEON::BI__builtin_neon_vrsraq_n_v: {
     TmpOps.push_back(Ops[2]);
     Function* F = CGM.getIntrinsic(Int, Ty);
     llvm::Value *tmp = EmitNeonCall(F, TmpOps, "vrshr_n", 1, true);
     Ops[0] = Builder.CreateBitCast(Ops[0], VTy);
     return Builder.CreateAdd(Ops[0], tmp);
   }
   case NEON::BI__builtin_neon_vld1_v:
   case NEON::BI__builtin_neon_vld1q_v: {
+    auto Alignment = CGM.getNaturalPointeeTypeAlignment(
+        E->getArg(0)->IgnoreParenCasts()->getType());
     Ops[0] = Builder.CreateBitCast(Ops[0], llvm::PointerType::getUnqual(VTy));
-    auto Alignment = CharUnits::fromQuantity(
-        BuiltinID == NEON::BI__builtin_neon_vld1_v ? 8 : 16);
     return Builder.CreateAlignedLoad(VTy, Ops[0], Alignment);
   }
   case NEON::BI__builtin_neon_vst1_v:
   case NEON::BI__builtin_neon_vst1q_v:
     Ops[0] = Builder.CreateBitCast(Ops[0], llvm::PointerType::getUnqual(VTy));
     Ops[1] = Builder.CreateBitCast(Ops[1], VTy);
     return Builder.CreateDefaultAlignedStore(Ops[1], Ops[0]);
   case NEON::BI__builtin_neon_vld1_lane_v:
   case NEON::BI__builtin_neon_vld1q_lane_v: {
     Ops[1] = Builder.CreateBitCast(Ops[1], Ty);
     Ty = llvm::PointerType::getUnqual(VTy->getElementType());
     Ops[0] = Builder.CreateBitCast(Ops[0], Ty);
-    auto Alignment = CharUnits::fromQuantity(
-        BuiltinID == NEON::BI__builtin_neon_vld1_lane_v ? 8 : 16);
+    auto Alignment = CGM.getNaturalPointeeTypeAlignment(
+        E->getArg(0)->IgnoreParenCasts()->getType());
     Ops[0] =
         Builder.CreateAlignedLoad(VTy->getElementType(), Ops[0], Alignment);
     return Builder.CreateInsertElement(Ops[1], Ops[0], Ops[2], "vld1_lane");
   }
   case NEON::BI__builtin_neon_vld1_dup_v:
   case NEON::BI__builtin_neon_vld1q_dup_v: {
     Value *V = UndefValue::get(Ty);
     Ty = llvm::PointerType::getUnqual(VTy->getElementType());
     Ops[0] = Builder.CreateBitCast(Ops[0], Ty);
-    auto Alignment = CharUnits::fromQuantity(
-        BuiltinID == NEON::BI__builtin_neon_vld1_dup_v ? 8 : 16);
+    auto Alignment = CGM.getNaturalPointeeTypeAlignment(
+        E->getArg(0)->IgnoreParenCasts()->getType());
     Ops[0] =
         Builder.CreateAlignedLoad(VTy->getElementType(), Ops[0], Alignment);
     llvm::Constant *CI = ConstantInt::get(Int32Ty, 0);
     Ops[0] = Builder.CreateInsertElement(V, Ops[0], CI);
     return EmitNeonSplat(Ops[0], CI);
   }
   case NEON::BI__builtin_neon_vst1_lane_v:
   case NEON::BI__builtin_neon_vst1q_lane_v:

clang/test/CodeGen/aarch64-bf16-ldst-intrinsics.c

This file was added.

				// RUN: %clang_cc1 -triple aarch64-arm-none-eabi -target-feature +neon -target-feature +bf16 \
				// RUN: -O2 -emit-llvm %s -o - | FileCheck %s --check-prefixes=CHECK,CHECK64
				// RUN: %clang_cc1 -triple armv8.6a-arm-none-eabi -target-feature +neon -target-feature +bf16 -mfloat-abi hard \
				// RUN: -O2 -emit-llvm %s -o - | FileCheck %s --check-prefixes=CHECK,CHECK32

				#include "arm_neon.h"

				bfloat16x4_t test_vld1_bf16(bfloat16_t const *ptr) {
				return vld1_bf16(ptr);
				}
				// CHECK-LABEL: test_vld1_bf16
				// CHECK64: %1 = load <4 x bfloat>, <4 x bfloat>* %0
				// CHECK64: ret <4 x bfloat> %1
				// CHECK32: = load <4 x bfloat>, <4 x bfloat>* %0, align 2
				// CHECK32: ret <4 x bfloat> %1

				bfloat16x8_t test_vld1q_bf16(bfloat16_t const *ptr) {
				return vld1q_bf16(ptr);
				}
				// CHECK-LABEL: test_vld1q_bf16
				// CHECK64: %1 = load <8 x bfloat>, <8 x bfloat>* %0
				// CHECK64: ret <8 x bfloat> %1
				// CHECK32: %1 = load <8 x bfloat>, <8 x bfloat>* %0, align 2
				// CHECK32: ret <8 x bfloat> %1

				bfloat16x4_t test_vld1_lane_bf16(bfloat16_t const *ptr, bfloat16x4_t src) {
				return vld1_lane_bf16(ptr, src, 0);
				}
				// CHECK-LABEL: test_vld1_lane_bf16
				// CHECK64: %0 = load bfloat, bfloat* %ptr, align 2
				// CHECK64: %vld1_lane = insertelement <4 x bfloat> %src, bfloat %0, i32 0
				// CHECK64: ret <4 x bfloat> %vld1_lane
				// CHECK32: %0 = load bfloat, bfloat* %ptr, align 2
				// CHECK32: %vld1_lane = insertelement <4 x bfloat> %src, bfloat %0, i32 0
				// CHECK32: ret <4 x bfloat> %vld1_lane

				bfloat16x8_t test_vld1q_lane_bf16(bfloat16_t const *ptr, bfloat16x8_t src) {
				return vld1q_lane_bf16(ptr, src, 7);
				}
				// CHECK-LABEL: test_vld1q_lane_bf16
				// CHECK64: %0 = load bfloat, bfloat* %ptr, align 2
				// CHECK64: %vld1_lane = insertelement <8 x bfloat> %src, bfloat %0, i32 7
				// CHECK64: ret <8 x bfloat> %vld1_lane
				// CHECK32: %0 = load bfloat, bfloat* %ptr, align 2
				// CHECK32: %vld1_lane = insertelement <8 x bfloat> %src, bfloat %0, i32 7
				// CHECK32: ret <8 x bfloat> %vld1_lane

				bfloat16x4_t test_vld1_dup_bf16(bfloat16_t const *ptr) {
				return vld1_dup_bf16(ptr);
				}
				// CHECK-LABEL: test_vld1_dup_bf16
				// CHECK64: %0 = load bfloat, bfloat* %ptr, align 2
				// CHECK64: %1 = insertelement <4 x bfloat> undef, bfloat %0, i32 0
				// CHECK64: %lane = shufflevector <4 x bfloat> %1, <4 x bfloat> undef, <4 x i32> zeroinitializer
				// CHECK64: ret <4 x bfloat> %lane
				// CHECK32: %0 = load bfloat, bfloat* %ptr, align 2
				// CHECK32: %1 = insertelement <4 x bfloat> undef, bfloat %0, i32 0
				// CHECK32: %lane = shufflevector <4 x bfloat> %1, <4 x bfloat> undef, <4 x i32> zeroinitializer
				// CHECK32: ret <4 x bfloat> %lane

				bfloat16x4x2_t test_vld1_bf16_x2(bfloat16_t const *ptr) {
				return vld1_bf16_x2(ptr);
				}
				// CHECK-LABEL: test_vld1_bf16_x2
				// CHECK64: %vld1xN = tail call { <4 x bfloat>, <4 x bfloat> } @llvm.aarch64.neon.ld1x2.v4bf16.p0bf16(bfloat* %ptr)
				// CHECK32: %vld1xN = tail call { <4 x bfloat>, <4 x bfloat> } @llvm.arm.neon.vld1x2.v4bf16.p0bf16(bfloat* %ptr)

				bfloat16x8x2_t test_vld1q_bf16_x2(bfloat16_t const *ptr) {
				return vld1q_bf16_x2(ptr);
				}
				// CHECK-LABEL: test_vld1q_bf16_x2
				// CHECK64: %vld1xN = tail call { <8 x bfloat>, <8 x bfloat> } @llvm.aarch64.neon.ld1x2.v8bf16.p0bf16(bfloat* %ptr)
				// CHECK32: %vld1xN = tail call { <8 x bfloat>, <8 x bfloat> } @llvm.arm.neon.vld1x2.v8bf16.p0bf16(bfloat* %ptr)

				bfloat16x4x3_t test_vld1_bf16_x3(bfloat16_t const *ptr) {
				return vld1_bf16_x3(ptr);
				}
				// CHECK-LABEL: test_vld1_bf16_x3
				// CHECK64: %vld1xN = tail call { <4 x bfloat>, <4 x bfloat>, <4 x bfloat> } @llvm.aarch64.neon.ld1x3.v4bf16.p0bf16(bfloat* %ptr)
				// CHECK32: %vld1xN = tail call { <4 x bfloat>, <4 x bfloat>, <4 x bfloat> } @llvm.arm.neon.vld1x3.v4bf16.p0bf16(bfloat* %ptr)

				bfloat16x8x3_t test_vld1q_bf16_x3(bfloat16_t const *ptr) {
				return vld1q_bf16_x3(ptr);
				}
				// CHECK-LABEL: test_vld1q_bf16_x3
				// CHECK64: %vld1xN = tail call { <8 x bfloat>, <8 x bfloat>, <8 x bfloat> } @llvm.aarch64.neon.ld1x3.v8bf16.p0bf16(bfloat* %ptr)
				// CHECK32: %vld1xN = tail call { <8 x bfloat>, <8 x bfloat>, <8 x bfloat> } @llvm.arm.neon.vld1x3.v8bf16.p0bf16(bfloat* %ptr)

				bfloat16x4x4_t test_vld1_bf16_x4(bfloat16_t const *ptr) {
				return vld1_bf16_x4(ptr);
				}
				// CHECK-LABEL: test_vld1_bf16_x4
				// CHECK64: %vld1xN = tail call { <4 x bfloat>, <4 x bfloat>, <4 x bfloat>, <4 x bfloat> } @llvm.aarch64.neon.ld1x4.v4bf16.p0bf16(bfloat* %ptr)
				// CHECK32: %vld1xN = tail call { <4 x bfloat>, <4 x bfloat>, <4 x bfloat>, <4 x bfloat> } @llvm.arm.neon.vld1x4.v4bf16.p0bf16(bfloat* %ptr)

				bfloat16x8x4_t test_vld1q_bf16_x4(bfloat16_t const *ptr) {
				return vld1q_bf16_x4(ptr);
				}
				// CHECK-LABEL: test_vld1q_bf16_x4
				// CHECK64: %vld1xN = tail call { <8 x bfloat>, <8 x bfloat>, <8 x bfloat>, <8 x bfloat> } @llvm.aarch64.neon.ld1x4.v8bf16.p0bf16(bfloat* %ptr)
				// CHECK32: %vld1xN = tail call { <8 x bfloat>, <8 x bfloat>, <8 x bfloat>, <8 x bfloat> } @llvm.arm.neon.vld1x4.v8bf16.p0bf16(bfloat* %ptr)

				bfloat16x8_t test_vld1q_dup_bf16(bfloat16_t const *ptr) {
				return vld1q_dup_bf16(ptr);
				}
				// CHECK-LABEL: test_vld1q_dup_bf16
				// CHECK64: %0 = load bfloat, bfloat* %ptr, align 2
				// CHECK64: %1 = insertelement <8 x bfloat> undef, bfloat %0, i32 0
				// CHECK64: %lane = shufflevector <8 x bfloat> %1, <8 x bfloat> undef, <8 x i32> zeroinitializer
				// CHECK64: ret <8 x bfloat> %lane
				// CHECK32: %0 = load bfloat, bfloat* %ptr, align 2
				// CHECK32: %1 = insertelement <8 x bfloat> undef, bfloat %0, i32 0
				// CHECK32: %lane = shufflevector <8 x bfloat> %1, <8 x bfloat> undef, <8 x i32> zeroinitializer
				// CHECK32: ret <8 x bfloat> %lane

				bfloat16x4x2_t test_vld2_bf16(bfloat16_t const *ptr) {
				return vld2_bf16(ptr);
				}
				// CHECK-LABEL: test_vld2_bf16
				// CHECK64: %0 = bitcast bfloat* %ptr to <4 x bfloat>*
				// CHECK64: %vld2 = tail call { <4 x bfloat>, <4 x bfloat> } @llvm.aarch64.neon.ld2.v4bf16.p0v4bf16(<4 x bfloat>* %0)
				// CHECK32: %0 = bitcast bfloat* %ptr to i8*
				// CHECK32: %vld2_v = tail call { <4 x bfloat>, <4 x bfloat> } @llvm.arm.neon.vld2.v4bf16.p0i8(i8* %0, i32 2)

				bfloat16x8x2_t test_vld2q_bf16(bfloat16_t const *ptr) {
				return vld2q_bf16(ptr);
				}
				// CHECK-LABEL: test_vld2q_bf16
				// CHECK64: %0 = bitcast bfloat* %ptr to <8 x bfloat>*
				// CHECK64: %vld2 = tail call { <8 x bfloat>, <8 x bfloat> } @llvm.aarch64.neon.ld2.v8bf16.p0v8bf16(<8 x bfloat>* %0)
				// CHECK32: %0 = bitcast bfloat* %ptr to i8*
				// CHECK32: %vld2q_v = tail call { <8 x bfloat>, <8 x bfloat> } @llvm.arm.neon.vld2.v8bf16.p0i8(i8* %0, i32 2)

				bfloat16x4x2_t test_vld2_lane_bf16(bfloat16_t const *ptr, bfloat16x4x2_t src) {
				return vld2_lane_bf16(ptr, src, 1);
				}
				// CHECK-LABEL: test_vld2_lane_bf16
				// CHECK64: %vld2_lane = tail call { <4 x bfloat>, <4 x bfloat> } @llvm.aarch64.neon.ld2lane.v4bf16.p0i8(<4 x bfloat> %src.coerce.fca.0.extract, <4 x bfloat> %src.coerce.fca.1.extract, i64 1, i8* %0)
				// CHECK32: %vld2_lane_v = tail call { <4 x bfloat>, <4 x bfloat> } @llvm.arm.neon.vld2lane.v4bf16.p0i8(i8* %2, <4 x bfloat> %0, <4 x bfloat> %1, i32 1, i32 2)

				bfloat16x8x2_t test_vld2q_lane_bf16(bfloat16_t const *ptr, bfloat16x8x2_t src) {
				return vld2q_lane_bf16(ptr, src, 7);
				}
				// CHECK-LABEL: test_vld2q_lane_bf16
				// CHECK64: %vld2_lane = tail call { <8 x bfloat>, <8 x bfloat> } @llvm.aarch64.neon.ld2lane.v8bf16.p0i8(<8 x bfloat> %src.coerce.fca.0.extract, <8 x bfloat> %src.coerce.fca.1.extract, i64 7, i8* %0)
				// CHECK32: %vld2q_lane_v = tail call { <8 x bfloat>, <8 x bfloat> } @llvm.arm.neon.vld2lane.v8bf16.p0i8(i8* %2, <8 x bfloat> %0, <8 x bfloat> %1, i32 7, i32 2)

				bfloat16x4x3_t test_vld3_bf16(bfloat16_t const *ptr) {
				return vld3_bf16(ptr);
				}
				// CHECK-LABEL: test_vld3_bf16
				// CHECK64: %vld3 = tail call { <4 x bfloat>, <4 x bfloat>, <4 x bfloat> } @llvm.aarch64.neon.ld3.v4bf16.p0v4bf16(<4 x bfloat>* %0)
				// CHECK32: %0 = bitcast bfloat* %ptr to i8*
				// CHECK32: %vld3_v = tail call { <4 x bfloat>, <4 x bfloat>, <4 x bfloat> } @llvm.arm.neon.vld3.v4bf16.p0i8(i8* %0, i32 2)

				bfloat16x8x3_t test_vld3q_bf16(bfloat16_t const *ptr) {
				return vld3q_bf16(ptr);
				}
				// CHECK-LABEL: test_vld3q_bf16
				// CHECK64: %vld3 = tail call { <8 x bfloat>, <8 x bfloat>, <8 x bfloat> } @llvm.aarch64.neon.ld3.v8bf16.p0v8bf16(<8 x bfloat>* %0)
				// CHECK32: %0 = bitcast bfloat* %ptr to i8*
				// CHECK32: %vld3q_v = tail call { <8 x bfloat>, <8 x bfloat>, <8 x bfloat> } @llvm.arm.neon.vld3.v8bf16.p0i8(i8* %0, i32 2)

				bfloat16x4x3_t test_vld3_lane_bf16(bfloat16_t const *ptr, bfloat16x4x3_t src) {
				return vld3_lane_bf16(ptr, src, 1);
				}
				// CHECK-LABEL: test_vld3_lane_bf16
				// CHECK64: %vld3_lane = tail call { <4 x bfloat>, <4 x bfloat>, <4 x bfloat> } @llvm.aarch64.neon.ld3lane.v4bf16.p0i8(<4 x bfloat> %src.coerce.fca.0.extract, <4 x bfloat> %src.coerce.fca.1.extract, <4 x bfloat> %src.coerce.fca.2.extract, i64 1, i8* %0)
				// CHECK32: %3 = bitcast bfloat* %ptr to i8*
				// CHECK32: %vld3_lane_v = tail call { <4 x bfloat>, <4 x bfloat>, <4 x bfloat> } @llvm.arm.neon.vld3lane.v4bf16.p0i8(i8* %3, <4 x bfloat> %0, <4 x bfloat> %1, <4 x bfloat> %2, i32 1, i32 2)

				bfloat16x8x3_t test_vld3q_lane_bf16(bfloat16_t const *ptr, bfloat16x8x3_t src) {
				return vld3q_lane_bf16(ptr, src, 7);
				// return vld3q_lane_bf16(ptr, src, 8);
				}
				// CHECK-LABEL: test_vld3q_lane_bf16
				// CHECK64: %vld3_lane = tail call { <8 x bfloat>, <8 x bfloat>, <8 x bfloat> } @llvm.aarch64.neon.ld3lane.v8bf16.p0i8(<8 x bfloat> %src.coerce.fca.0.extract, <8 x bfloat> %src.coerce.fca.1.extract, <8 x bfloat> %src.coerce.fca.2.extract, i64 7, i8* %0)
				// CHECK32: %3 = bitcast bfloat* %ptr to i8*
				// CHECK32: %vld3q_lane_v = tail call { <8 x bfloat>, <8 x bfloat>, <8 x bfloat> } @llvm.arm.neon.vld3lane.v8bf16.p0i8(i8* %3, <8 x bfloat> %0, <8 x bfloat> %1, <8 x bfloat> %2, i32 7, i32 2)

				bfloat16x4x4_t test_vld4_bf16(bfloat16_t const *ptr) {
				labrinea: where are the check lines?
				LukeGeeson (author): Added to be consistent with the rest of the file
				return vld4_bf16(ptr);
				}
				// CHECK-LABEL: test_vld4_bf16
				// CHECK64: %vld4 = tail call { <4 x bfloat>, <4 x bfloat>, <4 x bfloat>, <4 x bfloat> } @llvm.aarch64.neon.ld4.v4bf16.p0v4bf16(<4 x bfloat>* %0)
				// CHECK32: %0 = bitcast bfloat* %ptr to i8*
				// CHECK32: %vld4_v = tail call { <4 x bfloat>, <4 x bfloat>, <4 x bfloat>, <4 x bfloat> } @llvm.arm.neon.vld4.v4bf16.p0i8(i8* %0, i32 2)

				bfloat16x8x4_t test_vld4q_bf16(bfloat16_t const *ptr) {
				return vld4q_bf16(ptr);
				}
				// CHECK-LABEL: test_vld4q_bf16
				// CHECK64: %vld4 = tail call { <8 x bfloat>, <8 x bfloat>, <8 x bfloat>, <8 x bfloat> } @llvm.aarch64.neon.ld4.v8bf16.p0v8bf16(<8 x bfloat>* %0)
				// CHECK32: %0 = bitcast bfloat* %ptr to i8*
				// CHECK32: %vld4q_v = tail call { <8 x bfloat>, <8 x bfloat>, <8 x bfloat>, <8 x bfloat> } @llvm.arm.neon.vld4.v8bf16.p0i8(i8* %0, i32 2)

				bfloat16x4x4_t test_vld4_lane_bf16(bfloat16_t const *ptr, bfloat16x4x4_t src) {
				return vld4_lane_bf16(ptr, src, 1);
				}
				// CHECK-LABEL: test_vld4_lane_bf16
				// CHECK64: %vld4_lane = tail call { <4 x bfloat>, <4 x bfloat>, <4 x bfloat>, <4 x bfloat> } @llvm.aarch64.neon.ld4lane.v4bf16.p0i8(<4 x bfloat> %src.coerce.fca.0.extract, <4 x bfloat> %src.coerce.fca.1.extract, <4 x bfloat> %src.coerce.fca.2.extract, <4 x bfloat> %src.coerce.fca.3.extract, i64 1, i8* %0)
				// CHECK32: %4 = bitcast bfloat* %ptr to i8*
				// CHECK32: %vld4_lane_v = tail call { <4 x bfloat>, <4 x bfloat>, <4 x bfloat>, <4 x bfloat> } @llvm.arm.neon.vld4lane.v4bf16.p0i8(i8* %4, <4 x bfloat> %0, <4 x bfloat> %1, <4 x bfloat> %2, <4 x bfloat> %3, i32 1, i32 2)

				bfloat16x8x4_t test_vld4q_lane_bf16(bfloat16_t const *ptr, bfloat16x8x4_t src) {
				return vld4q_lane_bf16(ptr, src, 7);
				}
				// CHECK-LABEL: test_vld4q_lane_bf16
				// CHECK64: %vld4_lane = tail call { <8 x bfloat>, <8 x bfloat>, <8 x bfloat>, <8 x bfloat> } @llvm.aarch64.neon.ld4lane.v8bf16.p0i8(<8 x bfloat> %src.coerce.fca.0.extract, <8 x bfloat> %src.coerce.fca.1.extract, <8 x bfloat> %src.coerce.fca.2.extract, <8 x bfloat> %src.coerce.fca.3.extract, i64 7, i8* %0)
				// CHECK32: %4 = bitcast bfloat* %ptr to i8*
				// CHECK32: %vld4q_lane_v = tail call { <8 x bfloat>, <8 x bfloat>, <8 x bfloat>, <8 x bfloat> } @llvm.arm.neon.vld4lane.v8bf16.p0i8(i8* %4, <8 x bfloat> %0, <8 x bfloat> %1, <8 x bfloat> %2, <8 x bfloat> %3, i32 7, i32 2)

				bfloat16x4x2_t test_vld2_dup_bf16(bfloat16_t const *ptr) {
				return vld2_dup_bf16(ptr);
				}
				// CHECK-LABEL: test_vld2_dup_bf16
				// CHECK64: %vld2 = tail call { <4 x bfloat>, <4 x bfloat> } @llvm.aarch64.neon.ld2r.v4bf16.p0bf16(bfloat* %ptr)
				// CHECK32: %vld2_dup_v = tail call { <4 x bfloat>, <4 x bfloat> } @llvm.arm.neon.vld2dup.v4bf16.p0i8(i8* %0, i32 2)

				bfloat16x8x2_t test_vld2q_dup_bf16(bfloat16_t const *ptr) {
				return vld2q_dup_bf16(ptr);
				}
				// CHECK-LABEL: test_vld2q_dup_bf16
				// CHECK64: %vld2 = tail call { <8 x bfloat>, <8 x bfloat> } @llvm.aarch64.neon.ld2r.v8bf16.p0bf16(bfloat* %ptr)
				// CHECK32: %vld2q_dup_v = tail call { <8 x bfloat>, <8 x bfloat> } @llvm.arm.neon.vld2dup.v8bf16.p0i8(i8* %0, i32 2)

				bfloat16x4x3_t test_vld3_dup_bf16(bfloat16_t const *ptr) {
				return vld3_dup_bf16(ptr);
				}
				// CHECK-LABEL: test_vld3_dup_bf16
				// CHECK64: %vld3 = tail call { <4 x bfloat>, <4 x bfloat>, <4 x bfloat> } @llvm.aarch64.neon.ld3r.v4bf16.p0bf16(bfloat* %ptr)
				// CHECK32: %vld3_dup_v = tail call { <4 x bfloat>, <4 x bfloat>, <4 x bfloat> } @llvm.arm.neon.vld3dup.v4bf16.p0i8(i8* %0, i32 2)

				bfloat16x8x3_t test_vld3q_dup_bf16(bfloat16_t const *ptr) {
				return vld3q_dup_bf16(ptr);
				}
				// CHECK-LABEL: test_vld3q_dup_bf16
				// CHECK64: %vld3 = tail call { <8 x bfloat>, <8 x bfloat>, <8 x bfloat> } @llvm.aarch64.neon.ld3r.v8bf16.p0bf16(bfloat* %ptr)
				// CHECK32: %vld3q_dup_v = tail call { <8 x bfloat>, <8 x bfloat>, <8 x bfloat> } @llvm.arm.neon.vld3dup.v8bf16.p0i8(i8* %0, i32 2)

				bfloat16x4x4_t test_vld4_dup_bf16(bfloat16_t const *ptr) {
				return vld4_dup_bf16(ptr);
				}
				// CHECK-LABEL: test_vld4_dup_bf16
				// CHECK64: %vld4 = tail call { <4 x bfloat>, <4 x bfloat>, <4 x bfloat>, <4 x bfloat> } @llvm.aarch64.neon.ld4r.v4bf16.p0bf16(bfloat* %ptr)
				// CHECK32: %vld4_dup_v = tail call { <4 x bfloat>, <4 x bfloat>, <4 x bfloat>, <4 x bfloat> } @llvm.arm.neon.vld4dup.v4bf16.p0i8(i8* %0, i32 2)

				bfloat16x8x4_t test_vld4q_dup_bf16(bfloat16_t const *ptr) {
				return vld4q_dup_bf16(ptr);
				}
				// CHECK-LABEL: test_vld4q_dup_bf16
				// CHECK64: %vld4 = tail call { <8 x bfloat>, <8 x bfloat>, <8 x bfloat>, <8 x bfloat> } @llvm.aarch64.neon.ld4r.v8bf16.p0bf16(bfloat* %ptr)
				// CHECK32: %vld4q_dup_v = tail call { <8 x bfloat>, <8 x bfloat>, <8 x bfloat>, <8 x bfloat> } @llvm.arm.neon.vld4dup.v8bf16.p0i8(i8* %0, i32 2)

				void test_vst1_bf16(bfloat16_t *ptr, bfloat16x4_t val) {
				vst1_bf16(ptr, val);
				}
				// CHECK-LABEL: test_vst1_bf16
				// CHECK64: %0 = bitcast bfloat* %ptr to <4 x bfloat>*
				// CHECK64: store <4 x bfloat> %val, <4 x bfloat>* %0, align 8
				// CHECK32: %0 = bitcast bfloat* %ptr to i8*
				// CHECK32: tail call void @llvm.arm.neon.vst1.p0i8.v4bf16(i8* %0, <4 x bfloat> %val, i32 2)

				void test_vst1q_bf16(bfloat16_t *ptr, bfloat16x8_t val) {
				vst1q_bf16(ptr, val);
				}
				// CHECK-LABEL: test_vst1q_bf16
				// CHECK64: %0 = bitcast bfloat* %ptr to <8 x bfloat>*
				// CHECK64: store <8 x bfloat> %val, <8 x bfloat>* %0, align 16
				// CHECK32: %0 = bitcast bfloat* %ptr to i8*
				// CHECK32: tail call void @llvm.arm.neon.vst1.p0i8.v8bf16(i8* %0, <8 x bfloat> %val, i32 2)

				void test_vst1_lane_bf16(bfloat16_t *ptr, bfloat16x4_t val) {
				vst1_lane_bf16(ptr, val, 1);
				}
				// CHECK-LABEL: test_vst1_lane_bf16
				// CHECK64: %0 = extractelement <4 x bfloat> %val, i32 1
				// CHECK64: store bfloat %0, bfloat* %ptr, align 2
				// CHECK32: %0 = extractelement <4 x bfloat> %val, i32 1
				// CHECK32: store bfloat %0, bfloat* %ptr, align 2

				void test_vst1q_lane_bf16(bfloat16_t *ptr, bfloat16x8_t val) {
				vst1q_lane_bf16(ptr, val, 7);
				}
				// CHECK-LABEL: test_vst1q_lane_bf16
				// CHECK64: %0 = extractelement <8 x bfloat> %val, i32 7
				// CHECK64: store bfloat %0, bfloat* %ptr, align 2
				// CHECK32: %0 = extractelement <8 x bfloat> %val, i32 7
				// CHECK32: store bfloat %0, bfloat* %ptr, align 2

				void test_vst1_bf16_x2(bfloat16_t *ptr, bfloat16x4x2_t val) {
				vst1_bf16_x2(ptr, val);
				}
				// CHECK-LABEL: test_vst1_bf16_x2
				// CHECK64: tail call void @llvm.aarch64.neon.st1x2.v4bf16.p0bf16(<4 x bfloat> %val.coerce.fca.0.extract, <4 x bfloat> %val.coerce.fca.1.extract, bfloat* %ptr)
				// CHECK32: tail call void @llvm.arm.neon.vst1x2.p0bf16.v4bf16(bfloat* %ptr, <4 x bfloat> %0, <4 x bfloat> %1)

				void test_vst1q_bf16_x2(bfloat16_t *ptr, bfloat16x8x2_t val) {
				vst1q_bf16_x2(ptr, val);
				}
				// CHECK-LABEL: test_vst1q_bf16_x2
				// CHECK64: tail call void @llvm.aarch64.neon.st1x2.v8bf16.p0bf16(<8 x bfloat> %val.coerce.fca.0.extract, <8 x bfloat> %val.coerce.fca.1.extract, bfloat* %ptr)
				// CHECK32: tail call void @llvm.arm.neon.vst1x2.p0bf16.v8bf16(bfloat* %ptr, <8 x bfloat> %0, <8 x bfloat> %1)

				void test_vst1_bf16_x3(bfloat16_t *ptr, bfloat16x4x3_t val) {
				vst1_bf16_x3(ptr, val);
				}
				// CHECK-LABEL: test_vst1_bf16_x3
				// CHECK64: tail call void @llvm.aarch64.neon.st1x3.v4bf16.p0bf16(<4 x bfloat> %val.coerce.fca.0.extract, <4 x bfloat> %val.coerce.fca.1.extract, <4 x bfloat> %val.coerce.fca.2.extract, bfloat* %ptr)
				// CHECK32: tail call void @llvm.arm.neon.vst1x3.p0bf16.v4bf16(bfloat* %ptr, <4 x bfloat> %0, <4 x bfloat> %1, <4 x bfloat> %2)

				void test_vst1q_bf16_x3(bfloat16_t *ptr, bfloat16x8x3_t val) {
				vst1q_bf16_x3(ptr, val);
				}
				// CHECK-LABEL: test_vst1q_bf16_x3
				// CHECK64: tail call void @llvm.aarch64.neon.st1x3.v8bf16.p0bf16(<8 x bfloat> %val.coerce.fca.0.extract, <8 x bfloat> %val.coerce.fca.1.extract, <8 x bfloat> %val.coerce.fca.2.extract, bfloat* %ptr)
				// CHECK32: tail call void @llvm.arm.neon.vst1x3.p0bf16.v8bf16(bfloat* %ptr, <8 x bfloat> %0, <8 x bfloat> %1, <8 x bfloat> %2)

				void test_vst1_bf16_x4(bfloat16_t *ptr, bfloat16x4x4_t val) {
				vst1_bf16_x4(ptr, val);
				}
				// CHECK-LABEL: test_vst1_bf16_x4
				// CHECK64: tail call void @llvm.aarch64.neon.st1x4.v4bf16.p0bf16(<4 x bfloat> %val.coerce.fca.0.extract, <4 x bfloat> %val.coerce.fca.1.extract, <4 x bfloat> %val.coerce.fca.2.extract, <4 x bfloat> %val.coerce.fca.3.extract, bfloat* %ptr)
				// CHECK32: tail call void @llvm.arm.neon.vst1x4.p0bf16.v4bf16(bfloat* %ptr, <4 x bfloat> %0, <4 x bfloat> %1, <4 x bfloat> %2, <4 x bfloat> %3)

				void test_vst1q_bf16_x4(bfloat16_t *ptr, bfloat16x8x4_t val) {
				vst1q_bf16_x4(ptr, val);
				}
				// CHECK-LABEL: test_vst1q_bf16_x4
				// CHECK64: tail call void @llvm.aarch64.neon.st1x4.v8bf16.p0bf16(<8 x bfloat> %val.coerce.fca.0.extract, <8 x bfloat> %val.coerce.fca.1.extract, <8 x bfloat> %val.coerce.fca.2.extract, <8 x bfloat> %val.coerce.fca.3.extract, bfloat* %ptr)
				// CHECK32: tail call void @llvm.arm.neon.vst1x4.p0bf16.v8bf16(bfloat* %ptr, <8 x bfloat> %0, <8 x bfloat> %1, <8 x bfloat> %2, <8 x bfloat> %3)

				void test_vst2_bf16(bfloat16_t *ptr, bfloat16x4x2_t val) {
				vst2_bf16(ptr, val);
				}
				// CHECK-LABEL: test_vst2_bf16
				// CHECK64: tail call void @llvm.aarch64.neon.st2.v4bf16.p0i8(<4 x bfloat> %val.coerce.fca.0.extract, <4 x bfloat> %val.coerce.fca.1.extract, i8* %0)
				// CHECK32: tail call void @llvm.arm.neon.vst2.p0i8.v4bf16(i8* %2, <4 x bfloat> %0, <4 x bfloat> %1, i32 2)

				void test_vst2q_bf16(bfloat16_t *ptr, bfloat16x8x2_t val) {
				vst2q_bf16(ptr, val);
				}
				// CHECK-LABEL: test_vst2q_bf16
				// CHECK64: tail call void @llvm.aarch64.neon.st2.v8bf16.p0i8(<8 x bfloat> %val.coerce.fca.0.extract, <8 x bfloat> %val.coerce.fca.1.extract, i8* %0)
				// CHECK32: tail call void @llvm.arm.neon.vst2.p0i8.v8bf16(i8* %2, <8 x bfloat> %0, <8 x bfloat> %1, i32 2)

				void test_vst2_lane_bf16(bfloat16_t *ptr, bfloat16x4x2_t val) {
				vst2_lane_bf16(ptr, val, 1);
				}
				// CHECK-LABEL: test_vst2_lane_bf16
				// CHECK64: tail call void @llvm.aarch64.neon.st2lane.v4bf16.p0i8(<4 x bfloat> %val.coerce.fca.0.extract, <4 x bfloat> %val.coerce.fca.1.extract, i64 1, i8* %0)
				// CHECK32: tail call void @llvm.arm.neon.vst2lane.p0i8.v4bf16(i8* %2, <4 x bfloat> %0, <4 x bfloat> %1, i32 1, i32 2)

				void test_vst2q_lane_bf16(bfloat16_t *ptr, bfloat16x8x2_t val) {
				vst2q_lane_bf16(ptr, val, 7);
				}
				// CHECK-LABEL: test_vst2q_lane_bf16
				// CHECK64: tail call void @llvm.aarch64.neon.st2lane.v8bf16.p0i8(<8 x bfloat> %val.coerce.fca.0.extract, <8 x bfloat> %val.coerce.fca.1.extract, i64 7, i8* %0)
				// CHECK32: tail call void @llvm.arm.neon.vst2lane.p0i8.v8bf16(i8* %2, <8 x bfloat> %0, <8 x bfloat> %1, i32 7, i32 2)

				void test_vst3_bf16(bfloat16_t *ptr, bfloat16x4x3_t val) {
				vst3_bf16(ptr, val);
				}
				// CHECK-LABEL: test_vst3_bf16
				// CHECK64: tail call void @llvm.aarch64.neon.st3.v4bf16.p0i8(<4 x bfloat> %val.coerce.fca.0.extract, <4 x bfloat> %val.coerce.fca.1.extract, <4 x bfloat> %val.coerce.fca.2.extract, i8* %0)
				// CHECK32: tail call void @llvm.arm.neon.vst3.p0i8.v4bf16(i8* %3, <4 x bfloat> %0, <4 x bfloat> %1, <4 x bfloat> %2, i32 2)

				void test_vst3q_bf16(bfloat16_t *ptr, bfloat16x8x3_t val) {
				vst3q_bf16(ptr, val);
				}
				// CHECK-LABEL: test_vst3q_bf16
				// CHECK64: tail call void @llvm.aarch64.neon.st3.v8bf16.p0i8(<8 x bfloat> %val.coerce.fca.0.extract, <8 x bfloat> %val.coerce.fca.1.extract, <8 x bfloat> %val.coerce.fca.2.extract, i8* %0)
				// CHECK32: tail call void @llvm.arm.neon.vst3.p0i8.v8bf16(i8* %3, <8 x bfloat> %0, <8 x bfloat> %1, <8 x bfloat> %2, i32 2)

				void test_vst3_lane_bf16(bfloat16_t *ptr, bfloat16x4x3_t val) {
				vst3_lane_bf16(ptr, val, 1);
				}
				// CHECK-LABEL: test_vst3_lane_bf16
				// CHECK64: tail call void @llvm.aarch64.neon.st3lane.v4bf16.p0i8(<4 x bfloat> %val.coerce.fca.0.extract, <4 x bfloat> %val.coerce.fca.1.extract, <4 x bfloat> %val.coerce.fca.2.extract, i64 1, i8* %0)
				// CHECK32: tail call void @llvm.arm.neon.vst3lane.p0i8.v4bf16(i8* %3, <4 x bfloat> %0, <4 x bfloat> %1, <4 x bfloat> %2, i32 1, i32 2)

				void test_vst3q_lane_bf16(bfloat16_t *ptr, bfloat16x8x3_t val) {
				vst3q_lane_bf16(ptr, val, 7);
				}
				// CHECK-LABEL: test_vst3q_lane_bf16
				// CHECK64: tail call void @llvm.aarch64.neon.st3lane.v8bf16.p0i8(<8 x bfloat> %val.coerce.fca.0.extract, <8 x bfloat> %val.coerce.fca.1.extract, <8 x bfloat> %val.coerce.fca.2.extract, i64 7, i8* %0)
				// CHECK32: tail call void @llvm.arm.neon.vst3lane.p0i8.v8bf16(i8* %3, <8 x bfloat> %0, <8 x bfloat> %1, <8 x bfloat> %2, i32 7, i32 2)

				void test_vst4_bf16(bfloat16_t *ptr, bfloat16x4x4_t val) {
				vst4_bf16(ptr, val);
				}
				// CHECK-LABEL: test_vst4_bf16
				// CHECK64: tail call void @llvm.aarch64.neon.st4.v4bf16.p0i8(<4 x bfloat> %val.coerce.fca.0.extract, <4 x bfloat> %val.coerce.fca.1.extract, <4 x bfloat> %val.coerce.fca.2.extract, <4 x bfloat> %val.coerce.fca.3.extract, i8* %0)
				// CHECK32: tail call void @llvm.arm.neon.vst4.p0i8.v4bf16(i8* %4, <4 x bfloat> %0, <4 x bfloat> %1, <4 x bfloat> %2, <4 x bfloat> %3, i32 2)

				void test_vst4q_bf16(bfloat16_t *ptr, bfloat16x8x4_t val) {
				vst4q_bf16(ptr, val);
				}
				// CHECK-LABEL: test_vst4q_bf16
				// CHECK64: tail call void @llvm.aarch64.neon.st4.v8bf16.p0i8(<8 x bfloat> %val.coerce.fca.0.extract, <8 x bfloat> %val.coerce.fca.1.extract, <8 x bfloat> %val.coerce.fca.2.extract, <8 x bfloat> %val.coerce.fca.3.extract, i8* %0)
				// CHECK32: tail call void @llvm.arm.neon.vst4.p0i8.v8bf16(i8* %4, <8 x bfloat> %0, <8 x bfloat> %1, <8 x bfloat> %2, <8 x bfloat> %3, i32 2)

				void test_vst4_lane_bf16(bfloat16_t *ptr, bfloat16x4x4_t val) {
				vst4_lane_bf16(ptr, val, 1);
				}
				// CHECK-LABEL: test_vst4_lane_bf16
				// CHECK64: tail call void @llvm.aarch64.neon.st4lane.v4bf16.p0i8(<4 x bfloat> %val.coerce.fca.0.extract, <4 x bfloat> %val.coerce.fca.1.extract, <4 x bfloat> %val.coerce.fca.2.extract, <4 x bfloat> %val.coerce.fca.3.extract, i64 1, i8* %0)
				// CHECK32: tail call void @llvm.arm.neon.vst4lane.p0i8.v4bf16(i8* %4, <4 x bfloat> %0, <4 x bfloat> %1, <4 x bfloat> %2, <4 x bfloat> %3, i32 1, i32 2)

				void test_vst4q_lane_bf16(bfloat16_t *ptr, bfloat16x8x4_t val) {
				vst4q_lane_bf16(ptr, val, 7);
				}
				// CHECK-LABEL: test_vst4q_lane_bf16
				// CHECK64: tail call void @llvm.aarch64.neon.st4lane.v8bf16.p0i8(<8 x bfloat> %val.coerce.fca.0.extract, <8 x bfloat> %val.coerce.fca.1.extract, <8 x bfloat> %val.coerce.fca.2.extract, <8 x bfloat> %val.coerce.fca.3.extract, i64 7, i8* %0)
				// CHECK32: tail call void @llvm.arm.neon.vst4lane.p0i8.v8bf16(i8* %4, <8 x bfloat> %0, <8 x bfloat> %1, <8 x bfloat> %2, <8 x bfloat> %3, i32 7, i32 2)
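
A note on the check lines in this file: FileCheck only treats a comment as a directive when the check prefix is followed by a colon (or by a suffix such as -LABEL: or -NEXT:), so a line like `// CHECK32 tail call ...` is skipped without any diagnostic. The sketch below is a deliberately simplified model of that rule, not FileCheck's actual parser; `has_plain_directive` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Simplified model (an assumption, not FileCheck's real parser): a line
 * carries a plain directive for `prefix` only if the prefix text occurs
 * immediately followed by ':'. Suffixed forms such as PREFIX-NEXT: and
 * FileCheck's word-boundary rules are intentionally left out. */
static bool has_plain_directive(const char *line, const char *prefix) {
    size_t n = strlen(prefix);
    if (n == 0)
        return false; /* an empty prefix never matches in this model */
    for (const char *p = strstr(line, prefix); p != NULL;
         p = strstr(p + 1, prefix)) {
        if (p[n] == ':')
            return true;
    }
    return false;
}
```

Under this model, `has_plain_directive("// CHECK32 tail call", "CHECK32")` is false, which is why a missing colon makes a check line inert rather than failing the test.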

clang/test/Sema/aarch64-bf16-ldst-intrinsics.c

This file was added.

				// RUN: %clang_cc1 -triple aarch64-arm-none-eabi -target-feature +neon -target-feature +bf16 \
				// RUN: -O2 -fallow-half-arguments-and-returns -verify -fsyntax-only %s

				#include "arm_neon.h"

				int x;

				bfloat16x4_t test_vld1_lane_bf16(bfloat16_t const *ptr, bfloat16x4_t src) {
				(void)vld1_lane_bf16(ptr, src, -1); // expected-error-re {{argument value {{.*}} is outside the valid range}}
				(void)vld1_lane_bf16(ptr, src, 4); // expected-error-re {{argument value {{.*}} is outside the valid range}}
				return vld1_lane_bf16(ptr, src, x); // expected-error-re {{argument {{.*}} must be a constant integer}}
				}

				bfloat16x8_t test_vld1q_lane_bf16(bfloat16_t const *ptr, bfloat16x8_t src) {
				(void)vld1q_lane_bf16(ptr, src, -1); // expected-error-re {{argument value {{.*}} is outside the valid range}}
				(void)vld1q_lane_bf16(ptr, src, 8); // expected-error-re {{argument value {{.*}} is outside the valid range}}
				return vld1q_lane_bf16(ptr, src, x); // expected-error-re {{argument {{.*}} must be a constant integer}}
				}

				bfloat16x4x2_t test_vld2_lane_bf16(bfloat16_t const *ptr, bfloat16x4x2_t src) {
				(void)vld2_lane_bf16(ptr, src, -1); // expected-error-re {{argument value {{.*}} is outside the valid range}}
				(void)vld2_lane_bf16(ptr, src, 4); // expected-error-re {{argument value {{.*}} is outside the valid range}}
				return vld2_lane_bf16(ptr, src, x); // expected-error-re {{argument {{.*}} must be a constant integer}}
				}

				bfloat16x8x2_t test_vld2q_lane_bf16(bfloat16_t const *ptr, bfloat16x8x2_t src) {
				(void)vld2q_lane_bf16(ptr, src, -1); // expected-error-re {{argument value {{.*}} is outside the valid range}}
				(void)vld2q_lane_bf16(ptr, src, 8); // expected-error-re {{argument value {{.*}} is outside the valid range}}
				return vld2q_lane_bf16(ptr, src, x); // expected-error-re {{argument {{.*}} must be a constant integer}}
				}

				bfloat16x4x3_t test_vld3_lane_bf16(bfloat16_t const *ptr, bfloat16x4x3_t src) {
				(void)vld3_lane_bf16(ptr, src, -1); // expected-error-re {{argument value {{.*}} is outside the valid range}}
				(void)vld3_lane_bf16(ptr, src, 4); // expected-error-re {{argument value {{.*}} is outside the valid range}}
				return vld3_lane_bf16(ptr, src, x); // expected-error-re {{argument {{.*}} must be a constant integer}}
				}

				bfloat16x8x3_t test_vld3q_lane_bf16(bfloat16_t const *ptr, bfloat16x8x3_t src) {
				(void)vld3q_lane_bf16(ptr, src, -1); // expected-error-re {{argument value {{.*}} is outside the valid range}}
				(void)vld3q_lane_bf16(ptr, src, 8); // expected-error-re {{argument value {{.*}} is outside the valid range}}
				return vld3q_lane_bf16(ptr, src, x); // expected-error-re {{argument {{.*}} must be a constant integer}}
				}

				bfloat16x4x4_t test_vld4_lane_bf16(bfloat16_t const *ptr, bfloat16x4x4_t src) {
				(void)vld4_lane_bf16(ptr, src, -1); // expected-error-re {{argument value {{.*}} is outside the valid range}}
				(void)vld4_lane_bf16(ptr, src, 4); // expected-error-re {{argument value {{.*}} is outside the valid range}}
				return vld4_lane_bf16(ptr, src, x); // expected-error-re {{argument {{.*}} must be a constant integer}}
				}

				bfloat16x8x4_t test_vld4q_lane_bf16(bfloat16_t const *ptr, bfloat16x8x4_t src) {
				(void)vld4q_lane_bf16(ptr, src, -1); // expected-error-re {{argument value {{.*}} is outside the valid range}}
				(void)vld4q_lane_bf16(ptr, src, 8); // expected-error-re {{argument value {{.*}} is outside the valid range}}
				return vld4q_lane_bf16(ptr, src, x); // expected-error-re {{argument {{.*}} must be a constant integer}}
				}

				void test_vst1_lane_bf16(bfloat16_t *ptr, bfloat16x4_t val) {
				vst1_lane_bf16(ptr, val, -1); // expected-error-re {{argument value {{.*}} is outside the valid range}}
				vst1_lane_bf16(ptr, val, 4); // expected-error-re {{argument value {{.*}} is outside the valid range}}
				vst1_lane_bf16(ptr, val, x); // expected-error-re {{argument {{.*}} must be a constant integer}}
				}

				void test_vst1q_lane_bf16(bfloat16_t *ptr, bfloat16x8_t val) {
				vst1q_lane_bf16(ptr, val, -1); // expected-error-re {{argument value {{.*}} is outside the valid range}}
				vst1q_lane_bf16(ptr, val, 8); // expected-error-re {{argument value {{.*}} is outside the valid range}}
				vst1q_lane_bf16(ptr, val, x); // expected-error-re {{argument {{.*}} must be a constant integer}}
				}

				void test_vst2_lane_bf16(bfloat16_t *ptr, bfloat16x4x2_t val) {
				vst2_lane_bf16(ptr, val, -1); // expected-error-re {{argument value {{.*}} is outside the valid range}}
				vst2_lane_bf16(ptr, val, 4); // expected-error-re {{argument value {{.*}} is outside the valid range}}
				vst2_lane_bf16(ptr, val, x); // expected-error-re {{argument {{.*}} must be a constant integer}}
				}

				void test_vst2q_lane_bf16(bfloat16_t *ptr, bfloat16x8x2_t val) {
				vst2q_lane_bf16(ptr, val, -1); // expected-error-re {{argument value {{.*}} is outside the valid range}}
				vst2q_lane_bf16(ptr, val, 8); // expected-error-re {{argument value {{.*}} is outside the valid range}}
				vst2q_lane_bf16(ptr, val, x); // expected-error-re {{argument {{.*}} must be a constant integer}}
				}

				void test_vst3_lane_bf16(bfloat16_t *ptr, bfloat16x4x3_t val) {
				vst3_lane_bf16(ptr, val, -1); // expected-error-re {{argument value {{.*}} is outside the valid range}}
				vst3_lane_bf16(ptr, val, 4); // expected-error-re {{argument value {{.*}} is outside the valid range}}
				vst3_lane_bf16(ptr, val, x); // expected-error-re {{argument {{.*}} must be a constant integer}}
				}

				void test_vst3q_lane_bf16(bfloat16_t *ptr, bfloat16x8x3_t val) {
				vst3q_lane_bf16(ptr, val, -1); // expected-error-re {{argument value {{.*}} is outside the valid range}}
				vst3q_lane_bf16(ptr, val, 8); // expected-error-re {{argument value {{.*}} is outside the valid range}}
				vst3q_lane_bf16(ptr, val, x); // expected-error-re {{argument {{.*}} must be a constant integer}}
				}

				void test_vst4_lane_bf16(bfloat16_t *ptr, bfloat16x4x4_t val) {
				vst4_lane_bf16(ptr, val, -1); // expected-error-re {{argument value {{.*}} is outside the valid range}}
				vst4_lane_bf16(ptr, val, 4); // expected-error-re {{argument value {{.*}} is outside the valid range}}
				vst4_lane_bf16(ptr, val, x); // expected-error-re {{argument {{.*}} must be a constant integer}}
				}

				void test_vst4q_lane_bf16(bfloat16_t *ptr, bfloat16x8x4_t val) {
				vst4q_lane_bf16(ptr, val, -1); // expected-error-re {{argument value {{.*}} is outside the valid range}}
				vst4q_lane_bf16(ptr, val, 8); // expected-error-re {{argument value {{.*}} is outside the valid range}}
				vst4q_lane_bf16(ptr, val, x); // expected-error-re {{argument {{.*}} must be a constant integer}}
				}
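
Each function in the Sema test above rejects three things: a negative lane, a lane equal to the vector's element count (4 for the 64-bit forms, 8 for the q forms), and a non-constant lane. The range rule those cases imply can be sketched as follows, assuming 16-bit elements in 64-bit and 128-bit vectors; `bf16_lanes` and `lane_in_range` are hypothetical helper names, not Clang APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: number of bfloat16 lanes in a NEON vector of the
 * given width in bits (64 for the d-register forms, 128 for the q forms). */
static int bf16_lanes(int vector_bits) {
    return vector_bits / 16;
}

/* A constant lane index is accepted only when 0 <= lane < lane count. */
static bool lane_in_range(int lane, int vector_bits) {
    return lane >= 0 && lane < bf16_lanes(vector_bits);
}
```

For example, `lane_in_range(4, 64)` and `lane_in_range(8, 128)` are both false, matching the out-of-range cases the test expects to be diagnosed, while lanes 1 and 7 used in the CodeGen test are accepted.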

llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
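
In the hunk for this file, v4bf16 and v8bf16 are added to the existing v4i16/v4f16 and v8i16/v8f16 arms, so bfloat vectors select the same 4h and 8h load opcodes as the other 16-bit element types. A rough sketch of why that grouping works: the arrangement depends only on element width and element count, never on whether the element is an integer, a half float, or a bfloat. The names `elt_kind`, `vec_type`, and `arrangement` are hypothetical and do not exist in LLVM.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a vector value type. */
enum elt_kind { ELT_INT, ELT_FLOAT, ELT_BFLOAT };

struct vec_type {
    enum elt_kind kind; /* deliberately ignored when picking an arrangement */
    int elt_bits;       /* width of one element in bits */
    int num_elts;       /* number of elements */
};

/* Returns the arrangement suffix ("4h", "8h", ...) for a 64-bit or 128-bit
 * vector, or NULL when the shape is not one this sketch models. */
static const char *arrangement(struct vec_type vt) {
    int total_bits = vt.elt_bits * vt.num_elts;
    if (total_bits != 64 && total_bits != 128)
        return NULL;
    switch (vt.elt_bits) {
    case 8:  return total_bits == 64 ? "8b" : "16b";
    case 16: return total_bits == 64 ? "4h" : "8h";
    case 32: return total_bits == 64 ? "2s" : "4s";
    case 64: return total_bits == 64 ? "1d" : "2d";
    default: return NULL;
    }
}
```

With this model, a `{ELT_BFLOAT, 16, 4}` vector and an `{ELT_INT, 16, 4}` vector both map to "4h", mirroring how each changed condition simply gains one more `VT ==` comparison instead of a new branch.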

...	case ISD::INTRINSIC_W_CHAIN: {
}		}
case Intrinsic::aarch64_neon_ld1x2:		case Intrinsic::aarch64_neon_ld1x2:
if (VT == MVT::v8i8) {		if (VT == MVT::v8i8) {
SelectLoad(Node, 2, AArch64::LD1Twov8b, AArch64::dsub0);		SelectLoad(Node, 2, AArch64::LD1Twov8b, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v16i8) {		} else if (VT == MVT::v16i8) {
SelectLoad(Node, 2, AArch64::LD1Twov16b, AArch64::qsub0);		SelectLoad(Node, 2, AArch64::LD1Twov16b, AArch64::qsub0);
return;		return;
} else if (VT == MVT::v4i16 || VT == MVT::v4f16) {		} else if (VT == MVT::v4i16 || VT == MVT::v4f16 || VT == MVT::v4bf16) {
SelectLoad(Node, 2, AArch64::LD1Twov4h, AArch64::dsub0);		SelectLoad(Node, 2, AArch64::LD1Twov4h, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v8i16 || VT == MVT::v8f16) {		} else if (VT == MVT::v8i16 || VT == MVT::v8f16 || VT == MVT::v8bf16) {
SelectLoad(Node, 2, AArch64::LD1Twov8h, AArch64::qsub0);		SelectLoad(Node, 2, AArch64::LD1Twov8h, AArch64::qsub0);
return;		return;
} else if (VT == MVT::v2i32 || VT == MVT::v2f32) {		} else if (VT == MVT::v2i32 || VT == MVT::v2f32) {
SelectLoad(Node, 2, AArch64::LD1Twov2s, AArch64::dsub0);		SelectLoad(Node, 2, AArch64::LD1Twov2s, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v4i32 || VT == MVT::v4f32) {		} else if (VT == MVT::v4i32 || VT == MVT::v4f32) {
SelectLoad(Node, 2, AArch64::LD1Twov4s, AArch64::qsub0);		SelectLoad(Node, 2, AArch64::LD1Twov4s, AArch64::qsub0);
return;		return;
} else if (VT == MVT::v1i64 || VT == MVT::v1f64) {		} else if (VT == MVT::v1i64 || VT == MVT::v1f64) {
SelectLoad(Node, 2, AArch64::LD1Twov1d, AArch64::dsub0);		SelectLoad(Node, 2, AArch64::LD1Twov1d, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v2i64 || VT == MVT::v2f64) {		} else if (VT == MVT::v2i64 || VT == MVT::v2f64) {
SelectLoad(Node, 2, AArch64::LD1Twov2d, AArch64::qsub0);		SelectLoad(Node, 2, AArch64::LD1Twov2d, AArch64::qsub0);
return;		return;
}		}
break;		break;
case Intrinsic::aarch64_neon_ld1x3:		case Intrinsic::aarch64_neon_ld1x3:
if (VT == MVT::v8i8) {		if (VT == MVT::v8i8) {
SelectLoad(Node, 3, AArch64::LD1Threev8b, AArch64::dsub0);		SelectLoad(Node, 3, AArch64::LD1Threev8b, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v16i8) {		} else if (VT == MVT::v16i8) {
SelectLoad(Node, 3, AArch64::LD1Threev16b, AArch64::qsub0);		SelectLoad(Node, 3, AArch64::LD1Threev16b, AArch64::qsub0);
return;		return;
} else if (VT == MVT::v4i16 || VT == MVT::v4f16) {		} else if (VT == MVT::v4i16 || VT == MVT::v4f16 || VT == MVT::v4bf16) {
SelectLoad(Node, 3, AArch64::LD1Threev4h, AArch64::dsub0);		SelectLoad(Node, 3, AArch64::LD1Threev4h, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v8i16 || VT == MVT::v8f16) {		} else if (VT == MVT::v8i16 || VT == MVT::v8f16 || VT == MVT::v8bf16) {
SelectLoad(Node, 3, AArch64::LD1Threev8h, AArch64::qsub0);		SelectLoad(Node, 3, AArch64::LD1Threev8h, AArch64::qsub0);
return;		return;
} else if (VT == MVT::v2i32 || VT == MVT::v2f32) {		} else if (VT == MVT::v2i32 || VT == MVT::v2f32) {
SelectLoad(Node, 3, AArch64::LD1Threev2s, AArch64::dsub0);		SelectLoad(Node, 3, AArch64::LD1Threev2s, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v4i32 || VT == MVT::v4f32) {		} else if (VT == MVT::v4i32 || VT == MVT::v4f32) {
SelectLoad(Node, 3, AArch64::LD1Threev4s, AArch64::qsub0);		SelectLoad(Node, 3, AArch64::LD1Threev4s, AArch64::qsub0);
return;		return;
} else if (VT == MVT::v1i64 || VT == MVT::v1f64) {		} else if (VT == MVT::v1i64 || VT == MVT::v1f64) {
SelectLoad(Node, 3, AArch64::LD1Threev1d, AArch64::dsub0);		SelectLoad(Node, 3, AArch64::LD1Threev1d, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v2i64 || VT == MVT::v2f64) {		} else if (VT == MVT::v2i64 || VT == MVT::v2f64) {
SelectLoad(Node, 3, AArch64::LD1Threev2d, AArch64::qsub0);		SelectLoad(Node, 3, AArch64::LD1Threev2d, AArch64::qsub0);
return;		return;
}		}
break;		break;
case Intrinsic::aarch64_neon_ld1x4:		case Intrinsic::aarch64_neon_ld1x4:
if (VT == MVT::v8i8) {		if (VT == MVT::v8i8) {
SelectLoad(Node, 4, AArch64::LD1Fourv8b, AArch64::dsub0);		SelectLoad(Node, 4, AArch64::LD1Fourv8b, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v16i8) {		} else if (VT == MVT::v16i8) {
SelectLoad(Node, 4, AArch64::LD1Fourv16b, AArch64::qsub0);		SelectLoad(Node, 4, AArch64::LD1Fourv16b, AArch64::qsub0);
return;		return;
} else if (VT == MVT::v4i16 || VT == MVT::v4f16) {		} else if (VT == MVT::v4i16 || VT == MVT::v4f16 || VT == MVT::v4bf16) {
SelectLoad(Node, 4, AArch64::LD1Fourv4h, AArch64::dsub0);		SelectLoad(Node, 4, AArch64::LD1Fourv4h, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v8i16 || VT == MVT::v8f16) {		} else if (VT == MVT::v8i16 || VT == MVT::v8f16 || VT == MVT::v8bf16) {
SelectLoad(Node, 4, AArch64::LD1Fourv8h, AArch64::qsub0);		SelectLoad(Node, 4, AArch64::LD1Fourv8h, AArch64::qsub0);
return;		return;
} else if (VT == MVT::v2i32 || VT == MVT::v2f32) {		} else if (VT == MVT::v2i32 || VT == MVT::v2f32) {
SelectLoad(Node, 4, AArch64::LD1Fourv2s, AArch64::dsub0);		SelectLoad(Node, 4, AArch64::LD1Fourv2s, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v4i32 || VT == MVT::v4f32) {		} else if (VT == MVT::v4i32 || VT == MVT::v4f32) {
SelectLoad(Node, 4, AArch64::LD1Fourv4s, AArch64::qsub0);		SelectLoad(Node, 4, AArch64::LD1Fourv4s, AArch64::qsub0);
return;		return;
} else if (VT == MVT::v1i64 || VT == MVT::v1f64) {		} else if (VT == MVT::v1i64 || VT == MVT::v1f64) {
SelectLoad(Node, 4, AArch64::LD1Fourv1d, AArch64::dsub0);		SelectLoad(Node, 4, AArch64::LD1Fourv1d, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v2i64 || VT == MVT::v2f64) {		} else if (VT == MVT::v2i64 || VT == MVT::v2f64) {
SelectLoad(Node, 4, AArch64::LD1Fourv2d, AArch64::qsub0);		SelectLoad(Node, 4, AArch64::LD1Fourv2d, AArch64::qsub0);
return;		return;
}		}
break;		break;
case Intrinsic::aarch64_neon_ld2:		case Intrinsic::aarch64_neon_ld2:
if (VT == MVT::v8i8) {		if (VT == MVT::v8i8) {
SelectLoad(Node, 2, AArch64::LD2Twov8b, AArch64::dsub0);		SelectLoad(Node, 2, AArch64::LD2Twov8b, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v16i8) {		} else if (VT == MVT::v16i8) {
SelectLoad(Node, 2, AArch64::LD2Twov16b, AArch64::qsub0);		SelectLoad(Node, 2, AArch64::LD2Twov16b, AArch64::qsub0);
return;		return;
} else if (VT == MVT::v4i16 || VT == MVT::v4f16) {		} else if (VT == MVT::v4i16 || VT == MVT::v4f16 || VT == MVT::v4bf16) {
SelectLoad(Node, 2, AArch64::LD2Twov4h, AArch64::dsub0);		SelectLoad(Node, 2, AArch64::LD2Twov4h, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v8i16 || VT == MVT::v8f16) {		} else if (VT == MVT::v8i16 || VT == MVT::v8f16 || VT == MVT::v8bf16) {
SelectLoad(Node, 2, AArch64::LD2Twov8h, AArch64::qsub0);		SelectLoad(Node, 2, AArch64::LD2Twov8h, AArch64::qsub0);
return;		return;
} else if (VT == MVT::v2i32 || VT == MVT::v2f32) {		} else if (VT == MVT::v2i32 || VT == MVT::v2f32) {
SelectLoad(Node, 2, AArch64::LD2Twov2s, AArch64::dsub0);		SelectLoad(Node, 2, AArch64::LD2Twov2s, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v4i32 || VT == MVT::v4f32) {		} else if (VT == MVT::v4i32 || VT == MVT::v4f32) {
SelectLoad(Node, 2, AArch64::LD2Twov4s, AArch64::qsub0);		SelectLoad(Node, 2, AArch64::LD2Twov4s, AArch64::qsub0);
return;		return;
} else if (VT == MVT::v1i64 || VT == MVT::v1f64) {		} else if (VT == MVT::v1i64 || VT == MVT::v1f64) {
SelectLoad(Node, 2, AArch64::LD1Twov1d, AArch64::dsub0);		SelectLoad(Node, 2, AArch64::LD1Twov1d, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v2i64 || VT == MVT::v2f64) {		} else if (VT == MVT::v2i64 || VT == MVT::v2f64) {
SelectLoad(Node, 2, AArch64::LD2Twov2d, AArch64::qsub0);		SelectLoad(Node, 2, AArch64::LD2Twov2d, AArch64::qsub0);
return;		return;
}		}
break;		break;
case Intrinsic::aarch64_neon_ld3:		case Intrinsic::aarch64_neon_ld3:
if (VT == MVT::v8i8) {		if (VT == MVT::v8i8) {
SelectLoad(Node, 3, AArch64::LD3Threev8b, AArch64::dsub0);		SelectLoad(Node, 3, AArch64::LD3Threev8b, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v16i8) {		} else if (VT == MVT::v16i8) {
SelectLoad(Node, 3, AArch64::LD3Threev16b, AArch64::qsub0);		SelectLoad(Node, 3, AArch64::LD3Threev16b, AArch64::qsub0);
return;		return;
} else if (VT == MVT::v4i16 || VT == MVT::v4f16) {		} else if (VT == MVT::v4i16 || VT == MVT::v4f16 || VT == MVT::v4bf16) {
SelectLoad(Node, 3, AArch64::LD3Threev4h, AArch64::dsub0);		SelectLoad(Node, 3, AArch64::LD3Threev4h, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v8i16 || VT == MVT::v8f16) {		} else if (VT == MVT::v8i16 || VT == MVT::v8f16 || VT == MVT::v8bf16) {
SelectLoad(Node, 3, AArch64::LD3Threev8h, AArch64::qsub0);		SelectLoad(Node, 3, AArch64::LD3Threev8h, AArch64::qsub0);
return;		return;
} else if (VT == MVT::v2i32 || VT == MVT::v2f32) {		} else if (VT == MVT::v2i32 || VT == MVT::v2f32) {
SelectLoad(Node, 3, AArch64::LD3Threev2s, AArch64::dsub0);		SelectLoad(Node, 3, AArch64::LD3Threev2s, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v4i32 || VT == MVT::v4f32) {		} else if (VT == MVT::v4i32 || VT == MVT::v4f32) {
SelectLoad(Node, 3, AArch64::LD3Threev4s, AArch64::qsub0);		SelectLoad(Node, 3, AArch64::LD3Threev4s, AArch64::qsub0);
return;		return;
} else if (VT == MVT::v1i64 || VT == MVT::v1f64) {		} else if (VT == MVT::v1i64 || VT == MVT::v1f64) {
SelectLoad(Node, 3, AArch64::LD1Threev1d, AArch64::dsub0);		SelectLoad(Node, 3, AArch64::LD1Threev1d, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v2i64 || VT == MVT::v2f64) {		} else if (VT == MVT::v2i64 || VT == MVT::v2f64) {
SelectLoad(Node, 3, AArch64::LD3Threev2d, AArch64::qsub0);		SelectLoad(Node, 3, AArch64::LD3Threev2d, AArch64::qsub0);
return;		return;
}		}
break;		break;
case Intrinsic::aarch64_neon_ld4:		case Intrinsic::aarch64_neon_ld4:
if (VT == MVT::v8i8) {		if (VT == MVT::v8i8) {
SelectLoad(Node, 4, AArch64::LD4Fourv8b, AArch64::dsub0);		SelectLoad(Node, 4, AArch64::LD4Fourv8b, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v16i8) {		} else if (VT == MVT::v16i8) {
SelectLoad(Node, 4, AArch64::LD4Fourv16b, AArch64::qsub0);		SelectLoad(Node, 4, AArch64::LD4Fourv16b, AArch64::qsub0);
return;		return;
} else if (VT == MVT::v4i16 || VT == MVT::v4f16) {		} else if (VT == MVT::v4i16 || VT == MVT::v4f16 || VT == MVT::v4bf16) {
SelectLoad(Node, 4, AArch64::LD4Fourv4h, AArch64::dsub0);		SelectLoad(Node, 4, AArch64::LD4Fourv4h, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v8i16 || VT == MVT::v8f16) {		} else if (VT == MVT::v8i16 || VT == MVT::v8f16 || VT == MVT::v8bf16) {
SelectLoad(Node, 4, AArch64::LD4Fourv8h, AArch64::qsub0);		SelectLoad(Node, 4, AArch64::LD4Fourv8h, AArch64::qsub0);
return;		return;
} else if (VT == MVT::v2i32 || VT == MVT::v2f32) {		} else if (VT == MVT::v2i32 || VT == MVT::v2f32) {
SelectLoad(Node, 4, AArch64::LD4Fourv2s, AArch64::dsub0);		SelectLoad(Node, 4, AArch64::LD4Fourv2s, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v4i32 || VT == MVT::v4f32) {		} else if (VT == MVT::v4i32 || VT == MVT::v4f32) {
SelectLoad(Node, 4, AArch64::LD4Fourv4s, AArch64::qsub0);		SelectLoad(Node, 4, AArch64::LD4Fourv4s, AArch64::qsub0);
return;		return;
} else if (VT == MVT::v1i64 || VT == MVT::v1f64) {		} else if (VT == MVT::v1i64 || VT == MVT::v1f64) {
SelectLoad(Node, 4, AArch64::LD1Fourv1d, AArch64::dsub0);		SelectLoad(Node, 4, AArch64::LD1Fourv1d, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v2i64 || VT == MVT::v2f64) {		} else if (VT == MVT::v2i64 || VT == MVT::v2f64) {
SelectLoad(Node, 4, AArch64::LD4Fourv2d, AArch64::qsub0);		SelectLoad(Node, 4, AArch64::LD4Fourv2d, AArch64::qsub0);
return;		return;
}		}
break;		break;
case Intrinsic::aarch64_neon_ld2r:		case Intrinsic::aarch64_neon_ld2r:
if (VT == MVT::v8i8) {		if (VT == MVT::v8i8) {
SelectLoad(Node, 2, AArch64::LD2Rv8b, AArch64::dsub0);		SelectLoad(Node, 2, AArch64::LD2Rv8b, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v16i8) {		} else if (VT == MVT::v16i8) {
SelectLoad(Node, 2, AArch64::LD2Rv16b, AArch64::qsub0);		SelectLoad(Node, 2, AArch64::LD2Rv16b, AArch64::qsub0);
return;		return;
} else if (VT == MVT::v4i16 || VT == MVT::v4f16) {		} else if (VT == MVT::v4i16 || VT == MVT::v4f16 || VT == MVT::v4bf16) {
SelectLoad(Node, 2, AArch64::LD2Rv4h, AArch64::dsub0);		SelectLoad(Node, 2, AArch64::LD2Rv4h, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v8i16 || VT == MVT::v8f16) {		} else if (VT == MVT::v8i16 || VT == MVT::v8f16 || VT == MVT::v8bf16) {
SelectLoad(Node, 2, AArch64::LD2Rv8h, AArch64::qsub0);		SelectLoad(Node, 2, AArch64::LD2Rv8h, AArch64::qsub0);
return;		return;
} else if (VT == MVT::v2i32 || VT == MVT::v2f32) {		} else if (VT == MVT::v2i32 || VT == MVT::v2f32) {
SelectLoad(Node, 2, AArch64::LD2Rv2s, AArch64::dsub0);		SelectLoad(Node, 2, AArch64::LD2Rv2s, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v4i32 || VT == MVT::v4f32) {		} else if (VT == MVT::v4i32 || VT == MVT::v4f32) {
SelectLoad(Node, 2, AArch64::LD2Rv4s, AArch64::qsub0);		SelectLoad(Node, 2, AArch64::LD2Rv4s, AArch64::qsub0);
return;		return;
} else if (VT == MVT::v1i64 || VT == MVT::v1f64) {		} else if (VT == MVT::v1i64 || VT == MVT::v1f64) {
SelectLoad(Node, 2, AArch64::LD2Rv1d, AArch64::dsub0);		SelectLoad(Node, 2, AArch64::LD2Rv1d, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v2i64 || VT == MVT::v2f64) {		} else if (VT == MVT::v2i64 || VT == MVT::v2f64) {
SelectLoad(Node, 2, AArch64::LD2Rv2d, AArch64::qsub0);		SelectLoad(Node, 2, AArch64::LD2Rv2d, AArch64::qsub0);
return;		return;
}		}
break;		break;
case Intrinsic::aarch64_neon_ld3r:		case Intrinsic::aarch64_neon_ld3r:
if (VT == MVT::v8i8) {		if (VT == MVT::v8i8) {
SelectLoad(Node, 3, AArch64::LD3Rv8b, AArch64::dsub0);		SelectLoad(Node, 3, AArch64::LD3Rv8b, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v16i8) {		} else if (VT == MVT::v16i8) {
SelectLoad(Node, 3, AArch64::LD3Rv16b, AArch64::qsub0);		SelectLoad(Node, 3, AArch64::LD3Rv16b, AArch64::qsub0);
return;		return;
} else if (VT == MVT::v4i16 || VT == MVT::v4f16) {		} else if (VT == MVT::v4i16 || VT == MVT::v4f16 || VT == MVT::v4bf16) {
SelectLoad(Node, 3, AArch64::LD3Rv4h, AArch64::dsub0);		SelectLoad(Node, 3, AArch64::LD3Rv4h, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v8i16 || VT == MVT::v8f16) {		} else if (VT == MVT::v8i16 || VT == MVT::v8f16 || VT == MVT::v8bf16) {
SelectLoad(Node, 3, AArch64::LD3Rv8h, AArch64::qsub0);		SelectLoad(Node, 3, AArch64::LD3Rv8h, AArch64::qsub0);
return;		return;
} else if (VT == MVT::v2i32 || VT == MVT::v2f32) {		} else if (VT == MVT::v2i32 || VT == MVT::v2f32) {
SelectLoad(Node, 3, AArch64::LD3Rv2s, AArch64::dsub0);		SelectLoad(Node, 3, AArch64::LD3Rv2s, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v4i32 || VT == MVT::v4f32) {		} else if (VT == MVT::v4i32 || VT == MVT::v4f32) {
SelectLoad(Node, 3, AArch64::LD3Rv4s, AArch64::qsub0);		SelectLoad(Node, 3, AArch64::LD3Rv4s, AArch64::qsub0);
return;		return;
} else if (VT == MVT::v1i64 || VT == MVT::v1f64) {		} else if (VT == MVT::v1i64 || VT == MVT::v1f64) {
SelectLoad(Node, 3, AArch64::LD3Rv1d, AArch64::dsub0);		SelectLoad(Node, 3, AArch64::LD3Rv1d, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v2i64 || VT == MVT::v2f64) {		} else if (VT == MVT::v2i64 || VT == MVT::v2f64) {
SelectLoad(Node, 3, AArch64::LD3Rv2d, AArch64::qsub0);		SelectLoad(Node, 3, AArch64::LD3Rv2d, AArch64::qsub0);
return;		return;
}		}
break;		break;
case Intrinsic::aarch64_neon_ld4r:		case Intrinsic::aarch64_neon_ld4r:
if (VT == MVT::v8i8) {		if (VT == MVT::v8i8) {
SelectLoad(Node, 4, AArch64::LD4Rv8b, AArch64::dsub0);		SelectLoad(Node, 4, AArch64::LD4Rv8b, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v16i8) {		} else if (VT == MVT::v16i8) {
SelectLoad(Node, 4, AArch64::LD4Rv16b, AArch64::qsub0);		SelectLoad(Node, 4, AArch64::LD4Rv16b, AArch64::qsub0);
return;		return;
} else if (VT == MVT::v4i16 || VT == MVT::v4f16) {		} else if (VT == MVT::v4i16 || VT == MVT::v4f16 || VT == MVT::v4bf16) {
SelectLoad(Node, 4, AArch64::LD4Rv4h, AArch64::dsub0);		SelectLoad(Node, 4, AArch64::LD4Rv4h, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v8i16 || VT == MVT::v8f16) {		} else if (VT == MVT::v8i16 || VT == MVT::v8f16 || VT == MVT::v8bf16) {
SelectLoad(Node, 4, AArch64::LD4Rv8h, AArch64::qsub0);		SelectLoad(Node, 4, AArch64::LD4Rv8h, AArch64::qsub0);
return;		return;
} else if (VT == MVT::v2i32 || VT == MVT::v2f32) {		} else if (VT == MVT::v2i32 || VT == MVT::v2f32) {
SelectLoad(Node, 4, AArch64::LD4Rv2s, AArch64::dsub0);		SelectLoad(Node, 4, AArch64::LD4Rv2s, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v4i32 || VT == MVT::v4f32) {		} else if (VT == MVT::v4i32 || VT == MVT::v4f32) {
SelectLoad(Node, 4, AArch64::LD4Rv4s, AArch64::qsub0);		SelectLoad(Node, 4, AArch64::LD4Rv4s, AArch64::qsub0);
return;		return;
} else if (VT == MVT::v1i64 || VT == MVT::v1f64) {		} else if (VT == MVT::v1i64 || VT == MVT::v1f64) {
SelectLoad(Node, 4, AArch64::LD4Rv1d, AArch64::dsub0);		SelectLoad(Node, 4, AArch64::LD4Rv1d, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v2i64 || VT == MVT::v2f64) {		} else if (VT == MVT::v2i64 || VT == MVT::v2f64) {
SelectLoad(Node, 4, AArch64::LD4Rv2d, AArch64::qsub0);		SelectLoad(Node, 4, AArch64::LD4Rv2d, AArch64::qsub0);
return;		return;
}		}
break;		break;
case Intrinsic::aarch64_neon_ld2lane:		case Intrinsic::aarch64_neon_ld2lane:
if (VT == MVT::v16i8 || VT == MVT::v8i8) {		if (VT == MVT::v16i8 || VT == MVT::v8i8) {
SelectLoadLane(Node, 2, AArch64::LD2i8);		SelectLoadLane(Node, 2, AArch64::LD2i8);
return;		return;
} else if (VT == MVT::v8i16 || VT == MVT::v4i16 || VT == MVT::v4f16 ||		} else if (VT == MVT::v8i16 || VT == MVT::v4i16 || VT == MVT::v4f16 ||
VT == MVT::v8f16) {		VT == MVT::v8f16 || VT == MVT::v4bf16 || VT == MVT::v8bf16) {
SelectLoadLane(Node, 2, AArch64::LD2i16);		SelectLoadLane(Node, 2, AArch64::LD2i16);
return;		return;
} else if (VT == MVT::v4i32 || VT == MVT::v2i32 || VT == MVT::v4f32 ||		} else if (VT == MVT::v4i32 || VT == MVT::v2i32 || VT == MVT::v4f32 ||
VT == MVT::v2f32) {		VT == MVT::v2f32) {
SelectLoadLane(Node, 2, AArch64::LD2i32);		SelectLoadLane(Node, 2, AArch64::LD2i32);
return;		return;
} else if (VT == MVT::v2i64 || VT == MVT::v1i64 || VT == MVT::v2f64 ||		} else if (VT == MVT::v2i64 || VT == MVT::v1i64 || VT == MVT::v2f64 ||
VT == MVT::v1f64) {		VT == MVT::v1f64) {
SelectLoadLane(Node, 2, AArch64::LD2i64);		SelectLoadLane(Node, 2, AArch64::LD2i64);
return;		return;
}		}
break;		break;
case Intrinsic::aarch64_neon_ld3lane:		case Intrinsic::aarch64_neon_ld3lane:
if (VT == MVT::v16i8 || VT == MVT::v8i8) {		if (VT == MVT::v16i8 || VT == MVT::v8i8) {
SelectLoadLane(Node, 3, AArch64::LD3i8);		SelectLoadLane(Node, 3, AArch64::LD3i8);
return;		return;
} else if (VT == MVT::v8i16 || VT == MVT::v4i16 || VT == MVT::v4f16 ||		} else if (VT == MVT::v8i16 || VT == MVT::v4i16 || VT == MVT::v4f16 ||
VT == MVT::v8f16) {		VT == MVT::v8f16 || VT == MVT::v4bf16 || VT == MVT::v8bf16) {
SelectLoadLane(Node, 3, AArch64::LD3i16);		SelectLoadLane(Node, 3, AArch64::LD3i16);
return;		return;
} else if (VT == MVT::v4i32 || VT == MVT::v2i32 || VT == MVT::v4f32 ||		} else if (VT == MVT::v4i32 || VT == MVT::v2i32 || VT == MVT::v4f32 ||
VT == MVT::v2f32) {		VT == MVT::v2f32) {
SelectLoadLane(Node, 3, AArch64::LD3i32);		SelectLoadLane(Node, 3, AArch64::LD3i32);
return;		return;
} else if (VT == MVT::v2i64 || VT == MVT::v1i64 || VT == MVT::v2f64 ||		} else if (VT == MVT::v2i64 || VT == MVT::v1i64 || VT == MVT::v2f64 ||
VT == MVT::v1f64) {		VT == MVT::v1f64) {
SelectLoadLane(Node, 3, AArch64::LD3i64);		SelectLoadLane(Node, 3, AArch64::LD3i64);
return;		return;
}		}
break;		break;
case Intrinsic::aarch64_neon_ld4lane:		case Intrinsic::aarch64_neon_ld4lane:
if (VT == MVT::v16i8 || VT == MVT::v8i8) {		if (VT == MVT::v16i8 || VT == MVT::v8i8) {
SelectLoadLane(Node, 4, AArch64::LD4i8);		SelectLoadLane(Node, 4, AArch64::LD4i8);
return;		return;
} else if (VT == MVT::v8i16 || VT == MVT::v4i16 || VT == MVT::v4f16 ||		} else if (VT == MVT::v8i16 || VT == MVT::v4i16 || VT == MVT::v4f16 ||
VT == MVT::v8f16) {		VT == MVT::v8f16 || VT == MVT::v4bf16 || VT == MVT::v8bf16) {
SelectLoadLane(Node, 4, AArch64::LD4i16);		SelectLoadLane(Node, 4, AArch64::LD4i16);
return;		return;
} else if (VT == MVT::v4i32 || VT == MVT::v2i32 || VT == MVT::v4f32 ||		} else if (VT == MVT::v4i32 || VT == MVT::v2i32 || VT == MVT::v4f32 ||
VT == MVT::v2f32) {		VT == MVT::v2f32) {
SelectLoadLane(Node, 4, AArch64::LD4i32);		SelectLoadLane(Node, 4, AArch64::LD4i32);
return;		return;
} else if (VT == MVT::v2i64 || VT == MVT::v1i64 || VT == MVT::v2f64 ||		} else if (VT == MVT::v2i64 || VT == MVT::v1i64 || VT == MVT::v2f64 ||
VT == MVT::v1f64) {		VT == MVT::v1f64) {
// ... 58 lines not shown; enclosing label: default: ...
break;		break;
case Intrinsic::aarch64_neon_st1x2: {		case Intrinsic::aarch64_neon_st1x2: {
if (VT == MVT::v8i8) {		if (VT == MVT::v8i8) {
SelectStore(Node, 2, AArch64::ST1Twov8b);		SelectStore(Node, 2, AArch64::ST1Twov8b);
return;		return;
} else if (VT == MVT::v16i8) {		} else if (VT == MVT::v16i8) {
SelectStore(Node, 2, AArch64::ST1Twov16b);		SelectStore(Node, 2, AArch64::ST1Twov16b);
return;		return;
} else if (VT == MVT::v4i16 || VT == MVT::v4f16) {		} else if (VT == MVT::v4i16 || VT == MVT::v4f16 ||
		VT == MVT::v4bf16) {
SelectStore(Node, 2, AArch64::ST1Twov4h);		SelectStore(Node, 2, AArch64::ST1Twov4h);
return;		return;
} else if (VT == MVT::v8i16 || VT == MVT::v8f16) {		} else if (VT == MVT::v8i16 || VT == MVT::v8f16 ||
		VT == MVT::v8bf16) {
SelectStore(Node, 2, AArch64::ST1Twov8h);		SelectStore(Node, 2, AArch64::ST1Twov8h);
return;		return;
} else if (VT == MVT::v2i32 || VT == MVT::v2f32) {		} else if (VT == MVT::v2i32 || VT == MVT::v2f32) {
SelectStore(Node, 2, AArch64::ST1Twov2s);		SelectStore(Node, 2, AArch64::ST1Twov2s);
return;		return;
} else if (VT == MVT::v4i32 || VT == MVT::v4f32) {		} else if (VT == MVT::v4i32 || VT == MVT::v4f32) {
SelectStore(Node, 2, AArch64::ST1Twov4s);		SelectStore(Node, 2, AArch64::ST1Twov4s);
return;		return;
} else if (VT == MVT::v2i64 || VT == MVT::v2f64) {		} else if (VT == MVT::v2i64 || VT == MVT::v2f64) {
SelectStore(Node, 2, AArch64::ST1Twov2d);		SelectStore(Node, 2, AArch64::ST1Twov2d);
return;		return;
} else if (VT == MVT::v1i64 || VT == MVT::v1f64) {		} else if (VT == MVT::v1i64 || VT == MVT::v1f64) {
SelectStore(Node, 2, AArch64::ST1Twov1d);		SelectStore(Node, 2, AArch64::ST1Twov1d);
return;		return;
}		}
break;		break;
}		}
case Intrinsic::aarch64_neon_st1x3: {		case Intrinsic::aarch64_neon_st1x3: {
if (VT == MVT::v8i8) {		if (VT == MVT::v8i8) {
SelectStore(Node, 3, AArch64::ST1Threev8b);		SelectStore(Node, 3, AArch64::ST1Threev8b);
return;		return;
} else if (VT == MVT::v16i8) {		} else if (VT == MVT::v16i8) {
SelectStore(Node, 3, AArch64::ST1Threev16b);		SelectStore(Node, 3, AArch64::ST1Threev16b);
return;		return;
} else if (VT == MVT::v4i16 || VT == MVT::v4f16) {		} else if (VT == MVT::v4i16 || VT == MVT::v4f16 ||
		VT == MVT::v4bf16) {
SelectStore(Node, 3, AArch64::ST1Threev4h);		SelectStore(Node, 3, AArch64::ST1Threev4h);
return;		return;
} else if (VT == MVT::v8i16 || VT == MVT::v8f16) {		} else if (VT == MVT::v8i16 || VT == MVT::v8f16 ||
		VT == MVT::v8bf16) {
SelectStore(Node, 3, AArch64::ST1Threev8h);		SelectStore(Node, 3, AArch64::ST1Threev8h);
return;		return;
} else if (VT == MVT::v2i32 || VT == MVT::v2f32) {		} else if (VT == MVT::v2i32 || VT == MVT::v2f32) {
SelectStore(Node, 3, AArch64::ST1Threev2s);		SelectStore(Node, 3, AArch64::ST1Threev2s);
return;		return;
} else if (VT == MVT::v4i32 || VT == MVT::v4f32) {		} else if (VT == MVT::v4i32 || VT == MVT::v4f32) {
SelectStore(Node, 3, AArch64::ST1Threev4s);		SelectStore(Node, 3, AArch64::ST1Threev4s);
return;		return;
} else if (VT == MVT::v2i64 || VT == MVT::v2f64) {		} else if (VT == MVT::v2i64 || VT == MVT::v2f64) {
SelectStore(Node, 3, AArch64::ST1Threev2d);		SelectStore(Node, 3, AArch64::ST1Threev2d);
return;		return;
} else if (VT == MVT::v1i64 || VT == MVT::v1f64) {		} else if (VT == MVT::v1i64 || VT == MVT::v1f64) {
SelectStore(Node, 3, AArch64::ST1Threev1d);		SelectStore(Node, 3, AArch64::ST1Threev1d);
return;		return;
}		}
break;		break;
}		}
case Intrinsic::aarch64_neon_st1x4: {		case Intrinsic::aarch64_neon_st1x4: {
if (VT == MVT::v8i8) {		if (VT == MVT::v8i8) {
SelectStore(Node, 4, AArch64::ST1Fourv8b);		SelectStore(Node, 4, AArch64::ST1Fourv8b);
return;		return;
} else if (VT == MVT::v16i8) {		} else if (VT == MVT::v16i8) {
SelectStore(Node, 4, AArch64::ST1Fourv16b);		SelectStore(Node, 4, AArch64::ST1Fourv16b);
return;		return;
} else if (VT == MVT::v4i16 || VT == MVT::v4f16) {		} else if (VT == MVT::v4i16 || VT == MVT::v4f16 ||
		VT == MVT::v4bf16) {
SelectStore(Node, 4, AArch64::ST1Fourv4h);		SelectStore(Node, 4, AArch64::ST1Fourv4h);
return;		return;
} else if (VT == MVT::v8i16 || VT == MVT::v8f16) {		} else if (VT == MVT::v8i16 || VT == MVT::v8f16 ||
		VT == MVT::v8bf16) {
SelectStore(Node, 4, AArch64::ST1Fourv8h);		SelectStore(Node, 4, AArch64::ST1Fourv8h);
return;		return;
} else if (VT == MVT::v2i32 || VT == MVT::v2f32) {		} else if (VT == MVT::v2i32 || VT == MVT::v2f32) {
SelectStore(Node, 4, AArch64::ST1Fourv2s);		SelectStore(Node, 4, AArch64::ST1Fourv2s);
return;		return;
} else if (VT == MVT::v4i32 || VT == MVT::v4f32) {		} else if (VT == MVT::v4i32 || VT == MVT::v4f32) {
SelectStore(Node, 4, AArch64::ST1Fourv4s);		SelectStore(Node, 4, AArch64::ST1Fourv4s);
return;		return;
} else if (VT == MVT::v2i64 || VT == MVT::v2f64) {		} else if (VT == MVT::v2i64 || VT == MVT::v2f64) {
SelectStore(Node, 4, AArch64::ST1Fourv2d);		SelectStore(Node, 4, AArch64::ST1Fourv2d);
return;		return;
} else if (VT == MVT::v1i64 || VT == MVT::v1f64) {		} else if (VT == MVT::v1i64 || VT == MVT::v1f64) {
SelectStore(Node, 4, AArch64::ST1Fourv1d);		SelectStore(Node, 4, AArch64::ST1Fourv1d);
return;		return;
}		}
break;		break;
}		}
case Intrinsic::aarch64_neon_st2: {		case Intrinsic::aarch64_neon_st2: {
if (VT == MVT::v8i8) {		if (VT == MVT::v8i8) {
SelectStore(Node, 2, AArch64::ST2Twov8b);		SelectStore(Node, 2, AArch64::ST2Twov8b);
return;		return;
} else if (VT == MVT::v16i8) {		} else if (VT == MVT::v16i8) {
SelectStore(Node, 2, AArch64::ST2Twov16b);		SelectStore(Node, 2, AArch64::ST2Twov16b);
return;		return;
} else if (VT == MVT::v4i16 || VT == MVT::v4f16) {		} else if (VT == MVT::v4i16 || VT == MVT::v4f16 ||
		VT == MVT::v4bf16) {
SelectStore(Node, 2, AArch64::ST2Twov4h);		SelectStore(Node, 2, AArch64::ST2Twov4h);
return;		return;
} else if (VT == MVT::v8i16 || VT == MVT::v8f16) {		} else if (VT == MVT::v8i16 || VT == MVT::v8f16 ||
		VT == MVT::v8bf16) {
SelectStore(Node, 2, AArch64::ST2Twov8h);		SelectStore(Node, 2, AArch64::ST2Twov8h);
return;		return;
} else if (VT == MVT::v2i32 || VT == MVT::v2f32) {		} else if (VT == MVT::v2i32 || VT == MVT::v2f32) {
SelectStore(Node, 2, AArch64::ST2Twov2s);		SelectStore(Node, 2, AArch64::ST2Twov2s);
return;		return;
} else if (VT == MVT::v4i32 || VT == MVT::v4f32) {		} else if (VT == MVT::v4i32 || VT == MVT::v4f32) {
SelectStore(Node, 2, AArch64::ST2Twov4s);		SelectStore(Node, 2, AArch64::ST2Twov4s);
return;		return;
} else if (VT == MVT::v2i64 || VT == MVT::v2f64) {		} else if (VT == MVT::v2i64 || VT == MVT::v2f64) {
SelectStore(Node, 2, AArch64::ST2Twov2d);		SelectStore(Node, 2, AArch64::ST2Twov2d);
return;		return;
} else if (VT == MVT::v1i64 || VT == MVT::v1f64) {		} else if (VT == MVT::v1i64 || VT == MVT::v1f64) {
SelectStore(Node, 2, AArch64::ST1Twov1d);		SelectStore(Node, 2, AArch64::ST1Twov1d);
return;		return;
}		}
break;		break;
}		}
case Intrinsic::aarch64_neon_st3: {		case Intrinsic::aarch64_neon_st3: {
if (VT == MVT::v8i8) {		if (VT == MVT::v8i8) {
SelectStore(Node, 3, AArch64::ST3Threev8b);		SelectStore(Node, 3, AArch64::ST3Threev8b);
return;		return;
} else if (VT == MVT::v16i8) {		} else if (VT == MVT::v16i8) {
SelectStore(Node, 3, AArch64::ST3Threev16b);		SelectStore(Node, 3, AArch64::ST3Threev16b);
return;		return;
} else if (VT == MVT::v4i16 || VT == MVT::v4f16) {		} else if (VT == MVT::v4i16 || VT == MVT::v4f16 ||
		VT == MVT::v4bf16) {
SelectStore(Node, 3, AArch64::ST3Threev4h);		SelectStore(Node, 3, AArch64::ST3Threev4h);
return;		return;
} else if (VT == MVT::v8i16 || VT == MVT::v8f16) {		} else if (VT == MVT::v8i16 || VT == MVT::v8f16 ||
		VT == MVT::v8bf16) {
SelectStore(Node, 3, AArch64::ST3Threev8h);		SelectStore(Node, 3, AArch64::ST3Threev8h);
return;		return;
} else if (VT == MVT::v2i32 || VT == MVT::v2f32) {		} else if (VT == MVT::v2i32 || VT == MVT::v2f32) {
SelectStore(Node, 3, AArch64::ST3Threev2s);		SelectStore(Node, 3, AArch64::ST3Threev2s);
return;		return;
} else if (VT == MVT::v4i32 || VT == MVT::v4f32) {		} else if (VT == MVT::v4i32 || VT == MVT::v4f32) {
SelectStore(Node, 3, AArch64::ST3Threev4s);		SelectStore(Node, 3, AArch64::ST3Threev4s);
return;		return;
} else if (VT == MVT::v2i64 || VT == MVT::v2f64) {		} else if (VT == MVT::v2i64 || VT == MVT::v2f64) {
SelectStore(Node, 3, AArch64::ST3Threev2d);		SelectStore(Node, 3, AArch64::ST3Threev2d);
return;		return;
} else if (VT == MVT::v1i64 || VT == MVT::v1f64) {		} else if (VT == MVT::v1i64 || VT == MVT::v1f64) {
SelectStore(Node, 3, AArch64::ST1Threev1d);		SelectStore(Node, 3, AArch64::ST1Threev1d);
return;		return;
}		}
break;		break;
}		}
case Intrinsic::aarch64_neon_st4: {		case Intrinsic::aarch64_neon_st4: {
if (VT == MVT::v8i8) {		if (VT == MVT::v8i8) {
SelectStore(Node, 4, AArch64::ST4Fourv8b);		SelectStore(Node, 4, AArch64::ST4Fourv8b);
return;		return;
} else if (VT == MVT::v16i8) {		} else if (VT == MVT::v16i8) {
SelectStore(Node, 4, AArch64::ST4Fourv16b);		SelectStore(Node, 4, AArch64::ST4Fourv16b);
return;		return;
} else if (VT == MVT::v4i16 || VT == MVT::v4f16) {		} else if (VT == MVT::v4i16 || VT == MVT::v4f16 ||
		VT == MVT::v4bf16) {
SelectStore(Node, 4, AArch64::ST4Fourv4h);		SelectStore(Node, 4, AArch64::ST4Fourv4h);
return;		return;
} else if (VT == MVT::v8i16 || VT == MVT::v8f16) {		} else if (VT == MVT::v8i16 || VT == MVT::v8f16 ||
		VT == MVT::v8bf16) {
SelectStore(Node, 4, AArch64::ST4Fourv8h);		SelectStore(Node, 4, AArch64::ST4Fourv8h);
return;		return;
} else if (VT == MVT::v2i32 || VT == MVT::v2f32) {		} else if (VT == MVT::v2i32 || VT == MVT::v2f32) {
SelectStore(Node, 4, AArch64::ST4Fourv2s);		SelectStore(Node, 4, AArch64::ST4Fourv2s);
return;		return;
} else if (VT == MVT::v4i32 || VT == MVT::v4f32) {		} else if (VT == MVT::v4i32 || VT == MVT::v4f32) {
SelectStore(Node, 4, AArch64::ST4Fourv4s);		SelectStore(Node, 4, AArch64::ST4Fourv4s);
return;		return;
} else if (VT == MVT::v2i64 || VT == MVT::v2f64) {		} else if (VT == MVT::v2i64 || VT == MVT::v2f64) {
SelectStore(Node, 4, AArch64::ST4Fourv2d);		SelectStore(Node, 4, AArch64::ST4Fourv2d);
return;		return;
} else if (VT == MVT::v1i64 || VT == MVT::v1f64) {		} else if (VT == MVT::v1i64 || VT == MVT::v1f64) {
SelectStore(Node, 4, AArch64::ST1Fourv1d);		SelectStore(Node, 4, AArch64::ST1Fourv1d);
return;		return;
}		}
break;		break;
}		}
case Intrinsic::aarch64_neon_st2lane: {		case Intrinsic::aarch64_neon_st2lane: {
if (VT == MVT::v16i8 || VT == MVT::v8i8) {		if (VT == MVT::v16i8 || VT == MVT::v8i8) {
SelectStoreLane(Node, 2, AArch64::ST2i8);		SelectStoreLane(Node, 2, AArch64::ST2i8);
return;		return;
} else if (VT == MVT::v8i16 || VT == MVT::v4i16 || VT == MVT::v4f16 ||		} else if (VT == MVT::v8i16 || VT == MVT::v4i16 || VT == MVT::v4f16 ||
VT == MVT::v8f16) {		VT == MVT::v8f16 || VT == MVT::v4bf16 || VT == MVT::v8bf16) {
SelectStoreLane(Node, 2, AArch64::ST2i16);		SelectStoreLane(Node, 2, AArch64::ST2i16);
return;		return;
} else if (VT == MVT::v4i32 || VT == MVT::v2i32 || VT == MVT::v4f32 ||		} else if (VT == MVT::v4i32 || VT == MVT::v2i32 || VT == MVT::v4f32 ||
VT == MVT::v2f32) {		VT == MVT::v2f32) {
SelectStoreLane(Node, 2, AArch64::ST2i32);		SelectStoreLane(Node, 2, AArch64::ST2i32);
return;		return;
} else if (VT == MVT::v2i64 || VT == MVT::v1i64 || VT == MVT::v2f64 ||		} else if (VT == MVT::v2i64 || VT == MVT::v1i64 || VT == MVT::v2f64 ||
VT == MVT::v1f64) {		VT == MVT::v1f64) {
SelectStoreLane(Node, 2, AArch64::ST2i64);		SelectStoreLane(Node, 2, AArch64::ST2i64);
return;		return;
}		}
break;		break;
}		}
case Intrinsic::aarch64_neon_st3lane: {		case Intrinsic::aarch64_neon_st3lane: {
if (VT == MVT::v16i8 || VT == MVT::v8i8) {		if (VT == MVT::v16i8 || VT == MVT::v8i8) {
SelectStoreLane(Node, 3, AArch64::ST3i8);		SelectStoreLane(Node, 3, AArch64::ST3i8);
return;		return;
} else if (VT == MVT::v8i16 || VT == MVT::v4i16 || VT == MVT::v4f16 ||		} else if (VT == MVT::v8i16 || VT == MVT::v4i16 || VT == MVT::v4f16 ||
VT == MVT::v8f16) {		VT == MVT::v8f16 || VT == MVT::v4bf16 || VT == MVT::v8bf16) {
SelectStoreLane(Node, 3, AArch64::ST3i16);		SelectStoreLane(Node, 3, AArch64::ST3i16);
return;		return;
} else if (VT == MVT::v4i32 || VT == MVT::v2i32 || VT == MVT::v4f32 ||		} else if (VT == MVT::v4i32 || VT == MVT::v2i32 || VT == MVT::v4f32 ||
VT == MVT::v2f32) {		VT == MVT::v2f32) {
SelectStoreLane(Node, 3, AArch64::ST3i32);		SelectStoreLane(Node, 3, AArch64::ST3i32);
return;		return;
} else if (VT == MVT::v2i64 || VT == MVT::v1i64 || VT == MVT::v2f64 ||		} else if (VT == MVT::v2i64 || VT == MVT::v1i64 || VT == MVT::v2f64 ||
VT == MVT::v1f64) {		VT == MVT::v1f64) {
SelectStoreLane(Node, 3, AArch64::ST3i64);		SelectStoreLane(Node, 3, AArch64::ST3i64);
return;		return;
}		}
break;		break;
}		}
case Intrinsic::aarch64_neon_st4lane: {		case Intrinsic::aarch64_neon_st4lane: {
if (VT == MVT::v16i8 || VT == MVT::v8i8) {		if (VT == MVT::v16i8 || VT == MVT::v8i8) {
SelectStoreLane(Node, 4, AArch64::ST4i8);		SelectStoreLane(Node, 4, AArch64::ST4i8);
return;		return;
} else if (VT == MVT::v8i16 || VT == MVT::v4i16 || VT == MVT::v4f16 ||		} else if (VT == MVT::v8i16 || VT == MVT::v4i16 || VT == MVT::v4f16 ||
VT == MVT::v8f16) {		VT == MVT::v8f16 || VT == MVT::v4bf16 || VT == MVT::v8bf16) {
SelectStoreLane(Node, 4, AArch64::ST4i16);		SelectStoreLane(Node, 4, AArch64::ST4i16);
return;		return;
} else if (VT == MVT::v4i32 || VT == MVT::v2i32 || VT == MVT::v4f32 ||		} else if (VT == MVT::v4i32 || VT == MVT::v2i32 || VT == MVT::v4f32 ||
VT == MVT::v2f32) {		VT == MVT::v2f32) {
SelectStoreLane(Node, 4, AArch64::ST4i32);		SelectStoreLane(Node, 4, AArch64::ST4i32);
return;		return;
} else if (VT == MVT::v2i64 || VT == MVT::v1i64 || VT == MVT::v2f64 ||		} else if (VT == MVT::v2i64 || VT == MVT::v1i64 || VT == MVT::v2f64 ||
VT == MVT::v1f64) {		VT == MVT::v1f64) {
// ... 67 lines not shown; enclosing function: void AArch64DAGToDAGISel::Select(SDNode *Node) ...
}		}
case AArch64ISD::LD2post: {		case AArch64ISD::LD2post: {
if (VT == MVT::v8i8) {		if (VT == MVT::v8i8) {
SelectPostLoad(Node, 2, AArch64::LD2Twov8b_POST, AArch64::dsub0);		SelectPostLoad(Node, 2, AArch64::LD2Twov8b_POST, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v16i8) {		} else if (VT == MVT::v16i8) {
SelectPostLoad(Node, 2, AArch64::LD2Twov16b_POST, AArch64::qsub0);		SelectPostLoad(Node, 2, AArch64::LD2Twov16b_POST, AArch64::qsub0);
return;		return;
} else if (VT == MVT::v4i16 || VT == MVT::v4f16) {		} else if (VT == MVT::v4i16 || VT == MVT::v4f16 || VT == MVT::v4bf16) {
SelectPostLoad(Node, 2, AArch64::LD2Twov4h_POST, AArch64::dsub0);		SelectPostLoad(Node, 2, AArch64::LD2Twov4h_POST, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v8i16 || VT == MVT::v8f16) {		} else if (VT == MVT::v8i16 || VT == MVT::v8f16 || VT == MVT::v8bf16) {
SelectPostLoad(Node, 2, AArch64::LD2Twov8h_POST, AArch64::qsub0);		SelectPostLoad(Node, 2, AArch64::LD2Twov8h_POST, AArch64::qsub0);
return;		return;
} else if (VT == MVT::v2i32 || VT == MVT::v2f32) {		} else if (VT == MVT::v2i32 || VT == MVT::v2f32) {
SelectPostLoad(Node, 2, AArch64::LD2Twov2s_POST, AArch64::dsub0);		SelectPostLoad(Node, 2, AArch64::LD2Twov2s_POST, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v4i32 || VT == MVT::v4f32) {		} else if (VT == MVT::v4i32 || VT == MVT::v4f32) {
SelectPostLoad(Node, 2, AArch64::LD2Twov4s_POST, AArch64::qsub0);		SelectPostLoad(Node, 2, AArch64::LD2Twov4s_POST, AArch64::qsub0);
return;		return;
} else if (VT == MVT::v1i64 || VT == MVT::v1f64) {		} else if (VT == MVT::v1i64 || VT == MVT::v1f64) {
SelectPostLoad(Node, 2, AArch64::LD1Twov1d_POST, AArch64::dsub0);		SelectPostLoad(Node, 2, AArch64::LD1Twov1d_POST, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v2i64 || VT == MVT::v2f64) {		} else if (VT == MVT::v2i64 || VT == MVT::v2f64) {
SelectPostLoad(Node, 2, AArch64::LD2Twov2d_POST, AArch64::qsub0);		SelectPostLoad(Node, 2, AArch64::LD2Twov2d_POST, AArch64::qsub0);
return;		return;
}		}
break;		break;
}		}
case AArch64ISD::LD3post: {		case AArch64ISD::LD3post: {
if (VT == MVT::v8i8) {		if (VT == MVT::v8i8) {
SelectPostLoad(Node, 3, AArch64::LD3Threev8b_POST, AArch64::dsub0);		SelectPostLoad(Node, 3, AArch64::LD3Threev8b_POST, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v16i8) {		} else if (VT == MVT::v16i8) {
SelectPostLoad(Node, 3, AArch64::LD3Threev16b_POST, AArch64::qsub0);		SelectPostLoad(Node, 3, AArch64::LD3Threev16b_POST, AArch64::qsub0);
return;		return;
} else if (VT == MVT::v4i16 || VT == MVT::v4f16) {		} else if (VT == MVT::v4i16 || VT == MVT::v4f16 || VT == MVT::v4bf16) {
SelectPostLoad(Node, 3, AArch64::LD3Threev4h_POST, AArch64::dsub0);		SelectPostLoad(Node, 3, AArch64::LD3Threev4h_POST, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v8i16 || VT == MVT::v8f16) {		} else if (VT == MVT::v8i16 || VT == MVT::v8f16 || VT == MVT::v8bf16) {
SelectPostLoad(Node, 3, AArch64::LD3Threev8h_POST, AArch64::qsub0);		SelectPostLoad(Node, 3, AArch64::LD3Threev8h_POST, AArch64::qsub0);
return;		return;
} else if (VT == MVT::v2i32 || VT == MVT::v2f32) {		} else if (VT == MVT::v2i32 || VT == MVT::v2f32) {
SelectPostLoad(Node, 3, AArch64::LD3Threev2s_POST, AArch64::dsub0);		SelectPostLoad(Node, 3, AArch64::LD3Threev2s_POST, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v4i32 || VT == MVT::v4f32) {		} else if (VT == MVT::v4i32 || VT == MVT::v4f32) {
SelectPostLoad(Node, 3, AArch64::LD3Threev4s_POST, AArch64::qsub0);		SelectPostLoad(Node, 3, AArch64::LD3Threev4s_POST, AArch64::qsub0);
return;		return;
} else if (VT == MVT::v1i64 || VT == MVT::v1f64) {		} else if (VT == MVT::v1i64 || VT == MVT::v1f64) {
SelectPostLoad(Node, 3, AArch64::LD1Threev1d_POST, AArch64::dsub0);		SelectPostLoad(Node, 3, AArch64::LD1Threev1d_POST, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v2i64 || VT == MVT::v2f64) {		} else if (VT == MVT::v2i64 || VT == MVT::v2f64) {
SelectPostLoad(Node, 3, AArch64::LD3Threev2d_POST, AArch64::qsub0);		SelectPostLoad(Node, 3, AArch64::LD3Threev2d_POST, AArch64::qsub0);
return;		return;
}		}
break;		break;
}		}
case AArch64ISD::LD4post: {		case AArch64ISD::LD4post: {
if (VT == MVT::v8i8) {		if (VT == MVT::v8i8) {
SelectPostLoad(Node, 4, AArch64::LD4Fourv8b_POST, AArch64::dsub0);		SelectPostLoad(Node, 4, AArch64::LD4Fourv8b_POST, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v16i8) {		} else if (VT == MVT::v16i8) {
SelectPostLoad(Node, 4, AArch64::LD4Fourv16b_POST, AArch64::qsub0);		SelectPostLoad(Node, 4, AArch64::LD4Fourv16b_POST, AArch64::qsub0);
return;		return;
} else if (VT == MVT::v4i16 || VT == MVT::v4f16) {		} else if (VT == MVT::v4i16 || VT == MVT::v4f16 || VT == MVT::v4bf16) {
SelectPostLoad(Node, 4, AArch64::LD4Fourv4h_POST, AArch64::dsub0);		SelectPostLoad(Node, 4, AArch64::LD4Fourv4h_POST, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v8i16 || VT == MVT::v8f16) {		} else if (VT == MVT::v8i16 || VT == MVT::v8f16 || VT == MVT::v8bf16) {
SelectPostLoad(Node, 4, AArch64::LD4Fourv8h_POST, AArch64::qsub0);		SelectPostLoad(Node, 4, AArch64::LD4Fourv8h_POST, AArch64::qsub0);
return;		return;
} else if (VT == MVT::v2i32 \|\| VT == MVT::v2f32) {		} else if (VT == MVT::v2i32 \|\| VT == MVT::v2f32) {
SelectPostLoad(Node, 4, AArch64::LD4Fourv2s_POST, AArch64::dsub0);		SelectPostLoad(Node, 4, AArch64::LD4Fourv2s_POST, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v4i32 \|\| VT == MVT::v4f32) {		} else if (VT == MVT::v4i32 \|\| VT == MVT::v4f32) {
SelectPostLoad(Node, 4, AArch64::LD4Fourv4s_POST, AArch64::qsub0);		SelectPostLoad(Node, 4, AArch64::LD4Fourv4s_POST, AArch64::qsub0);
return;		return;
} else if (VT == MVT::v1i64 \|\| VT == MVT::v1f64) {		} else if (VT == MVT::v1i64 \|\| VT == MVT::v1f64) {
SelectPostLoad(Node, 4, AArch64::LD1Fourv1d_POST, AArch64::dsub0);		SelectPostLoad(Node, 4, AArch64::LD1Fourv1d_POST, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v2i64 \|\| VT == MVT::v2f64) {		} else if (VT == MVT::v2i64 \|\| VT == MVT::v2f64) {
SelectPostLoad(Node, 4, AArch64::LD4Fourv2d_POST, AArch64::qsub0);		SelectPostLoad(Node, 4, AArch64::LD4Fourv2d_POST, AArch64::qsub0);
return;		return;
}		}
break;		break;
}		}
case AArch64ISD::LD1x2post: {		case AArch64ISD::LD1x2post: {
if (VT == MVT::v8i8) {		if (VT == MVT::v8i8) {
SelectPostLoad(Node, 2, AArch64::LD1Twov8b_POST, AArch64::dsub0);		SelectPostLoad(Node, 2, AArch64::LD1Twov8b_POST, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v16i8) {		} else if (VT == MVT::v16i8) {
SelectPostLoad(Node, 2, AArch64::LD1Twov16b_POST, AArch64::qsub0);		SelectPostLoad(Node, 2, AArch64::LD1Twov16b_POST, AArch64::qsub0);
return;		return;
} else if (VT == MVT::v4i16 \|\| VT == MVT::v4f16) {		} else if (VT == MVT::v4i16 \|\| VT == MVT::v4f16 \|\| VT == MVT::v4bf16) {
SelectPostLoad(Node, 2, AArch64::LD1Twov4h_POST, AArch64::dsub0);		SelectPostLoad(Node, 2, AArch64::LD1Twov4h_POST, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v8i16 \|\| VT == MVT::v8f16) {		} else if (VT == MVT::v8i16 \|\| VT == MVT::v8f16 \|\| VT == MVT::v8bf16) {
SelectPostLoad(Node, 2, AArch64::LD1Twov8h_POST, AArch64::qsub0);		SelectPostLoad(Node, 2, AArch64::LD1Twov8h_POST, AArch64::qsub0);
return;		return;
} else if (VT == MVT::v2i32 \|\| VT == MVT::v2f32) {		} else if (VT == MVT::v2i32 \|\| VT == MVT::v2f32) {
SelectPostLoad(Node, 2, AArch64::LD1Twov2s_POST, AArch64::dsub0);		SelectPostLoad(Node, 2, AArch64::LD1Twov2s_POST, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v4i32 \|\| VT == MVT::v4f32) {		} else if (VT == MVT::v4i32 \|\| VT == MVT::v4f32) {
SelectPostLoad(Node, 2, AArch64::LD1Twov4s_POST, AArch64::qsub0);		SelectPostLoad(Node, 2, AArch64::LD1Twov4s_POST, AArch64::qsub0);
return;		return;
} else if (VT == MVT::v1i64 \|\| VT == MVT::v1f64) {		} else if (VT == MVT::v1i64 \|\| VT == MVT::v1f64) {
SelectPostLoad(Node, 2, AArch64::LD1Twov1d_POST, AArch64::dsub0);		SelectPostLoad(Node, 2, AArch64::LD1Twov1d_POST, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v2i64 \|\| VT == MVT::v2f64) {		} else if (VT == MVT::v2i64 \|\| VT == MVT::v2f64) {
SelectPostLoad(Node, 2, AArch64::LD1Twov2d_POST, AArch64::qsub0);		SelectPostLoad(Node, 2, AArch64::LD1Twov2d_POST, AArch64::qsub0);
return;		return;
}		}
break;		break;
}		}
case AArch64ISD::LD1x3post: {		case AArch64ISD::LD1x3post: {
if (VT == MVT::v8i8) {		if (VT == MVT::v8i8) {
SelectPostLoad(Node, 3, AArch64::LD1Threev8b_POST, AArch64::dsub0);		SelectPostLoad(Node, 3, AArch64::LD1Threev8b_POST, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v16i8) {		} else if (VT == MVT::v16i8) {
SelectPostLoad(Node, 3, AArch64::LD1Threev16b_POST, AArch64::qsub0);		SelectPostLoad(Node, 3, AArch64::LD1Threev16b_POST, AArch64::qsub0);
return;		return;
} else if (VT == MVT::v4i16 \|\| VT == MVT::v4f16) {		} else if (VT == MVT::v4i16 \|\| VT == MVT::v4f16 \|\| VT == MVT::v4bf16) {
SelectPostLoad(Node, 3, AArch64::LD1Threev4h_POST, AArch64::dsub0);		SelectPostLoad(Node, 3, AArch64::LD1Threev4h_POST, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v8i16 \|\| VT == MVT::v8f16) {		} else if (VT == MVT::v8i16 \|\| VT == MVT::v8f16 \|\| VT == MVT::v8bf16) {
SelectPostLoad(Node, 3, AArch64::LD1Threev8h_POST, AArch64::qsub0);		SelectPostLoad(Node, 3, AArch64::LD1Threev8h_POST, AArch64::qsub0);
return;		return;
} else if (VT == MVT::v2i32 \|\| VT == MVT::v2f32) {		} else if (VT == MVT::v2i32 \|\| VT == MVT::v2f32) {
SelectPostLoad(Node, 3, AArch64::LD1Threev2s_POST, AArch64::dsub0);		SelectPostLoad(Node, 3, AArch64::LD1Threev2s_POST, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v4i32 \|\| VT == MVT::v4f32) {		} else if (VT == MVT::v4i32 \|\| VT == MVT::v4f32) {
SelectPostLoad(Node, 3, AArch64::LD1Threev4s_POST, AArch64::qsub0);		SelectPostLoad(Node, 3, AArch64::LD1Threev4s_POST, AArch64::qsub0);
return;		return;
} else if (VT == MVT::v1i64 \|\| VT == MVT::v1f64) {		} else if (VT == MVT::v1i64 \|\| VT == MVT::v1f64) {
SelectPostLoad(Node, 3, AArch64::LD1Threev1d_POST, AArch64::dsub0);		SelectPostLoad(Node, 3, AArch64::LD1Threev1d_POST, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v2i64 \|\| VT == MVT::v2f64) {		} else if (VT == MVT::v2i64 \|\| VT == MVT::v2f64) {
SelectPostLoad(Node, 3, AArch64::LD1Threev2d_POST, AArch64::qsub0);		SelectPostLoad(Node, 3, AArch64::LD1Threev2d_POST, AArch64::qsub0);
return;		return;
}		}
break;		break;
}		}
case AArch64ISD::LD1x4post: {		case AArch64ISD::LD1x4post: {
if (VT == MVT::v8i8) {		if (VT == MVT::v8i8) {
SelectPostLoad(Node, 4, AArch64::LD1Fourv8b_POST, AArch64::dsub0);		SelectPostLoad(Node, 4, AArch64::LD1Fourv8b_POST, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v16i8) {		} else if (VT == MVT::v16i8) {
SelectPostLoad(Node, 4, AArch64::LD1Fourv16b_POST, AArch64::qsub0);		SelectPostLoad(Node, 4, AArch64::LD1Fourv16b_POST, AArch64::qsub0);
return;		return;
} else if (VT == MVT::v4i16 \|\| VT == MVT::v4f16) {		} else if (VT == MVT::v4i16 \|\| VT == MVT::v4f16 \|\| VT == MVT::v4bf16) {
SelectPostLoad(Node, 4, AArch64::LD1Fourv4h_POST, AArch64::dsub0);		SelectPostLoad(Node, 4, AArch64::LD1Fourv4h_POST, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v8i16 \|\| VT == MVT::v8f16) {		} else if (VT == MVT::v8i16 \|\| VT == MVT::v8f16 \|\| VT == MVT::v8bf16) {
SelectPostLoad(Node, 4, AArch64::LD1Fourv8h_POST, AArch64::qsub0);		SelectPostLoad(Node, 4, AArch64::LD1Fourv8h_POST, AArch64::qsub0);
return;		return;
} else if (VT == MVT::v2i32 \|\| VT == MVT::v2f32) {		} else if (VT == MVT::v2i32 \|\| VT == MVT::v2f32) {
SelectPostLoad(Node, 4, AArch64::LD1Fourv2s_POST, AArch64::dsub0);		SelectPostLoad(Node, 4, AArch64::LD1Fourv2s_POST, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v4i32 \|\| VT == MVT::v4f32) {		} else if (VT == MVT::v4i32 \|\| VT == MVT::v4f32) {
SelectPostLoad(Node, 4, AArch64::LD1Fourv4s_POST, AArch64::qsub0);		SelectPostLoad(Node, 4, AArch64::LD1Fourv4s_POST, AArch64::qsub0);
return;		return;
} else if (VT == MVT::v1i64 \|\| VT == MVT::v1f64) {		} else if (VT == MVT::v1i64 \|\| VT == MVT::v1f64) {
SelectPostLoad(Node, 4, AArch64::LD1Fourv1d_POST, AArch64::dsub0);		SelectPostLoad(Node, 4, AArch64::LD1Fourv1d_POST, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v2i64 \|\| VT == MVT::v2f64) {		} else if (VT == MVT::v2i64 \|\| VT == MVT::v2f64) {
SelectPostLoad(Node, 4, AArch64::LD1Fourv2d_POST, AArch64::qsub0);		SelectPostLoad(Node, 4, AArch64::LD1Fourv2d_POST, AArch64::qsub0);
return;		return;
}		}
break;		break;
}		}
case AArch64ISD::LD1DUPpost: {		case AArch64ISD::LD1DUPpost: {
if (VT == MVT::v8i8) {		if (VT == MVT::v8i8) {
SelectPostLoad(Node, 1, AArch64::LD1Rv8b_POST, AArch64::dsub0);		SelectPostLoad(Node, 1, AArch64::LD1Rv8b_POST, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v16i8) {		} else if (VT == MVT::v16i8) {
SelectPostLoad(Node, 1, AArch64::LD1Rv16b_POST, AArch64::qsub0);		SelectPostLoad(Node, 1, AArch64::LD1Rv16b_POST, AArch64::qsub0);
return;		return;
} else if (VT == MVT::v4i16 \|\| VT == MVT::v4f16) {		} else if (VT == MVT::v4i16 \|\| VT == MVT::v4f16 \|\| VT == MVT::v4bf16) {
SelectPostLoad(Node, 1, AArch64::LD1Rv4h_POST, AArch64::dsub0);		SelectPostLoad(Node, 1, AArch64::LD1Rv4h_POST, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v8i16 \|\| VT == MVT::v8f16) {		} else if (VT == MVT::v8i16 \|\| VT == MVT::v8f16 \|\| VT == MVT::v8bf16) {
SelectPostLoad(Node, 1, AArch64::LD1Rv8h_POST, AArch64::qsub0);		SelectPostLoad(Node, 1, AArch64::LD1Rv8h_POST, AArch64::qsub0);
return;		return;
} else if (VT == MVT::v2i32 \|\| VT == MVT::v2f32) {		} else if (VT == MVT::v2i32 \|\| VT == MVT::v2f32) {
SelectPostLoad(Node, 1, AArch64::LD1Rv2s_POST, AArch64::dsub0);		SelectPostLoad(Node, 1, AArch64::LD1Rv2s_POST, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v4i32 \|\| VT == MVT::v4f32) {		} else if (VT == MVT::v4i32 \|\| VT == MVT::v4f32) {
SelectPostLoad(Node, 1, AArch64::LD1Rv4s_POST, AArch64::qsub0);		SelectPostLoad(Node, 1, AArch64::LD1Rv4s_POST, AArch64::qsub0);
return;		return;
} else if (VT == MVT::v1i64 \|\| VT == MVT::v1f64) {		} else if (VT == MVT::v1i64 \|\| VT == MVT::v1f64) {
SelectPostLoad(Node, 1, AArch64::LD1Rv1d_POST, AArch64::dsub0);		SelectPostLoad(Node, 1, AArch64::LD1Rv1d_POST, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v2i64 \|\| VT == MVT::v2f64) {		} else if (VT == MVT::v2i64 \|\| VT == MVT::v2f64) {
SelectPostLoad(Node, 1, AArch64::LD1Rv2d_POST, AArch64::qsub0);		SelectPostLoad(Node, 1, AArch64::LD1Rv2d_POST, AArch64::qsub0);
return;		return;
}		}
break;		break;
}		}
case AArch64ISD::LD2DUPpost: {		case AArch64ISD::LD2DUPpost: {
if (VT == MVT::v8i8) {		if (VT == MVT::v8i8) {
SelectPostLoad(Node, 2, AArch64::LD2Rv8b_POST, AArch64::dsub0);		SelectPostLoad(Node, 2, AArch64::LD2Rv8b_POST, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v16i8) {		} else if (VT == MVT::v16i8) {
SelectPostLoad(Node, 2, AArch64::LD2Rv16b_POST, AArch64::qsub0);		SelectPostLoad(Node, 2, AArch64::LD2Rv16b_POST, AArch64::qsub0);
return;		return;
} else if (VT == MVT::v4i16 \|\| VT == MVT::v4f16) {		} else if (VT == MVT::v4i16 \|\| VT == MVT::v4f16 \|\| VT == MVT::v4bf16) {
SelectPostLoad(Node, 2, AArch64::LD2Rv4h_POST, AArch64::dsub0);		SelectPostLoad(Node, 2, AArch64::LD2Rv4h_POST, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v8i16 \|\| VT == MVT::v8f16) {		} else if (VT == MVT::v8i16 \|\| VT == MVT::v8f16 \|\| VT == MVT::v8bf16) {
SelectPostLoad(Node, 2, AArch64::LD2Rv8h_POST, AArch64::qsub0);		SelectPostLoad(Node, 2, AArch64::LD2Rv8h_POST, AArch64::qsub0);
return;		return;
} else if (VT == MVT::v2i32 \|\| VT == MVT::v2f32) {		} else if (VT == MVT::v2i32 \|\| VT == MVT::v2f32) {
SelectPostLoad(Node, 2, AArch64::LD2Rv2s_POST, AArch64::dsub0);		SelectPostLoad(Node, 2, AArch64::LD2Rv2s_POST, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v4i32 \|\| VT == MVT::v4f32) {		} else if (VT == MVT::v4i32 \|\| VT == MVT::v4f32) {
SelectPostLoad(Node, 2, AArch64::LD2Rv4s_POST, AArch64::qsub0);		SelectPostLoad(Node, 2, AArch64::LD2Rv4s_POST, AArch64::qsub0);
return;		return;
} else if (VT == MVT::v1i64 \|\| VT == MVT::v1f64) {		} else if (VT == MVT::v1i64 \|\| VT == MVT::v1f64) {
SelectPostLoad(Node, 2, AArch64::LD2Rv1d_POST, AArch64::dsub0);		SelectPostLoad(Node, 2, AArch64::LD2Rv1d_POST, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v2i64 \|\| VT == MVT::v2f64) {		} else if (VT == MVT::v2i64 \|\| VT == MVT::v2f64) {
SelectPostLoad(Node, 2, AArch64::LD2Rv2d_POST, AArch64::qsub0);		SelectPostLoad(Node, 2, AArch64::LD2Rv2d_POST, AArch64::qsub0);
return;		return;
}		}
break;		break;
}		}
case AArch64ISD::LD3DUPpost: {		case AArch64ISD::LD3DUPpost: {
if (VT == MVT::v8i8) {		if (VT == MVT::v8i8) {
SelectPostLoad(Node, 3, AArch64::LD3Rv8b_POST, AArch64::dsub0);		SelectPostLoad(Node, 3, AArch64::LD3Rv8b_POST, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v16i8) {		} else if (VT == MVT::v16i8) {
SelectPostLoad(Node, 3, AArch64::LD3Rv16b_POST, AArch64::qsub0);		SelectPostLoad(Node, 3, AArch64::LD3Rv16b_POST, AArch64::qsub0);
return;		return;
} else if (VT == MVT::v4i16 \|\| VT == MVT::v4f16) {		} else if (VT == MVT::v4i16 \|\| VT == MVT::v4f16 \|\| VT == MVT::v4bf16) {
SelectPostLoad(Node, 3, AArch64::LD3Rv4h_POST, AArch64::dsub0);		SelectPostLoad(Node, 3, AArch64::LD3Rv4h_POST, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v8i16 \|\| VT == MVT::v8f16) {		} else if (VT == MVT::v8i16 \|\| VT == MVT::v8f16 \|\| VT == MVT::v8bf16) {
SelectPostLoad(Node, 3, AArch64::LD3Rv8h_POST, AArch64::qsub0);		SelectPostLoad(Node, 3, AArch64::LD3Rv8h_POST, AArch64::qsub0);
return;		return;
} else if (VT == MVT::v2i32 \|\| VT == MVT::v2f32) {		} else if (VT == MVT::v2i32 \|\| VT == MVT::v2f32) {
SelectPostLoad(Node, 3, AArch64::LD3Rv2s_POST, AArch64::dsub0);		SelectPostLoad(Node, 3, AArch64::LD3Rv2s_POST, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v4i32 \|\| VT == MVT::v4f32) {		} else if (VT == MVT::v4i32 \|\| VT == MVT::v4f32) {
SelectPostLoad(Node, 3, AArch64::LD3Rv4s_POST, AArch64::qsub0);		SelectPostLoad(Node, 3, AArch64::LD3Rv4s_POST, AArch64::qsub0);
return;		return;
} else if (VT == MVT::v1i64 \|\| VT == MVT::v1f64) {		} else if (VT == MVT::v1i64 \|\| VT == MVT::v1f64) {
SelectPostLoad(Node, 3, AArch64::LD3Rv1d_POST, AArch64::dsub0);		SelectPostLoad(Node, 3, AArch64::LD3Rv1d_POST, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v2i64 \|\| VT == MVT::v2f64) {		} else if (VT == MVT::v2i64 \|\| VT == MVT::v2f64) {
SelectPostLoad(Node, 3, AArch64::LD3Rv2d_POST, AArch64::qsub0);		SelectPostLoad(Node, 3, AArch64::LD3Rv2d_POST, AArch64::qsub0);
return;		return;
}		}
break;		break;
}		}
case AArch64ISD::LD4DUPpost: {		case AArch64ISD::LD4DUPpost: {
if (VT == MVT::v8i8) {		if (VT == MVT::v8i8) {
SelectPostLoad(Node, 4, AArch64::LD4Rv8b_POST, AArch64::dsub0);		SelectPostLoad(Node, 4, AArch64::LD4Rv8b_POST, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v16i8) {		} else if (VT == MVT::v16i8) {
SelectPostLoad(Node, 4, AArch64::LD4Rv16b_POST, AArch64::qsub0);		SelectPostLoad(Node, 4, AArch64::LD4Rv16b_POST, AArch64::qsub0);
return;		return;
} else if (VT == MVT::v4i16 \|\| VT == MVT::v4f16) {		} else if (VT == MVT::v4i16 \|\| VT == MVT::v4f16 \|\| VT == MVT::v4bf16) {
SelectPostLoad(Node, 4, AArch64::LD4Rv4h_POST, AArch64::dsub0);		SelectPostLoad(Node, 4, AArch64::LD4Rv4h_POST, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v8i16 \|\| VT == MVT::v8f16) {		} else if (VT == MVT::v8i16 \|\| VT == MVT::v8f16 \|\| VT == MVT::v8bf16) {
SelectPostLoad(Node, 4, AArch64::LD4Rv8h_POST, AArch64::qsub0);		SelectPostLoad(Node, 4, AArch64::LD4Rv8h_POST, AArch64::qsub0);
return;		return;
} else if (VT == MVT::v2i32 \|\| VT == MVT::v2f32) {		} else if (VT == MVT::v2i32 \|\| VT == MVT::v2f32) {
SelectPostLoad(Node, 4, AArch64::LD4Rv2s_POST, AArch64::dsub0);		SelectPostLoad(Node, 4, AArch64::LD4Rv2s_POST, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v4i32 \|\| VT == MVT::v4f32) {		} else if (VT == MVT::v4i32 \|\| VT == MVT::v4f32) {
SelectPostLoad(Node, 4, AArch64::LD4Rv4s_POST, AArch64::qsub0);		SelectPostLoad(Node, 4, AArch64::LD4Rv4s_POST, AArch64::qsub0);
return;		return;
} else if (VT == MVT::v1i64 \|\| VT == MVT::v1f64) {		} else if (VT == MVT::v1i64 \|\| VT == MVT::v1f64) {
SelectPostLoad(Node, 4, AArch64::LD4Rv1d_POST, AArch64::dsub0);		SelectPostLoad(Node, 4, AArch64::LD4Rv1d_POST, AArch64::dsub0);
return;		return;
} else if (VT == MVT::v2i64 \|\| VT == MVT::v2f64) {		} else if (VT == MVT::v2i64 \|\| VT == MVT::v2f64) {
SelectPostLoad(Node, 4, AArch64::LD4Rv2d_POST, AArch64::qsub0);		SelectPostLoad(Node, 4, AArch64::LD4Rv2d_POST, AArch64::qsub0);
return;		return;
}		}
break;		break;
}		}
case AArch64ISD::LD1LANEpost: {		case AArch64ISD::LD1LANEpost: {
if (VT == MVT::v16i8 \|\| VT == MVT::v8i8) {		if (VT == MVT::v16i8 \|\| VT == MVT::v8i8) {
SelectPostLoadLane(Node, 1, AArch64::LD1i8_POST);		SelectPostLoadLane(Node, 1, AArch64::LD1i8_POST);
return;		return;
} else if (VT == MVT::v8i16 \|\| VT == MVT::v4i16 \|\| VT == MVT::v4f16 \|\|		} else if (VT == MVT::v8i16 \|\| VT == MVT::v4i16 \|\| VT == MVT::v4f16 \|\|
VT == MVT::v8f16) {		VT == MVT::v8f16 \|\| VT == MVT::v4bf16 \|\| VT == MVT::v8bf16) {
SelectPostLoadLane(Node, 1, AArch64::LD1i16_POST);		SelectPostLoadLane(Node, 1, AArch64::LD1i16_POST);
return;		return;
} else if (VT == MVT::v4i32 \|\| VT == MVT::v2i32 \|\| VT == MVT::v4f32 \|\|		} else if (VT == MVT::v4i32 \|\| VT == MVT::v2i32 \|\| VT == MVT::v4f32 \|\|
VT == MVT::v2f32) {		VT == MVT::v2f32) {
SelectPostLoadLane(Node, 1, AArch64::LD1i32_POST);		SelectPostLoadLane(Node, 1, AArch64::LD1i32_POST);
return;		return;
} else if (VT == MVT::v2i64 \|\| VT == MVT::v1i64 \|\| VT == MVT::v2f64 \|\|		} else if (VT == MVT::v2i64 \|\| VT == MVT::v1i64 \|\| VT == MVT::v2f64 \|\|
VT == MVT::v1f64) {		VT == MVT::v1f64) {
SelectPostLoadLane(Node, 1, AArch64::LD1i64_POST);		SelectPostLoadLane(Node, 1, AArch64::LD1i64_POST);
return;		return;
}		}
break;		break;
}		}
case AArch64ISD::LD2LANEpost: {		case AArch64ISD::LD2LANEpost: {
if (VT == MVT::v16i8 \|\| VT == MVT::v8i8) {		if (VT == MVT::v16i8 \|\| VT == MVT::v8i8) {
SelectPostLoadLane(Node, 2, AArch64::LD2i8_POST);		SelectPostLoadLane(Node, 2, AArch64::LD2i8_POST);
return;		return;
} else if (VT == MVT::v8i16 \|\| VT == MVT::v4i16 \|\| VT == MVT::v4f16 \|\|		} else if (VT == MVT::v8i16 \|\| VT == MVT::v4i16 \|\| VT == MVT::v4f16 \|\|
VT == MVT::v8f16) {		VT == MVT::v8f16 \|\| VT == MVT::v4bf16 \|\| VT == MVT::v8bf16) {
SelectPostLoadLane(Node, 2, AArch64::LD2i16_POST);		SelectPostLoadLane(Node, 2, AArch64::LD2i16_POST);
return;		return;
} else if (VT == MVT::v4i32 \|\| VT == MVT::v2i32 \|\| VT == MVT::v4f32 \|\|		} else if (VT == MVT::v4i32 \|\| VT == MVT::v2i32 \|\| VT == MVT::v4f32 \|\|
VT == MVT::v2f32) {		VT == MVT::v2f32) {
SelectPostLoadLane(Node, 2, AArch64::LD2i32_POST);		SelectPostLoadLane(Node, 2, AArch64::LD2i32_POST);
return;		return;
} else if (VT == MVT::v2i64 \|\| VT == MVT::v1i64 \|\| VT == MVT::v2f64 \|\|		} else if (VT == MVT::v2i64 \|\| VT == MVT::v1i64 \|\| VT == MVT::v2f64 \|\|
VT == MVT::v1f64) {		VT == MVT::v1f64) {
SelectPostLoadLane(Node, 2, AArch64::LD2i64_POST);		SelectPostLoadLane(Node, 2, AArch64::LD2i64_POST);
return;		return;
}		}
break;		break;
}		}
case AArch64ISD::LD3LANEpost: {		case AArch64ISD::LD3LANEpost: {
if (VT == MVT::v16i8 \|\| VT == MVT::v8i8) {		if (VT == MVT::v16i8 \|\| VT == MVT::v8i8) {
SelectPostLoadLane(Node, 3, AArch64::LD3i8_POST);		SelectPostLoadLane(Node, 3, AArch64::LD3i8_POST);
return;		return;
} else if (VT == MVT::v8i16 \|\| VT == MVT::v4i16 \|\| VT == MVT::v4f16 \|\|		} else if (VT == MVT::v8i16 \|\| VT == MVT::v4i16 \|\| VT == MVT::v4f16 \|\|
VT == MVT::v8f16) {		VT == MVT::v8f16 \|\| VT == MVT::v4bf16 \|\| VT == MVT::v8bf16) {
SelectPostLoadLane(Node, 3, AArch64::LD3i16_POST);		SelectPostLoadLane(Node, 3, AArch64::LD3i16_POST);
return;		return;
} else if (VT == MVT::v4i32 \|\| VT == MVT::v2i32 \|\| VT == MVT::v4f32 \|\|		} else if (VT == MVT::v4i32 \|\| VT == MVT::v2i32 \|\| VT == MVT::v4f32 \|\|
VT == MVT::v2f32) {		VT == MVT::v2f32) {
SelectPostLoadLane(Node, 3, AArch64::LD3i32_POST);		SelectPostLoadLane(Node, 3, AArch64::LD3i32_POST);
return;		return;
} else if (VT == MVT::v2i64 \|\| VT == MVT::v1i64 \|\| VT == MVT::v2f64 \|\|		} else if (VT == MVT::v2i64 \|\| VT == MVT::v1i64 \|\| VT == MVT::v2f64 \|\|
VT == MVT::v1f64) {		VT == MVT::v1f64) {
SelectPostLoadLane(Node, 3, AArch64::LD3i64_POST);		SelectPostLoadLane(Node, 3, AArch64::LD3i64_POST);
return;		return;
}		}
break;		break;
}		}
case AArch64ISD::LD4LANEpost: {		case AArch64ISD::LD4LANEpost: {
if (VT == MVT::v16i8 \|\| VT == MVT::v8i8) {		if (VT == MVT::v16i8 \|\| VT == MVT::v8i8) {
SelectPostLoadLane(Node, 4, AArch64::LD4i8_POST);		SelectPostLoadLane(Node, 4, AArch64::LD4i8_POST);
return;		return;
} else if (VT == MVT::v8i16 \|\| VT == MVT::v4i16 \|\| VT == MVT::v4f16 \|\|		} else if (VT == MVT::v8i16 \|\| VT == MVT::v4i16 \|\| VT == MVT::v4f16 \|\|
VT == MVT::v8f16) {		VT == MVT::v8f16 \|\| VT == MVT::v4bf16 \|\| VT == MVT::v8bf16) {
SelectPostLoadLane(Node, 4, AArch64::LD4i16_POST);		SelectPostLoadLane(Node, 4, AArch64::LD4i16_POST);
return;		return;
} else if (VT == MVT::v4i32 \|\| VT == MVT::v2i32 \|\| VT == MVT::v4f32 \|\|		} else if (VT == MVT::v4i32 \|\| VT == MVT::v2i32 \|\| VT == MVT::v4f32 \|\|
VT == MVT::v2f32) {		VT == MVT::v2f32) {
SelectPostLoadLane(Node, 4, AArch64::LD4i32_POST);		SelectPostLoadLane(Node, 4, AArch64::LD4i32_POST);
return;		return;
} else if (VT == MVT::v2i64 \|\| VT == MVT::v1i64 \|\| VT == MVT::v2f64 \|\|		} else if (VT == MVT::v2i64 \|\| VT == MVT::v1i64 \|\| VT == MVT::v2f64 \|\|
VT == MVT::v1f64) {		VT == MVT::v1f64) {
SelectPostLoadLane(Node, 4, AArch64::LD4i64_POST);		SelectPostLoadLane(Node, 4, AArch64::LD4i64_POST);
return;		return;
}		}
break;		break;
}		}
case AArch64ISD::ST2post: {		case AArch64ISD::ST2post: {
VT = Node->getOperand(1).getValueType();		VT = Node->getOperand(1).getValueType();
if (VT == MVT::v8i8) {		if (VT == MVT::v8i8) {
SelectPostStore(Node, 2, AArch64::ST2Twov8b_POST);		SelectPostStore(Node, 2, AArch64::ST2Twov8b_POST);
return;		return;
} else if (VT == MVT::v16i8) {		} else if (VT == MVT::v16i8) {
SelectPostStore(Node, 2, AArch64::ST2Twov16b_POST);		SelectPostStore(Node, 2, AArch64::ST2Twov16b_POST);
return;		return;
} else if (VT == MVT::v4i16 \|\| VT == MVT::v4f16) {		} else if (VT == MVT::v4i16 \|\| VT == MVT::v4f16 \|\| VT == MVT::v4bf16) {
SelectPostStore(Node, 2, AArch64::ST2Twov4h_POST);		SelectPostStore(Node, 2, AArch64::ST2Twov4h_POST);
return;		return;
} else if (VT == MVT::v8i16 \|\| VT == MVT::v8f16) {		} else if (VT == MVT::v8i16 \|\| VT == MVT::v8f16 \|\| VT == MVT::v8bf16) {
SelectPostStore(Node, 2, AArch64::ST2Twov8h_POST);		SelectPostStore(Node, 2, AArch64::ST2Twov8h_POST);
return;		return;
} else if (VT == MVT::v2i32 \|\| VT == MVT::v2f32) {		} else if (VT == MVT::v2i32 \|\| VT == MVT::v2f32) {
SelectPostStore(Node, 2, AArch64::ST2Twov2s_POST);		SelectPostStore(Node, 2, AArch64::ST2Twov2s_POST);
return;		return;
} else if (VT == MVT::v4i32 \|\| VT == MVT::v4f32) {		} else if (VT == MVT::v4i32 \|\| VT == MVT::v4f32) {
SelectPostStore(Node, 2, AArch64::ST2Twov4s_POST);		SelectPostStore(Node, 2, AArch64::ST2Twov4s_POST);
return;		return;
// ... 9 unchanged lines not shown ...
case AArch64ISD::ST3post: {		case AArch64ISD::ST3post: {
VT = Node->getOperand(1).getValueType();		VT = Node->getOperand(1).getValueType();
if (VT == MVT::v8i8) {		if (VT == MVT::v8i8) {
SelectPostStore(Node, 3, AArch64::ST3Threev8b_POST);		SelectPostStore(Node, 3, AArch64::ST3Threev8b_POST);
return;		return;
} else if (VT == MVT::v16i8) {		} else if (VT == MVT::v16i8) {
SelectPostStore(Node, 3, AArch64::ST3Threev16b_POST);		SelectPostStore(Node, 3, AArch64::ST3Threev16b_POST);
return;		return;
} else if (VT == MVT::v4i16 \|\| VT == MVT::v4f16) {		} else if (VT == MVT::v4i16 \|\| VT == MVT::v4f16 \|\| VT == MVT::v4bf16) {
SelectPostStore(Node, 3, AArch64::ST3Threev4h_POST);		SelectPostStore(Node, 3, AArch64::ST3Threev4h_POST);
return;		return;
} else if (VT == MVT::v8i16 \|\| VT == MVT::v8f16) {		} else if (VT == MVT::v8i16 \|\| VT == MVT::v8f16 \|\| VT == MVT::v8bf16) {
SelectPostStore(Node, 3, AArch64::ST3Threev8h_POST);		SelectPostStore(Node, 3, AArch64::ST3Threev8h_POST);
return;		return;
} else if (VT == MVT::v2i32 \|\| VT == MVT::v2f32) {		} else if (VT == MVT::v2i32 \|\| VT == MVT::v2f32) {
SelectPostStore(Node, 3, AArch64::ST3Threev2s_POST);		SelectPostStore(Node, 3, AArch64::ST3Threev2s_POST);
return;		return;
} else if (VT == MVT::v4i32 \|\| VT == MVT::v4f32) {		} else if (VT == MVT::v4i32 \|\| VT == MVT::v4f32) {
SelectPostStore(Node, 3, AArch64::ST3Threev4s_POST);		SelectPostStore(Node, 3, AArch64::ST3Threev4s_POST);
return;		return;
// ... 9 unchanged lines not shown ...
case AArch64ISD::ST4post: {		case AArch64ISD::ST4post: {
VT = Node->getOperand(1).getValueType();		VT = Node->getOperand(1).getValueType();
if (VT == MVT::v8i8) {		if (VT == MVT::v8i8) {
SelectPostStore(Node, 4, AArch64::ST4Fourv8b_POST);		SelectPostStore(Node, 4, AArch64::ST4Fourv8b_POST);
return;		return;
} else if (VT == MVT::v16i8) {		} else if (VT == MVT::v16i8) {
SelectPostStore(Node, 4, AArch64::ST4Fourv16b_POST);		SelectPostStore(Node, 4, AArch64::ST4Fourv16b_POST);
return;		return;
} else if (VT == MVT::v4i16 \|\| VT == MVT::v4f16) {		} else if (VT == MVT::v4i16 \|\| VT == MVT::v4f16 \|\| VT == MVT::v4bf16) {
SelectPostStore(Node, 4, AArch64::ST4Fourv4h_POST);		SelectPostStore(Node, 4, AArch64::ST4Fourv4h_POST);
return;		return;
} else if (VT == MVT::v8i16 \|\| VT == MVT::v8f16) {		} else if (VT == MVT::v8i16 \|\| VT == MVT::v8f16 \|\| VT == MVT::v8bf16) {
SelectPostStore(Node, 4, AArch64::ST4Fourv8h_POST);		SelectPostStore(Node, 4, AArch64::ST4Fourv8h_POST);
return;		return;
} else if (VT == MVT::v2i32 \|\| VT == MVT::v2f32) {		} else if (VT == MVT::v2i32 \|\| VT == MVT::v2f32) {
SelectPostStore(Node, 4, AArch64::ST4Fourv2s_POST);		SelectPostStore(Node, 4, AArch64::ST4Fourv2s_POST);
return;		return;
} else if (VT == MVT::v4i32 \|\| VT == MVT::v4f32) {		} else if (VT == MVT::v4i32 \|\| VT == MVT::v4f32) {
SelectPostStore(Node, 4, AArch64::ST4Fourv4s_POST);		SelectPostStore(Node, 4, AArch64::ST4Fourv4s_POST);
return;		return;
// ... 9 unchanged lines not shown ...
case AArch64ISD::ST1x2post: {		case AArch64ISD::ST1x2post: {
VT = Node->getOperand(1).getValueType();		VT = Node->getOperand(1).getValueType();
if (VT == MVT::v8i8) {		if (VT == MVT::v8i8) {
SelectPostStore(Node, 2, AArch64::ST1Twov8b_POST);		SelectPostStore(Node, 2, AArch64::ST1Twov8b_POST);
return;		return;
} else if (VT == MVT::v16i8) {		} else if (VT == MVT::v16i8) {
SelectPostStore(Node, 2, AArch64::ST1Twov16b_POST);		SelectPostStore(Node, 2, AArch64::ST1Twov16b_POST);
return;		return;
} else if (VT == MVT::v4i16 \|\| VT == MVT::v4f16) {		} else if (VT == MVT::v4i16 \|\| VT == MVT::v4f16 \|\| VT == MVT::v4bf16) {
SelectPostStore(Node, 2, AArch64::ST1Twov4h_POST);		SelectPostStore(Node, 2, AArch64::ST1Twov4h_POST);
return;		return;
} else if (VT == MVT::v8i16 \|\| VT == MVT::v8f16) {		} else if (VT == MVT::v8i16 \|\| VT == MVT::v8f16 \|\| VT == MVT::v8bf16) {
SelectPostStore(Node, 2, AArch64::ST1Twov8h_POST);		SelectPostStore(Node, 2, AArch64::ST1Twov8h_POST);
return;		return;
} else if (VT == MVT::v2i32 \|\| VT == MVT::v2f32) {		} else if (VT == MVT::v2i32 \|\| VT == MVT::v2f32) {
SelectPostStore(Node, 2, AArch64::ST1Twov2s_POST);		SelectPostStore(Node, 2, AArch64::ST1Twov2s_POST);
return;		return;
} else if (VT == MVT::v4i32 \|\| VT == MVT::v4f32) {		} else if (VT == MVT::v4i32 \|\| VT == MVT::v4f32) {
SelectPostStore(Node, 2, AArch64::ST1Twov4s_POST);		SelectPostStore(Node, 2, AArch64::ST1Twov4s_POST);
return;		return;
// ... 9 unchanged lines not shown ...
case AArch64ISD::ST1x3post: {		case AArch64ISD::ST1x3post: {
VT = Node->getOperand(1).getValueType();		VT = Node->getOperand(1).getValueType();
if (VT == MVT::v8i8) {		if (VT == MVT::v8i8) {
SelectPostStore(Node, 3, AArch64::ST1Threev8b_POST);		SelectPostStore(Node, 3, AArch64::ST1Threev8b_POST);
return;		return;
} else if (VT == MVT::v16i8) {		} else if (VT == MVT::v16i8) {
SelectPostStore(Node, 3, AArch64::ST1Threev16b_POST);		SelectPostStore(Node, 3, AArch64::ST1Threev16b_POST);
return;		return;
} else if (VT == MVT::v4i16 \|\| VT == MVT::v4f16) {		} else if (VT == MVT::v4i16 \|\| VT == MVT::v4f16 \|\| VT == MVT::v4bf16) {
SelectPostStore(Node, 3, AArch64::ST1Threev4h_POST);		SelectPostStore(Node, 3, AArch64::ST1Threev4h_POST);
return;		return;
} else if (VT == MVT::v8i16 \|\| VT == MVT::v8f16) {		} else if (VT == MVT::v8i16 \|\| VT == MVT::v8f16 \|\| VT == MVT::v8bf16) {
SelectPostStore(Node, 3, AArch64::ST1Threev8h_POST);		SelectPostStore(Node, 3, AArch64::ST1Threev8h_POST);
return;		return;
} else if (VT == MVT::v2i32 \|\| VT == MVT::v2f32) {		} else if (VT == MVT::v2i32 \|\| VT == MVT::v2f32) {
SelectPostStore(Node, 3, AArch64::ST1Threev2s_POST);		SelectPostStore(Node, 3, AArch64::ST1Threev2s_POST);
return;		return;
} else if (VT == MVT::v4i32 \|\| VT == MVT::v4f32) {		} else if (VT == MVT::v4i32 \|\| VT == MVT::v4f32) {
SelectPostStore(Node, 3, AArch64::ST1Threev4s_POST);		SelectPostStore(Node, 3, AArch64::ST1Threev4s_POST);
return;		return;
// ... 9 unchanged lines not shown ...
case AArch64ISD::ST1x4post: {		case AArch64ISD::ST1x4post: {
VT = Node->getOperand(1).getValueType();		VT = Node->getOperand(1).getValueType();
if (VT == MVT::v8i8) {		if (VT == MVT::v8i8) {
SelectPostStore(Node, 4, AArch64::ST1Fourv8b_POST);		SelectPostStore(Node, 4, AArch64::ST1Fourv8b_POST);
return;		return;
} else if (VT == MVT::v16i8) {		} else if (VT == MVT::v16i8) {
SelectPostStore(Node, 4, AArch64::ST1Fourv16b_POST);		SelectPostStore(Node, 4, AArch64::ST1Fourv16b_POST);
return;		return;
} else if (VT == MVT::v4i16 \|\| VT == MVT::v4f16) {		} else if (VT == MVT::v4i16 \|\| VT == MVT::v4f16 \|\| VT == MVT::v4bf16) {
SelectPostStore(Node, 4, AArch64::ST1Fourv4h_POST);		SelectPostStore(Node, 4, AArch64::ST1Fourv4h_POST);
return;		return;
} else if (VT == MVT::v8i16 \|\| VT == MVT::v8f16) {		} else if (VT == MVT::v8i16 \|\| VT == MVT::v8f16 \|\| VT == MVT::v8bf16) {
SelectPostStore(Node, 4, AArch64::ST1Fourv8h_POST);		SelectPostStore(Node, 4, AArch64::ST1Fourv8h_POST);
return;		return;
} else if (VT == MVT::v2i32 \|\| VT == MVT::v2f32) {		} else if (VT == MVT::v2i32 \|\| VT == MVT::v2f32) {
SelectPostStore(Node, 4, AArch64::ST1Fourv2s_POST);		SelectPostStore(Node, 4, AArch64::ST1Fourv2s_POST);
return;		return;
} else if (VT == MVT::v4i32 \|\| VT == MVT::v4f32) {		} else if (VT == MVT::v4i32 \|\| VT == MVT::v4f32) {
SelectPostStore(Node, 4, AArch64::ST1Fourv4s_POST);		SelectPostStore(Node, 4, AArch64::ST1Fourv4s_POST);
return;		return;
} else if (VT == MVT::v1i64 \|\| VT == MVT::v1f64) {		} else if (VT == MVT::v1i64 \|\| VT == MVT::v1f64) {
SelectPostStore(Node, 4, AArch64::ST1Fourv1d_POST);		SelectPostStore(Node, 4, AArch64::ST1Fourv1d_POST);
return;		return;
} else if (VT == MVT::v2i64 \|\| VT == MVT::v2f64) {		} else if (VT == MVT::v2i64 \|\| VT == MVT::v2f64) {
SelectPostStore(Node, 4, AArch64::ST1Fourv2d_POST);		SelectPostStore(Node, 4, AArch64::ST1Fourv2d_POST);
return;		return;
}		}
break;		break;
}		}
case AArch64ISD::ST2LANEpost: {		case AArch64ISD::ST2LANEpost: {
VT = Node->getOperand(1).getValueType();		VT = Node->getOperand(1).getValueType();
if (VT == MVT::v16i8 \|\| VT == MVT::v8i8) {		if (VT == MVT::v16i8 \|\| VT == MVT::v8i8) {
SelectPostStoreLane(Node, 2, AArch64::ST2i8_POST);		SelectPostStoreLane(Node, 2, AArch64::ST2i8_POST);
return;		return;
} else if (VT == MVT::v8i16 \|\| VT == MVT::v4i16 \|\| VT == MVT::v4f16 \|\|		} else if (VT == MVT::v8i16 \|\| VT == MVT::v4i16 \|\| VT == MVT::v4f16 \|\|
VT == MVT::v8f16) {		VT == MVT::v8f16 \|\| VT == MVT::v4bf16 \|\| VT == MVT::v8bf16) {
SelectPostStoreLane(Node, 2, AArch64::ST2i16_POST);		SelectPostStoreLane(Node, 2, AArch64::ST2i16_POST);
return;		return;
} else if (VT == MVT::v4i32 \|\| VT == MVT::v2i32 \|\| VT == MVT::v4f32 \|\|		} else if (VT == MVT::v4i32 \|\| VT == MVT::v2i32 \|\| VT == MVT::v4f32 \|\|
VT == MVT::v2f32) {		VT == MVT::v2f32) {
SelectPostStoreLane(Node, 2, AArch64::ST2i32_POST);		SelectPostStoreLane(Node, 2, AArch64::ST2i32_POST);
return;		return;
} else if (VT == MVT::v2i64 \|\| VT == MVT::v1i64 \|\| VT == MVT::v2f64 \|\|		} else if (VT == MVT::v2i64 \|\| VT == MVT::v1i64 \|\| VT == MVT::v2f64 \|\|
VT == MVT::v1f64) {		VT == MVT::v1f64) {
SelectPostStoreLane(Node, 2, AArch64::ST2i64_POST);		SelectPostStoreLane(Node, 2, AArch64::ST2i64_POST);
return;		return;
}		}
break;		break;
}		}
case AArch64ISD::ST3LANEpost: {		case AArch64ISD::ST3LANEpost: {
VT = Node->getOperand(1).getValueType();		VT = Node->getOperand(1).getValueType();
if (VT == MVT::v16i8 || VT == MVT::v8i8) {		if (VT == MVT::v16i8 || VT == MVT::v8i8) {
SelectPostStoreLane(Node, 3, AArch64::ST3i8_POST);		SelectPostStoreLane(Node, 3, AArch64::ST3i8_POST);
return;		return;
} else if (VT == MVT::v8i16 || VT == MVT::v4i16 || VT == MVT::v4f16 ||		} else if (VT == MVT::v8i16 || VT == MVT::v4i16 || VT == MVT::v4f16 ||
VT == MVT::v8f16) {		VT == MVT::v8f16 || VT == MVT::v4bf16 || VT == MVT::v8bf16) {
SelectPostStoreLane(Node, 3, AArch64::ST3i16_POST);		SelectPostStoreLane(Node, 3, AArch64::ST3i16_POST);
return;		return;
} else if (VT == MVT::v4i32 || VT == MVT::v2i32 || VT == MVT::v4f32 ||		} else if (VT == MVT::v4i32 || VT == MVT::v2i32 || VT == MVT::v4f32 ||
VT == MVT::v2f32) {		VT == MVT::v2f32) {
SelectPostStoreLane(Node, 3, AArch64::ST3i32_POST);		SelectPostStoreLane(Node, 3, AArch64::ST3i32_POST);
return;		return;
} else if (VT == MVT::v2i64 || VT == MVT::v1i64 || VT == MVT::v2f64 ||		} else if (VT == MVT::v2i64 || VT == MVT::v1i64 || VT == MVT::v2f64 ||
VT == MVT::v1f64) {		VT == MVT::v1f64) {
SelectPostStoreLane(Node, 3, AArch64::ST3i64_POST);		SelectPostStoreLane(Node, 3, AArch64::ST3i64_POST);
return;		return;
}		}
break;		break;
}		}
case AArch64ISD::ST4LANEpost: {		case AArch64ISD::ST4LANEpost: {
VT = Node->getOperand(1).getValueType();		VT = Node->getOperand(1).getValueType();
if (VT == MVT::v16i8 || VT == MVT::v8i8) {		if (VT == MVT::v16i8 || VT == MVT::v8i8) {
SelectPostStoreLane(Node, 4, AArch64::ST4i8_POST);		SelectPostStoreLane(Node, 4, AArch64::ST4i8_POST);
return;		return;
} else if (VT == MVT::v8i16 || VT == MVT::v4i16 || VT == MVT::v4f16 ||		} else if (VT == MVT::v8i16 || VT == MVT::v4i16 || VT == MVT::v4f16 ||
VT == MVT::v8f16) {		VT == MVT::v8f16 || VT == MVT::v4bf16 || VT == MVT::v8bf16) {
SelectPostStoreLane(Node, 4, AArch64::ST4i16_POST);		SelectPostStoreLane(Node, 4, AArch64::ST4i16_POST);
return;		return;
} else if (VT == MVT::v4i32 || VT == MVT::v2i32 || VT == MVT::v4f32 ||		} else if (VT == MVT::v4i32 || VT == MVT::v2i32 || VT == MVT::v4f32 ||
VT == MVT::v2f32) {		VT == MVT::v2f32) {
SelectPostStoreLane(Node, 4, AArch64::ST4i32_POST);		SelectPostStoreLane(Node, 4, AArch64::ST4i32_POST);
return;		return;
} else if (VT == MVT::v2i64 || VT == MVT::v1i64 || VT == MVT::v2f64 ||		} else if (VT == MVT::v2i64 || VT == MVT::v1i64 || VT == MVT::v2f64 ||
VT == MVT::v1f64) {		VT == MVT::v1f64) {
[144 lines not shown]
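The instruction-selection hunks above add v4bf16 and v8bf16 to the same branch as the i16 and f16 vectors, so the chosen post-indexed lane store depends only on the element width. A minimal standalone sketch of that dispatch follows. The `VecTy` enum and both helper functions are hypothetical stand-ins invented for illustration and are not LLVM's API; only the opcode names (for example `ST2i16_POST`) come from the diff.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the MVT vector types tested in the diff.
enum class VecTy {
  v8i8, v16i8,
  v4i16, v8i16, v4f16, v8f16, v4bf16, v8bf16,
  v2i32, v4i32, v2f32, v4f32,
  v1i64, v2i64, v1f64, v2f64
};

// Element width in bits for each modelled vector type.
unsigned elementBits(VecTy VT) {
  switch (VT) {
  case VecTy::v8i8: case VecTy::v16i8:
    return 8;
  case VecTy::v4i16: case VecTy::v8i16:
  case VecTy::v4f16: case VecTy::v8f16:
  case VecTy::v4bf16: case VecTy::v8bf16: // bf16 lanes are 16 bits wide
    return 16;
  case VecTy::v2i32: case VecTy::v4i32:
  case VecTy::v2f32: case VecTy::v4f32:
    return 32;
  case VecTy::v1i64: case VecTy::v2i64:
  case VecTy::v1f64: case VecTy::v2f64:
    return 64;
  }
  return 0;
}

// Builds an opcode name such as "ST2i16_POST" from the number of
// registers in the list (2, 3 or 4 here) and the element width.
std::string postStoreLaneOpcode(unsigned NumVecs, VecTy VT) {
  assert(NumVecs >= 2 && NumVecs <= 4 && "sketch covers ST2..ST4 only");
  return "ST" + std::to_string(NumVecs) + "i" +
         std::to_string(elementBits(VT)) + "_POST";
}
```

In this model `postStoreLaneOpcode(2, VecTy::v4bf16)` and `postStoreLaneOpcode(2, VecTy::v4f16)` produce the same name, which mirrors the intent of the added `VT == MVT::v4bf16 || VT == MVT::v8bf16` conditions.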

llvm/lib/Target/AArch64/AArch64InstrInfo.td

[2,227 lines not shown]
let AddedComplexity = 10 in {		let AddedComplexity = 10 in {
let Predicates = [IsLE] in {		let Predicates = [IsLE] in {
// We must do vector loads with LD1 in big-endian.		// We must do vector loads with LD1 in big-endian.
defm : VecROLoadPat<ro64, v2i32, LDRDroW, LDRDroX>;		defm : VecROLoadPat<ro64, v2i32, LDRDroW, LDRDroX>;
defm : VecROLoadPat<ro64, v2f32, LDRDroW, LDRDroX>;		defm : VecROLoadPat<ro64, v2f32, LDRDroW, LDRDroX>;
defm : VecROLoadPat<ro64, v8i8, LDRDroW, LDRDroX>;		defm : VecROLoadPat<ro64, v8i8, LDRDroW, LDRDroX>;
defm : VecROLoadPat<ro64, v4i16, LDRDroW, LDRDroX>;		defm : VecROLoadPat<ro64, v4i16, LDRDroW, LDRDroX>;
defm : VecROLoadPat<ro64, v4f16, LDRDroW, LDRDroX>;		defm : VecROLoadPat<ro64, v4f16, LDRDroW, LDRDroX>;
		defm : VecROLoadPat<ro64, v4bf16, LDRDroW, LDRDroX>;
}		}

defm : VecROLoadPat<ro64, v1i64, LDRDroW, LDRDroX>;		defm : VecROLoadPat<ro64, v1i64, LDRDroW, LDRDroX>;
defm : VecROLoadPat<ro64, v1f64, LDRDroW, LDRDroX>;		defm : VecROLoadPat<ro64, v1f64, LDRDroW, LDRDroX>;

// Match all load 128 bits width whose type is compatible with FPR128		// Match all load 128 bits width whose type is compatible with FPR128
let Predicates = [IsLE] in {		let Predicates = [IsLE] in {
// We must do vector loads with LD1 in big-endian.		// We must do vector loads with LD1 in big-endian.
defm : VecROLoadPat<ro128, v2i64, LDRQroW, LDRQroX>;		defm : VecROLoadPat<ro128, v2i64, LDRQroW, LDRQroX>;
defm : VecROLoadPat<ro128, v2f64, LDRQroW, LDRQroX>;		defm : VecROLoadPat<ro128, v2f64, LDRQroW, LDRQroX>;
defm : VecROLoadPat<ro128, v4i32, LDRQroW, LDRQroX>;		defm : VecROLoadPat<ro128, v4i32, LDRQroW, LDRQroX>;
defm : VecROLoadPat<ro128, v4f32, LDRQroW, LDRQroX>;		defm : VecROLoadPat<ro128, v4f32, LDRQroW, LDRQroX>;
defm : VecROLoadPat<ro128, v8i16, LDRQroW, LDRQroX>;		defm : VecROLoadPat<ro128, v8i16, LDRQroW, LDRQroX>;
defm : VecROLoadPat<ro128, v8f16, LDRQroW, LDRQroX>;		defm : VecROLoadPat<ro128, v8f16, LDRQroW, LDRQroX>;
		defm : VecROLoadPat<ro128, v8bf16, LDRQroW, LDRQroX>;
defm : VecROLoadPat<ro128, v16i8, LDRQroW, LDRQroX>;		defm : VecROLoadPat<ro128, v16i8, LDRQroW, LDRQroX>;
}		}
} // AddedComplexity = 10		} // AddedComplexity = 10

// zextload -> i64		// zextload -> i64
multiclass ExtLoadTo64ROPat<ROAddrMode ro, SDPatternOperator loadop,		multiclass ExtLoadTo64ROPat<ROAddrMode ro, SDPatternOperator loadop,
Instruction INSTW, Instruction INSTX> {		Instruction INSTW, Instruction INSTX> {
def : Pat<(i64 (loadop (ro.Wpat GPR64sp:$Rn, GPR32:$Rm, ro.Wext:$extend))),		def : Pat<(i64 (loadop (ro.Wpat GPR64sp:$Rn, GPR32:$Rm, ro.Wext:$extend))),
[119 lines not shown]	let Predicates = [IsLE] in {
def : Pat<(v8i8 (load (am_indexed64 GPR64sp:$Rn, uimm12s8:$offset))),		def : Pat<(v8i8 (load (am_indexed64 GPR64sp:$Rn, uimm12s8:$offset))),
(LDRDui GPR64sp:$Rn, uimm12s8:$offset)>;		(LDRDui GPR64sp:$Rn, uimm12s8:$offset)>;
def : Pat<(v4i16 (load (am_indexed64 GPR64sp:$Rn, uimm12s8:$offset))),		def : Pat<(v4i16 (load (am_indexed64 GPR64sp:$Rn, uimm12s8:$offset))),
(LDRDui GPR64sp:$Rn, uimm12s8:$offset)>;		(LDRDui GPR64sp:$Rn, uimm12s8:$offset)>;
def : Pat<(v2i32 (load (am_indexed64 GPR64sp:$Rn, uimm12s8:$offset))),		def : Pat<(v2i32 (load (am_indexed64 GPR64sp:$Rn, uimm12s8:$offset))),
(LDRDui GPR64sp:$Rn, uimm12s8:$offset)>;		(LDRDui GPR64sp:$Rn, uimm12s8:$offset)>;
def : Pat<(v4f16 (load (am_indexed64 GPR64sp:$Rn, uimm12s8:$offset))),		def : Pat<(v4f16 (load (am_indexed64 GPR64sp:$Rn, uimm12s8:$offset))),
(LDRDui GPR64sp:$Rn, uimm12s8:$offset)>;		(LDRDui GPR64sp:$Rn, uimm12s8:$offset)>;
		def : Pat<(v4bf16 (load (am_indexed64 GPR64sp:$Rn, uimm12s8:$offset))),
		(LDRDui GPR64sp:$Rn, uimm12s8:$offset)>;
}		}
def : Pat<(v1f64 (load (am_indexed64 GPR64sp:$Rn, uimm12s8:$offset))),		def : Pat<(v1f64 (load (am_indexed64 GPR64sp:$Rn, uimm12s8:$offset))),
(LDRDui GPR64sp:$Rn, uimm12s8:$offset)>;		(LDRDui GPR64sp:$Rn, uimm12s8:$offset)>;
def : Pat<(v1i64 (load (am_indexed64 GPR64sp:$Rn, uimm12s8:$offset))),		def : Pat<(v1i64 (load (am_indexed64 GPR64sp:$Rn, uimm12s8:$offset))),
(LDRDui GPR64sp:$Rn, uimm12s8:$offset)>;		(LDRDui GPR64sp:$Rn, uimm12s8:$offset)>;

// Match all load 128 bits width whose type is compatible with FPR128		// Match all load 128 bits width whose type is compatible with FPR128
let Predicates = [IsLE] in {		let Predicates = [IsLE] in {
// We must use LD1 to perform vector loads in big-endian.		// We must use LD1 to perform vector loads in big-endian.
def : Pat<(v4f32 (load (am_indexed128 GPR64sp:$Rn, uimm12s16:$offset))),		def : Pat<(v4f32 (load (am_indexed128 GPR64sp:$Rn, uimm12s16:$offset))),
(LDRQui GPR64sp:$Rn, uimm12s16:$offset)>;		(LDRQui GPR64sp:$Rn, uimm12s16:$offset)>;
def : Pat<(v2f64 (load (am_indexed128 GPR64sp:$Rn, uimm12s16:$offset))),		def : Pat<(v2f64 (load (am_indexed128 GPR64sp:$Rn, uimm12s16:$offset))),
(LDRQui GPR64sp:$Rn, uimm12s16:$offset)>;		(LDRQui GPR64sp:$Rn, uimm12s16:$offset)>;
def : Pat<(v16i8 (load (am_indexed128 GPR64sp:$Rn, uimm12s16:$offset))),		def : Pat<(v16i8 (load (am_indexed128 GPR64sp:$Rn, uimm12s16:$offset))),
(LDRQui GPR64sp:$Rn, uimm12s16:$offset)>;		(LDRQui GPR64sp:$Rn, uimm12s16:$offset)>;
def : Pat<(v8i16 (load (am_indexed128 GPR64sp:$Rn, uimm12s16:$offset))),		def : Pat<(v8i16 (load (am_indexed128 GPR64sp:$Rn, uimm12s16:$offset))),
(LDRQui GPR64sp:$Rn, uimm12s16:$offset)>;		(LDRQui GPR64sp:$Rn, uimm12s16:$offset)>;
def : Pat<(v4i32 (load (am_indexed128 GPR64sp:$Rn, uimm12s16:$offset))),		def : Pat<(v4i32 (load (am_indexed128 GPR64sp:$Rn, uimm12s16:$offset))),
(LDRQui GPR64sp:$Rn, uimm12s16:$offset)>;		(LDRQui GPR64sp:$Rn, uimm12s16:$offset)>;
def : Pat<(v2i64 (load (am_indexed128 GPR64sp:$Rn, uimm12s16:$offset))),		def : Pat<(v2i64 (load (am_indexed128 GPR64sp:$Rn, uimm12s16:$offset))),
(LDRQui GPR64sp:$Rn, uimm12s16:$offset)>;		(LDRQui GPR64sp:$Rn, uimm12s16:$offset)>;
def : Pat<(v8f16 (load (am_indexed128 GPR64sp:$Rn, uimm12s16:$offset))),		def : Pat<(v8f16 (load (am_indexed128 GPR64sp:$Rn, uimm12s16:$offset))),
(LDRQui GPR64sp:$Rn, uimm12s16:$offset)>;		(LDRQui GPR64sp:$Rn, uimm12s16:$offset)>;
		def : Pat<(v8bf16 (load (am_indexed128 GPR64sp:$Rn, uimm12s16:$offset))),
		(LDRQui GPR64sp:$Rn, uimm12s16:$offset)>;
}		}
def : Pat<(f128 (load (am_indexed128 GPR64sp:$Rn, uimm12s16:$offset))),		def : Pat<(f128 (load (am_indexed128 GPR64sp:$Rn, uimm12s16:$offset))),
(LDRQui GPR64sp:$Rn, uimm12s16:$offset)>;		(LDRQui GPR64sp:$Rn, uimm12s16:$offset)>;

defm LDRHH : LoadUI<0b01, 0, 0b01, GPR32, uimm12s2, "ldrh",		defm LDRHH : LoadUI<0b01, 0, 0b01, GPR32, uimm12s2, "ldrh",
[(set GPR32:$Rt,		[(set GPR32:$Rt,
(zextloadi16 (am_indexed16 GPR64sp:$Rn,		(zextloadi16 (am_indexed16 GPR64sp:$Rn,
uimm12s2:$offset)))]>;		uimm12s2:$offset)))]>;
[482 lines not shown]
// Match all store 64 bits width whose type is compatible with FPR64		// Match all store 64 bits width whose type is compatible with FPR64
let Predicates = [IsLE] in {		let Predicates = [IsLE] in {
// We must use ST1 to store vectors in big-endian.		// We must use ST1 to store vectors in big-endian.
defm : VecROStorePat<ro64, v2i32, FPR64, STRDroW, STRDroX>;		defm : VecROStorePat<ro64, v2i32, FPR64, STRDroW, STRDroX>;
defm : VecROStorePat<ro64, v2f32, FPR64, STRDroW, STRDroX>;		defm : VecROStorePat<ro64, v2f32, FPR64, STRDroW, STRDroX>;
defm : VecROStorePat<ro64, v4i16, FPR64, STRDroW, STRDroX>;		defm : VecROStorePat<ro64, v4i16, FPR64, STRDroW, STRDroX>;
defm : VecROStorePat<ro64, v8i8, FPR64, STRDroW, STRDroX>;		defm : VecROStorePat<ro64, v8i8, FPR64, STRDroW, STRDroX>;
defm : VecROStorePat<ro64, v4f16, FPR64, STRDroW, STRDroX>;		defm : VecROStorePat<ro64, v4f16, FPR64, STRDroW, STRDroX>;
		defm : VecROStorePat<ro64, v4bf16, FPR64, STRDroW, STRDroX>;
}		}

defm : VecROStorePat<ro64, v1i64, FPR64, STRDroW, STRDroX>;		defm : VecROStorePat<ro64, v1i64, FPR64, STRDroW, STRDroX>;
defm : VecROStorePat<ro64, v1f64, FPR64, STRDroW, STRDroX>;		defm : VecROStorePat<ro64, v1f64, FPR64, STRDroW, STRDroX>;

// Match all store 128 bits width whose type is compatible with FPR128		// Match all store 128 bits width whose type is compatible with FPR128
let Predicates = [IsLE, UseSTRQro] in {		let Predicates = [IsLE, UseSTRQro] in {
// We must use ST1 to store vectors in big-endian.		// We must use ST1 to store vectors in big-endian.
defm : VecROStorePat<ro128, v2i64, FPR128, STRQroW, STRQroX>;		defm : VecROStorePat<ro128, v2i64, FPR128, STRQroW, STRQroX>;
defm : VecROStorePat<ro128, v2f64, FPR128, STRQroW, STRQroX>;		defm : VecROStorePat<ro128, v2f64, FPR128, STRQroW, STRQroX>;
defm : VecROStorePat<ro128, v4i32, FPR128, STRQroW, STRQroX>;		defm : VecROStorePat<ro128, v4i32, FPR128, STRQroW, STRQroX>;
defm : VecROStorePat<ro128, v4f32, FPR128, STRQroW, STRQroX>;		defm : VecROStorePat<ro128, v4f32, FPR128, STRQroW, STRQroX>;
defm : VecROStorePat<ro128, v8i16, FPR128, STRQroW, STRQroX>;		defm : VecROStorePat<ro128, v8i16, FPR128, STRQroW, STRQroX>;
defm : VecROStorePat<ro128, v16i8, FPR128, STRQroW, STRQroX>;		defm : VecROStorePat<ro128, v16i8, FPR128, STRQroW, STRQroX>;
defm : VecROStorePat<ro128, v8f16, FPR128, STRQroW, STRQroX>;		defm : VecROStorePat<ro128, v8f16, FPR128, STRQroW, STRQroX>;
		defm : VecROStorePat<ro128, v8bf16, FPR128, STRQroW, STRQroX>;
}		}
} // AddedComplexity = 10		} // AddedComplexity = 10

// Match stores from lane 0 to the appropriate subreg's store.		// Match stores from lane 0 to the appropriate subreg's store.
multiclass VecROStoreLane0Pat<ROAddrMode ro, SDPatternOperator storeop,		multiclass VecROStoreLane0Pat<ROAddrMode ro, SDPatternOperator storeop,
ValueType VecTy, ValueType STy,		ValueType VecTy, ValueType STy,
SubRegIndex SubRegIdx,		SubRegIndex SubRegIdx,
Instruction STRW, Instruction STRX> {		Instruction STRW, Instruction STRX> {
[76 lines not shown]	def : Pat<(store (v4i16 FPR64:$Rt),
(am_indexed64 GPR64sp:$Rn, uimm12s8:$offset)),		(am_indexed64 GPR64sp:$Rn, uimm12s8:$offset)),
(STRDui FPR64:$Rt, GPR64sp:$Rn, uimm12s8:$offset)>;		(STRDui FPR64:$Rt, GPR64sp:$Rn, uimm12s8:$offset)>;
def : Pat<(store (v2i32 FPR64:$Rt),		def : Pat<(store (v2i32 FPR64:$Rt),
(am_indexed64 GPR64sp:$Rn, uimm12s8:$offset)),		(am_indexed64 GPR64sp:$Rn, uimm12s8:$offset)),
(STRDui FPR64:$Rt, GPR64sp:$Rn, uimm12s8:$offset)>;		(STRDui FPR64:$Rt, GPR64sp:$Rn, uimm12s8:$offset)>;
def : Pat<(store (v4f16 FPR64:$Rt),		def : Pat<(store (v4f16 FPR64:$Rt),
(am_indexed64 GPR64sp:$Rn, uimm12s8:$offset)),		(am_indexed64 GPR64sp:$Rn, uimm12s8:$offset)),
(STRDui FPR64:$Rt, GPR64sp:$Rn, uimm12s8:$offset)>;		(STRDui FPR64:$Rt, GPR64sp:$Rn, uimm12s8:$offset)>;
		def : Pat<(store (v4bf16 FPR64:$Rt),
		(am_indexed64 GPR64sp:$Rn, uimm12s8:$offset)),
		(STRDui FPR64:$Rt, GPR64sp:$Rn, uimm12s8:$offset)>;
}		}

// Match all store 128 bits width whose type is compatible with FPR128		// Match all store 128 bits width whose type is compatible with FPR128
def : Pat<(store (f128 FPR128:$Rt),		def : Pat<(store (f128 FPR128:$Rt),
(am_indexed128 GPR64sp:$Rn, uimm12s16:$offset)),		(am_indexed128 GPR64sp:$Rn, uimm12s16:$offset)),
(STRQui FPR128:$Rt, GPR64sp:$Rn, uimm12s16:$offset)>;		(STRQui FPR128:$Rt, GPR64sp:$Rn, uimm12s16:$offset)>;

let Predicates = [IsLE] in {		let Predicates = [IsLE] in {
[14 lines not shown]	def : Pat<(store (v4i32 FPR128:$Rt),
(am_indexed128 GPR64sp:$Rn, uimm12s16:$offset)),		(am_indexed128 GPR64sp:$Rn, uimm12s16:$offset)),
(STRQui FPR128:$Rt, GPR64sp:$Rn, uimm12s16:$offset)>;		(STRQui FPR128:$Rt, GPR64sp:$Rn, uimm12s16:$offset)>;
def : Pat<(store (v2i64 FPR128:$Rt),		def : Pat<(store (v2i64 FPR128:$Rt),
(am_indexed128 GPR64sp:$Rn, uimm12s16:$offset)),		(am_indexed128 GPR64sp:$Rn, uimm12s16:$offset)),
(STRQui FPR128:$Rt, GPR64sp:$Rn, uimm12s16:$offset)>;		(STRQui FPR128:$Rt, GPR64sp:$Rn, uimm12s16:$offset)>;
def : Pat<(store (v8f16 FPR128:$Rt),		def : Pat<(store (v8f16 FPR128:$Rt),
(am_indexed128 GPR64sp:$Rn, uimm12s16:$offset)),		(am_indexed128 GPR64sp:$Rn, uimm12s16:$offset)),
(STRQui FPR128:$Rt, GPR64sp:$Rn, uimm12s16:$offset)>;		(STRQui FPR128:$Rt, GPR64sp:$Rn, uimm12s16:$offset)>;
		def : Pat<(store (v8bf16 FPR128:$Rt),
		(am_indexed128 GPR64sp:$Rn, uimm12s16:$offset)),
		(STRQui FPR128:$Rt, GPR64sp:$Rn, uimm12s16:$offset)>;
}		}

// truncstore i64		// truncstore i64
def : Pat<(truncstorei32 GPR64:$Rt,		def : Pat<(truncstorei32 GPR64:$Rt,
(am_indexed32 GPR64sp:$Rn, uimm12s4:$offset)),		(am_indexed32 GPR64sp:$Rn, uimm12s4:$offset)),
(STRWui (EXTRACT_SUBREG GPR64:$Rt, sub_32), GPR64sp:$Rn, uimm12s4:$offset)>;		(STRWui (EXTRACT_SUBREG GPR64:$Rt, sub_32), GPR64sp:$Rn, uimm12s4:$offset)>;
def : Pat<(truncstorei16 GPR64:$Rt,		def : Pat<(truncstorei16 GPR64:$Rt,
(am_indexed16 GPR64sp:$Rn, uimm12s2:$offset)),		(am_indexed16 GPR64sp:$Rn, uimm12s2:$offset)),
[91 lines not shown]	def : Pat<(store (v4i16 FPR64:$Rt),
(am_unscaled64 GPR64sp:$Rn, simm9:$offset)),		(am_unscaled64 GPR64sp:$Rn, simm9:$offset)),
(STURDi FPR64:$Rt, GPR64sp:$Rn, simm9:$offset)>;		(STURDi FPR64:$Rt, GPR64sp:$Rn, simm9:$offset)>;
def : Pat<(store (v2i32 FPR64:$Rt),		def : Pat<(store (v2i32 FPR64:$Rt),
(am_unscaled64 GPR64sp:$Rn, simm9:$offset)),		(am_unscaled64 GPR64sp:$Rn, simm9:$offset)),
(STURDi FPR64:$Rt, GPR64sp:$Rn, simm9:$offset)>;		(STURDi FPR64:$Rt, GPR64sp:$Rn, simm9:$offset)>;
def : Pat<(store (v4f16 FPR64:$Rt),		def : Pat<(store (v4f16 FPR64:$Rt),
(am_unscaled64 GPR64sp:$Rn, simm9:$offset)),		(am_unscaled64 GPR64sp:$Rn, simm9:$offset)),
(STURDi FPR64:$Rt, GPR64sp:$Rn, simm9:$offset)>;		(STURDi FPR64:$Rt, GPR64sp:$Rn, simm9:$offset)>;
		def : Pat<(store (v4bf16 FPR64:$Rt),
		(am_unscaled64 GPR64sp:$Rn, simm9:$offset)),
		(STURDi FPR64:$Rt, GPR64sp:$Rn, simm9:$offset)>;
}		}

// Match all store 128 bits width whose type is compatible with FPR128		// Match all store 128 bits width whose type is compatible with FPR128
def : Pat<(store (f128 FPR128:$Rt), (am_unscaled128 GPR64sp:$Rn, simm9:$offset)),		def : Pat<(store (f128 FPR128:$Rt), (am_unscaled128 GPR64sp:$Rn, simm9:$offset)),
(STURQi FPR128:$Rt, GPR64sp:$Rn, simm9:$offset)>;		(STURQi FPR128:$Rt, GPR64sp:$Rn, simm9:$offset)>;

let Predicates = [IsLE] in {		let Predicates = [IsLE] in {
// We must use ST1 to store vectors in big-endian.		// We must use ST1 to store vectors in big-endian.
[16 lines not shown]	def : Pat<(store (v2i64 FPR128:$Rt),
(am_unscaled128 GPR64sp:$Rn, simm9:$offset)),		(am_unscaled128 GPR64sp:$Rn, simm9:$offset)),
(STURQi FPR128:$Rt, GPR64sp:$Rn, simm9:$offset)>;		(STURQi FPR128:$Rt, GPR64sp:$Rn, simm9:$offset)>;
def : Pat<(store (v2f64 FPR128:$Rt),		def : Pat<(store (v2f64 FPR128:$Rt),
(am_unscaled128 GPR64sp:$Rn, simm9:$offset)),		(am_unscaled128 GPR64sp:$Rn, simm9:$offset)),
(STURQi FPR128:$Rt, GPR64sp:$Rn, simm9:$offset)>;		(STURQi FPR128:$Rt, GPR64sp:$Rn, simm9:$offset)>;
def : Pat<(store (v8f16 FPR128:$Rt),		def : Pat<(store (v8f16 FPR128:$Rt),
(am_unscaled128 GPR64sp:$Rn, simm9:$offset)),		(am_unscaled128 GPR64sp:$Rn, simm9:$offset)),
(STURQi FPR128:$Rt, GPR64sp:$Rn, simm9:$offset)>;		(STURQi FPR128:$Rt, GPR64sp:$Rn, simm9:$offset)>;
		def : Pat<(store (v8bf16 FPR128:$Rt),
		(am_unscaled128 GPR64sp:$Rn, simm9:$offset)),
		(STURQi FPR128:$Rt, GPR64sp:$Rn, simm9:$offset)>;
}		}

} // AddedComplexity = 10		} // AddedComplexity = 10

// unscaled i64 truncating stores		// unscaled i64 truncating stores
def : Pat<(truncstorei32 GPR64:$Rt, (am_unscaled32 GPR64sp:$Rn, simm9:$offset)),		def : Pat<(truncstorei32 GPR64:$Rt, (am_unscaled32 GPR64sp:$Rn, simm9:$offset)),
(STURWi (EXTRACT_SUBREG GPR64:$Rt, sub_32), GPR64sp:$Rn, simm9:$offset)>;		(STURWi (EXTRACT_SUBREG GPR64:$Rt, sub_32), GPR64sp:$Rn, simm9:$offset)>;
def : Pat<(truncstorei16 GPR64:$Rt, (am_unscaled16 GPR64sp:$Rn, simm9:$offset)),		def : Pat<(truncstorei16 GPR64:$Rt, (am_unscaled16 GPR64sp:$Rn, simm9:$offset)),
[3,121 lines not shown]
def : Pat<(v2f64 (AArch64dup (f64 (load GPR64sp:$Rn)))),		def : Pat<(v2f64 (AArch64dup (f64 (load GPR64sp:$Rn)))),
(LD1Rv2d GPR64sp:$Rn)>;		(LD1Rv2d GPR64sp:$Rn)>;
def : Pat<(v1f64 (AArch64dup (f64 (load GPR64sp:$Rn)))),		def : Pat<(v1f64 (AArch64dup (f64 (load GPR64sp:$Rn)))),
(LD1Rv1d GPR64sp:$Rn)>;		(LD1Rv1d GPR64sp:$Rn)>;
def : Pat<(v4f16 (AArch64dup (f16 (load GPR64sp:$Rn)))),		def : Pat<(v4f16 (AArch64dup (f16 (load GPR64sp:$Rn)))),
(LD1Rv4h GPR64sp:$Rn)>;		(LD1Rv4h GPR64sp:$Rn)>;
def : Pat<(v8f16 (AArch64dup (f16 (load GPR64sp:$Rn)))),		def : Pat<(v8f16 (AArch64dup (f16 (load GPR64sp:$Rn)))),
(LD1Rv8h GPR64sp:$Rn)>;		(LD1Rv8h GPR64sp:$Rn)>;
		def : Pat<(v4bf16 (AArch64dup (bf16 (load GPR64sp:$Rn)))),
		(LD1Rv4h GPR64sp:$Rn)>;
		def : Pat<(v8bf16 (AArch64dup (bf16 (load GPR64sp:$Rn)))),
		(LD1Rv8h GPR64sp:$Rn)>;

class Ld1Lane128Pat<SDPatternOperator scalar_load, Operand VecIndex,		class Ld1Lane128Pat<SDPatternOperator scalar_load, Operand VecIndex,
ValueType VTy, ValueType STy, Instruction LD1>		ValueType VTy, ValueType STy, Instruction LD1>
: Pat<(vector_insert (VTy VecListOne128:$Rd),		: Pat<(vector_insert (VTy VecListOne128:$Rd),
(STy (scalar_load GPR64sp:$Rn)), VecIndex:$idx),		(STy (scalar_load GPR64sp:$Rn)), VecIndex:$idx),
(LD1 VecListOne128:$Rd, VecIndex:$idx, GPR64sp:$Rn)>;		(LD1 VecListOne128:$Rd, VecIndex:$idx, GPR64sp:$Rn)>;

def : Ld1Lane128Pat<extloadi8, VectorIndexB, v16i8, i32, LD1i8>;		def : Ld1Lane128Pat<extloadi8, VectorIndexB, v16i8, i32, LD1i8>;
def : Ld1Lane128Pat<extloadi16, VectorIndexH, v8i16, i32, LD1i16>;		def : Ld1Lane128Pat<extloadi16, VectorIndexH, v8i16, i32, LD1i16>;
def : Ld1Lane128Pat<load, VectorIndexS, v4i32, i32, LD1i32>;		def : Ld1Lane128Pat<load, VectorIndexS, v4i32, i32, LD1i32>;
def : Ld1Lane128Pat<load, VectorIndexS, v4f32, f32, LD1i32>;		def : Ld1Lane128Pat<load, VectorIndexS, v4f32, f32, LD1i32>;
def : Ld1Lane128Pat<load, VectorIndexD, v2i64, i64, LD1i64>;		def : Ld1Lane128Pat<load, VectorIndexD, v2i64, i64, LD1i64>;
def : Ld1Lane128Pat<load, VectorIndexD, v2f64, f64, LD1i64>;		def : Ld1Lane128Pat<load, VectorIndexD, v2f64, f64, LD1i64>;
def : Ld1Lane128Pat<load, VectorIndexH, v8f16, f16, LD1i16>;		def : Ld1Lane128Pat<load, VectorIndexH, v8f16, f16, LD1i16>;
		def : Ld1Lane128Pat<load, VectorIndexH, v8bf16, bf16, LD1i16>;

class Ld1Lane64Pat<SDPatternOperator scalar_load, Operand VecIndex,		class Ld1Lane64Pat<SDPatternOperator scalar_load, Operand VecIndex,
ValueType VTy, ValueType STy, Instruction LD1>		ValueType VTy, ValueType STy, Instruction LD1>
: Pat<(vector_insert (VTy VecListOne64:$Rd),		: Pat<(vector_insert (VTy VecListOne64:$Rd),
(STy (scalar_load GPR64sp:$Rn)), VecIndex:$idx),		(STy (scalar_load GPR64sp:$Rn)), VecIndex:$idx),
(EXTRACT_SUBREG		(EXTRACT_SUBREG
(LD1 (SUBREG_TO_REG (i32 0), VecListOne64:$Rd, dsub),		(LD1 (SUBREG_TO_REG (i32 0), VecListOne64:$Rd, dsub),
VecIndex:$idx, GPR64sp:$Rn),		VecIndex:$idx, GPR64sp:$Rn),
dsub)>;		dsub)>;

def : Ld1Lane64Pat<extloadi8, VectorIndexB, v8i8, i32, LD1i8>;		def : Ld1Lane64Pat<extloadi8, VectorIndexB, v8i8, i32, LD1i8>;
def : Ld1Lane64Pat<extloadi16, VectorIndexH, v4i16, i32, LD1i16>;		def : Ld1Lane64Pat<extloadi16, VectorIndexH, v4i16, i32, LD1i16>;
def : Ld1Lane64Pat<load, VectorIndexS, v2i32, i32, LD1i32>;		def : Ld1Lane64Pat<load, VectorIndexS, v2i32, i32, LD1i32>;
def : Ld1Lane64Pat<load, VectorIndexS, v2f32, f32, LD1i32>;		def : Ld1Lane64Pat<load, VectorIndexS, v2f32, f32, LD1i32>;
def : Ld1Lane64Pat<load, VectorIndexH, v4f16, f16, LD1i16>;		def : Ld1Lane64Pat<load, VectorIndexH, v4f16, f16, LD1i16>;
		def : Ld1Lane64Pat<load, VectorIndexH, v4bf16, bf16, LD1i16>;


defm LD1 : SIMDLdSt1SingleAliases<"ld1">;		defm LD1 : SIMDLdSt1SingleAliases<"ld1">;
defm LD2 : SIMDLdSt2SingleAliases<"ld2">;		defm LD2 : SIMDLdSt2SingleAliases<"ld2">;
defm LD3 : SIMDLdSt3SingleAliases<"ld3">;		defm LD3 : SIMDLdSt3SingleAliases<"ld3">;
defm LD4 : SIMDLdSt4SingleAliases<"ld4">;		defm LD4 : SIMDLdSt4SingleAliases<"ld4">;

// Stores		// Stores
[12 lines not shown]

def : St1Lane128Pat<truncstorei8, VectorIndexB, v16i8, i32, ST1i8>;		def : St1Lane128Pat<truncstorei8, VectorIndexB, v16i8, i32, ST1i8>;
def : St1Lane128Pat<truncstorei16, VectorIndexH, v8i16, i32, ST1i16>;		def : St1Lane128Pat<truncstorei16, VectorIndexH, v8i16, i32, ST1i16>;
def : St1Lane128Pat<store, VectorIndexS, v4i32, i32, ST1i32>;		def : St1Lane128Pat<store, VectorIndexS, v4i32, i32, ST1i32>;
def : St1Lane128Pat<store, VectorIndexS, v4f32, f32, ST1i32>;		def : St1Lane128Pat<store, VectorIndexS, v4f32, f32, ST1i32>;
def : St1Lane128Pat<store, VectorIndexD, v2i64, i64, ST1i64>;		def : St1Lane128Pat<store, VectorIndexD, v2i64, i64, ST1i64>;
def : St1Lane128Pat<store, VectorIndexD, v2f64, f64, ST1i64>;		def : St1Lane128Pat<store, VectorIndexD, v2f64, f64, ST1i64>;
def : St1Lane128Pat<store, VectorIndexH, v8f16, f16, ST1i16>;		def : St1Lane128Pat<store, VectorIndexH, v8f16, f16, ST1i16>;
		def : St1Lane128Pat<store, VectorIndexH, v8bf16, bf16, ST1i16>;

let AddedComplexity = 19 in		let AddedComplexity = 19 in
class St1Lane64Pat<SDPatternOperator scalar_store, Operand VecIndex,		class St1Lane64Pat<SDPatternOperator scalar_store, Operand VecIndex,
ValueType VTy, ValueType STy, Instruction ST1>		ValueType VTy, ValueType STy, Instruction ST1>
: Pat<(scalar_store		: Pat<(scalar_store
(STy (vector_extract (VTy VecListOne64:$Vt), VecIndex:$idx)),		(STy (vector_extract (VTy VecListOne64:$Vt), VecIndex:$idx)),
GPR64sp:$Rn),		GPR64sp:$Rn),
(ST1 (SUBREG_TO_REG (i32 0), VecListOne64:$Vt, dsub),		(ST1 (SUBREG_TO_REG (i32 0), VecListOne64:$Vt, dsub),
VecIndex:$idx, GPR64sp:$Rn)>;		VecIndex:$idx, GPR64sp:$Rn)>;

def : St1Lane64Pat<truncstorei8, VectorIndexB, v8i8, i32, ST1i8>;		def : St1Lane64Pat<truncstorei8, VectorIndexB, v8i8, i32, ST1i8>;
def : St1Lane64Pat<truncstorei16, VectorIndexH, v4i16, i32, ST1i16>;		def : St1Lane64Pat<truncstorei16, VectorIndexH, v4i16, i32, ST1i16>;
def : St1Lane64Pat<store, VectorIndexS, v2i32, i32, ST1i32>;		def : St1Lane64Pat<store, VectorIndexS, v2i32, i32, ST1i32>;
def : St1Lane64Pat<store, VectorIndexS, v2f32, f32, ST1i32>;		def : St1Lane64Pat<store, VectorIndexS, v2f32, f32, ST1i32>;
def : St1Lane64Pat<store, VectorIndexH, v4f16, f16, ST1i16>;		def : St1Lane64Pat<store, VectorIndexH, v4f16, f16, ST1i16>;
		def : St1Lane64Pat<store, VectorIndexH, v4bf16, bf16, ST1i16>;

multiclass St1LanePost64Pat<SDPatternOperator scalar_store, Operand VecIndex,		multiclass St1LanePost64Pat<SDPatternOperator scalar_store, Operand VecIndex,
ValueType VTy, ValueType STy, Instruction ST1,		ValueType VTy, ValueType STy, Instruction ST1,
int offset> {		int offset> {
def : Pat<(scalar_store		def : Pat<(scalar_store
(STy (vector_extract (VTy VecListOne64:$Vt), VecIndex:$idx)),		(STy (vector_extract (VTy VecListOne64:$Vt), VecIndex:$idx)),
GPR64sp:$Rn, offset),		GPR64sp:$Rn, offset),
(ST1 (SUBREG_TO_REG (i32 0), VecListOne64:$Vt, dsub),		(ST1 (SUBREG_TO_REG (i32 0), VecListOne64:$Vt, dsub),
[9 lines not shown]
defm : St1LanePost64Pat<post_truncsti8, VectorIndexB, v8i8, i32, ST1i8_POST, 1>;		defm : St1LanePost64Pat<post_truncsti8, VectorIndexB, v8i8, i32, ST1i8_POST, 1>;
defm : St1LanePost64Pat<post_truncsti16, VectorIndexH, v4i16, i32, ST1i16_POST,		defm : St1LanePost64Pat<post_truncsti16, VectorIndexH, v4i16, i32, ST1i16_POST,
2>;		2>;
defm : St1LanePost64Pat<post_store, VectorIndexS, v2i32, i32, ST1i32_POST, 4>;		defm : St1LanePost64Pat<post_store, VectorIndexS, v2i32, i32, ST1i32_POST, 4>;
defm : St1LanePost64Pat<post_store, VectorIndexS, v2f32, f32, ST1i32_POST, 4>;		defm : St1LanePost64Pat<post_store, VectorIndexS, v2f32, f32, ST1i32_POST, 4>;
defm : St1LanePost64Pat<post_store, VectorIndexD, v1i64, i64, ST1i64_POST, 8>;		defm : St1LanePost64Pat<post_store, VectorIndexD, v1i64, i64, ST1i64_POST, 8>;
defm : St1LanePost64Pat<post_store, VectorIndexD, v1f64, f64, ST1i64_POST, 8>;		defm : St1LanePost64Pat<post_store, VectorIndexD, v1f64, f64, ST1i64_POST, 8>;
defm : St1LanePost64Pat<post_store, VectorIndexH, v4f16, f16, ST1i16_POST, 2>;		defm : St1LanePost64Pat<post_store, VectorIndexH, v4f16, f16, ST1i16_POST, 2>;
		defm : St1LanePost64Pat<post_store, VectorIndexH, v4bf16, bf16, ST1i16_POST, 2>;

multiclass St1LanePost128Pat<SDPatternOperator scalar_store, Operand VecIndex,		multiclass St1LanePost128Pat<SDPatternOperator scalar_store, Operand VecIndex,
ValueType VTy, ValueType STy, Instruction ST1,		ValueType VTy, ValueType STy, Instruction ST1,
int offset> {		int offset> {
def : Pat<(scalar_store		def : Pat<(scalar_store
(STy (vector_extract (VTy VecListOne128:$Vt), VecIndex:$idx)),		(STy (vector_extract (VTy VecListOne128:$Vt), VecIndex:$idx)),
GPR64sp:$Rn, offset),		GPR64sp:$Rn, offset),
(ST1 VecListOne128:$Vt, VecIndex:$idx, GPR64sp:$Rn, XZR)>;		(ST1 VecListOne128:$Vt, VecIndex:$idx, GPR64sp:$Rn, XZR)>;

def : Pat<(scalar_store		def : Pat<(scalar_store
(STy (vector_extract (VTy VecListOne128:$Vt), VecIndex:$idx)),		(STy (vector_extract (VTy VecListOne128:$Vt), VecIndex:$idx)),
GPR64sp:$Rn, GPR64:$Rm),		GPR64sp:$Rn, GPR64:$Rm),
(ST1 VecListOne128:$Vt, VecIndex:$idx, GPR64sp:$Rn, $Rm)>;		(ST1 VecListOne128:$Vt, VecIndex:$idx, GPR64sp:$Rn, $Rm)>;
}		}

defm : St1LanePost128Pat<post_truncsti8, VectorIndexB, v16i8, i32, ST1i8_POST,		defm : St1LanePost128Pat<post_truncsti8, VectorIndexB, v16i8, i32, ST1i8_POST,
1>;		1>;
defm : St1LanePost128Pat<post_truncsti16, VectorIndexH, v8i16, i32, ST1i16_POST,		defm : St1LanePost128Pat<post_truncsti16, VectorIndexH, v8i16, i32, ST1i16_POST,
2>;		2>;
defm : St1LanePost128Pat<post_store, VectorIndexS, v4i32, i32, ST1i32_POST, 4>;		defm : St1LanePost128Pat<post_store, VectorIndexS, v4i32, i32, ST1i32_POST, 4>;
defm : St1LanePost128Pat<post_store, VectorIndexS, v4f32, f32, ST1i32_POST, 4>;		defm : St1LanePost128Pat<post_store, VectorIndexS, v4f32, f32, ST1i32_POST, 4>;
defm : St1LanePost128Pat<post_store, VectorIndexD, v2i64, i64, ST1i64_POST, 8>;		defm : St1LanePost128Pat<post_store, VectorIndexD, v2i64, i64, ST1i64_POST, 8>;
defm : St1LanePost128Pat<post_store, VectorIndexD, v2f64, f64, ST1i64_POST, 8>;		defm : St1LanePost128Pat<post_store, VectorIndexD, v2f64, f64, ST1i64_POST, 8>;
defm : St1LanePost128Pat<post_store, VectorIndexH, v8f16, f16, ST1i16_POST, 2>;		defm : St1LanePost128Pat<post_store, VectorIndexH, v8f16, f16, ST1i16_POST, 2>;
		defm : St1LanePost128Pat<post_store, VectorIndexH, v8bf16, bf16, ST1i16_POST, 2>;

let mayStore = 1, hasSideEffects = 0 in {		let mayStore = 1, hasSideEffects = 0 in {
defm ST2 : SIMDStSingleB<1, 0b000, "st2", VecListTwob, GPR64pi2>;		defm ST2 : SIMDStSingleB<1, 0b000, "st2", VecListTwob, GPR64pi2>;
defm ST2 : SIMDStSingleH<1, 0b010, 0, "st2", VecListTwoh, GPR64pi4>;		defm ST2 : SIMDStSingleH<1, 0b010, 0, "st2", VecListTwoh, GPR64pi4>;
defm ST2 : SIMDStSingleS<1, 0b100, 0b00, "st2", VecListTwos, GPR64pi8>;		defm ST2 : SIMDStSingleS<1, 0b100, 0b00, "st2", VecListTwos, GPR64pi8>;
defm ST2 : SIMDStSingleD<1, 0b100, 0b01, "st2", VecListTwod, GPR64pi16>;		defm ST2 : SIMDStSingleD<1, 0b100, 0b01, "st2", VecListTwod, GPR64pi16>;
defm ST3 : SIMDStSingleB<0, 0b001, "st3", VecListThreeb, GPR64pi3>;		defm ST3 : SIMDStSingleB<0, 0b001, "st3", VecListThreeb, GPR64pi3>;
defm ST3 : SIMDStSingleH<0, 0b011, 0, "st3", VecListThreeh, GPR64pi6>;		defm ST3 : SIMDStSingleH<0, 0b011, 0, "st3", VecListThreeh, GPR64pi6>;
[1,145 lines not shown]
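The TableGen patterns above map bf16 vectors onto the existing 64-bit, 128-bit and 16-bit-lane load/store instructions. That works because a load or store only moves bits and never interprets them as a number. The sketch below models this in plain C++, treating a bf16 value as the upper 16 bits of an IEEE binary32 float. The type alias and all function names are hypothetical, and the truncating conversion is chosen only for simplicity; none of this is code from the patch.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Raw storage for one bf16 value (assumption: plain 16-bit container).
using bf16_bits = std::uint16_t;

// Truncating float -> bf16 conversion: keep the top 16 bits of the float.
bf16_bits toBF16(float F) {
  std::uint32_t Bits;
  std::memcpy(&Bits, &F, sizeof Bits);
  return static_cast<bf16_bits>(Bits >> 16);
}

// bf16 -> float: place the 16 bits in the top half, zero the rest.
float fromBF16(bf16_bits H) {
  std::uint32_t Bits = static_cast<std::uint32_t>(H) << 16;
  float F;
  std::memcpy(&F, &Bits, sizeof F);
  return F;
}

// Models a whole-vector 64-bit load of four 16-bit lanes. The copy is
// type-agnostic, just as the same instruction serves v4i16, v4f16 and v4bf16.
std::array<bf16_bits, 4> load4(const bf16_bits *Ptr) {
  std::array<bf16_bits, 4> V{};
  std::memcpy(V.data(), Ptr, V.size() * sizeof(bf16_bits));
  return V;
}

// Models a single-lane load: overwrite one 16-bit lane, keep the others.
std::array<bf16_bits, 4> loadLane(const bf16_bits *Ptr,
                                  std::array<bf16_bits, 4> V, unsigned Lane) {
  assert(Lane < V.size() && "lane index out of range");
  V[Lane] = *Ptr;
  return V;
}
```

Nothing in `load4` or `loadLane` depends on the lanes holding bf16 rather than f16 or i16 data, which is the property the added `v4bf16` and `v8bf16` patterns rely on.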

llvm/test/CodeGen/AArch64/aarch64-bf16-ldst-intrinsics.ll

This file was added.

				; RUN: llc -mtriple aarch64-arm-none-eabi -mattr=+bf16 %s -o - | FileCheck %s

				%struct.bfloat16x4x2_t = type { [2 x <4 x bfloat>] }
				%struct.bfloat16x8x2_t = type { [2 x <8 x bfloat>] }
				%struct.bfloat16x4x3_t = type { [3 x <4 x bfloat>] }
				%struct.bfloat16x8x3_t = type { [3 x <8 x bfloat>] }
				%struct.bfloat16x4x4_t = type { [4 x <4 x bfloat>] }
				%struct.bfloat16x8x4_t = type { [4 x <8 x bfloat>] }

				; CHECK-LABEL: test_vld1_bf16
				; CHECK: ldr d0, [x0]
				define <4 x bfloat> @test_vld1_bf16(bfloat* nocapture readonly %ptr) local_unnamed_addr #0 {
				entry:
				%0 = bitcast bfloat* %ptr to <4 x bfloat>*
				%1 = load <4 x bfloat>, <4 x bfloat>* %0, align 2
				ret <4 x bfloat> %1
				}

				; CHECK-LABEL: test_vld1q_bf16
				; CHECK: ldr q0, [x0]
				define <8 x bfloat> @test_vld1q_bf16(bfloat* nocapture readonly %ptr) local_unnamed_addr #1 {
				entry:
				%0 = bitcast bfloat* %ptr to <8 x bfloat>*
				%1 = load <8 x bfloat>, <8 x bfloat>* %0, align 2
				ret <8 x bfloat> %1
				}

				; CHECK-LABEL: test_vld1_lane_bf16
				; CHECK: ld1 { v0.h }[0], [x0]
				define <4 x bfloat> @test_vld1_lane_bf16(bfloat* nocapture readonly %ptr, <4 x bfloat> %src) local_unnamed_addr #0 {
				entry:
				%0 = load bfloat, bfloat* %ptr, align 2
				%vld1_lane = insertelement <4 x bfloat> %src, bfloat %0, i32 0
				ret <4 x bfloat> %vld1_lane
				}

				; CHECK-LABEL: test_vld1q_lane_bf16
				; CHECK: ld1 { v0.h }[7], [x0]
				define <8 x bfloat> @test_vld1q_lane_bf16(bfloat* nocapture readonly %ptr, <8 x bfloat> %src) local_unnamed_addr #1 {
				entry:
				%0 = load bfloat, bfloat* %ptr, align 2
				%vld1_lane = insertelement <8 x bfloat> %src, bfloat %0, i32 7
				ret <8 x bfloat> %vld1_lane
				}

				; CHECK-LABEL: test_vld1_dup_bf16
				; CHECK: ld1r { v0.4h }, [x0]
				define <4 x bfloat> @test_vld1_dup_bf16(bfloat* nocapture readonly %ptr) local_unnamed_addr #0 {
				entry:
				%0 = load bfloat, bfloat* %ptr, align 2
				%1 = insertelement <4 x bfloat> undef, bfloat %0, i32 0
				%lane = shufflevector <4 x bfloat> %1, <4 x bfloat> undef, <4 x i32> zeroinitializer
				ret <4 x bfloat> %lane
				}

				; CHECK-LABEL: test_vld1_bf16_x2
				; CHECK: ld1 { v0.4h, v1.4h }, [x0]
				define %struct.bfloat16x4x2_t @test_vld1_bf16_x2(bfloat* %ptr) local_unnamed_addr #2 {
				entry:
				%vld1xN = tail call { <4 x bfloat>, <4 x bfloat> } @llvm.aarch64.neon.ld1x2.v4bf16.p0bf16(bfloat* %ptr)
				%vld1xN.fca.0.extract = extractvalue { <4 x bfloat>, <4 x bfloat> } %vld1xN, 0
				%vld1xN.fca.1.extract = extractvalue { <4 x bfloat>, <4 x bfloat> } %vld1xN, 1
				%.fca.0.0.insert = insertvalue %struct.bfloat16x4x2_t undef, <4 x bfloat> %vld1xN.fca.0.extract, 0, 0
				%.fca.0.1.insert = insertvalue %struct.bfloat16x4x2_t %.fca.0.0.insert, <4 x bfloat> %vld1xN.fca.1.extract, 0, 1
				ret %struct.bfloat16x4x2_t %.fca.0.1.insert
				}

				declare { <4 x bfloat>, <4 x bfloat> } @llvm.aarch64.neon.ld1x2.v4bf16.p0bf16(bfloat*) #3

				; CHECK-LABEL: test_vld1q_bf16_x2
				; CHECK: ld1 { v0.8h, v1.8h }, [x0]
				define %struct.bfloat16x8x2_t @test_vld1q_bf16_x2(bfloat* %ptr) local_unnamed_addr #2 {
				entry:
				%vld1xN = tail call { <8 x bfloat>, <8 x bfloat> } @llvm.aarch64.neon.ld1x2.v8bf16.p0bf16(bfloat* %ptr)
				%vld1xN.fca.0.extract = extractvalue { <8 x bfloat>, <8 x bfloat> } %vld1xN, 0
				%vld1xN.fca.1.extract = extractvalue { <8 x bfloat>, <8 x bfloat> } %vld1xN, 1
				%.fca.0.0.insert = insertvalue %struct.bfloat16x8x2_t undef, <8 x bfloat> %vld1xN.fca.0.extract, 0, 0
				%.fca.0.1.insert = insertvalue %struct.bfloat16x8x2_t %.fca.0.0.insert, <8 x bfloat> %vld1xN.fca.1.extract, 0, 1
				ret %struct.bfloat16x8x2_t %.fca.0.1.insert
				}

				; Function Attrs: argmemonly nounwind readonly
				declare { <8 x bfloat>, <8 x bfloat> } @llvm.aarch64.neon.ld1x2.v8bf16.p0bf16(bfloat*) #3

				; CHECK-LABEL: test_vld1_bf16_x3
				; CHECK: ld1 { v0.4h, v1.4h, v2.4h }, [x0]
				define %struct.bfloat16x4x3_t @test_vld1_bf16_x3(bfloat* %ptr) local_unnamed_addr #2 {
				entry:
				%vld1xN = tail call { <4 x bfloat>, <4 x bfloat>, <4 x bfloat> } @llvm.aarch64.neon.ld1x3.v4bf16.p0bf16(bfloat* %ptr)
				%vld1xN.fca.0.extract = extractvalue { <4 x bfloat>, <4 x bfloat>, <4 x bfloat> } %vld1xN, 0
				%vld1xN.fca.1.extract = extractvalue { <4 x bfloat>, <4 x bfloat>, <4 x bfloat> } %vld1xN, 1
				%vld1xN.fca.2.extract = extractvalue { <4 x bfloat>, <4 x bfloat>, <4 x bfloat> } %vld1xN, 2
				%.fca.0.0.insert = insertvalue %struct.bfloat16x4x3_t undef, <4 x bfloat> %vld1xN.fca.0.extract, 0, 0
				%.fca.0.1.insert = insertvalue %struct.bfloat16x4x3_t %.fca.0.0.insert, <4 x bfloat> %vld1xN.fca.1.extract, 0, 1
				%.fca.0.2.insert = insertvalue %struct.bfloat16x4x3_t %.fca.0.1.insert, <4 x bfloat> %vld1xN.fca.2.extract, 0, 2
				ret %struct.bfloat16x4x3_t %.fca.0.2.insert
				}

				; Function Attrs: argmemonly nounwind readonly
				declare { <4 x bfloat>, <4 x bfloat>, <4 x bfloat> } @llvm.aarch64.neon.ld1x3.v4bf16.p0bf16(bfloat*) #3

				; CHECK-LABEL: test_vld1q_bf16_x3
				; CHECK: ld1 { v0.8h, v1.8h, v2.8h }, [x0]
				define %struct.bfloat16x8x3_t @test_vld1q_bf16_x3(bfloat* %ptr) local_unnamed_addr #2 {
				entry:
				%vld1xN = tail call { <8 x bfloat>, <8 x bfloat>, <8 x bfloat> } @llvm.aarch64.neon.ld1x3.v8bf16.p0bf16(bfloat* %ptr)
				%vld1xN.fca.0.extract = extractvalue { <8 x bfloat>, <8 x bfloat>, <8 x bfloat> } %vld1xN, 0
				%vld1xN.fca.1.extract = extractvalue { <8 x bfloat>, <8 x bfloat>, <8 x bfloat> } %vld1xN, 1
				%vld1xN.fca.2.extract = extractvalue { <8 x bfloat>, <8 x bfloat>, <8 x bfloat> } %vld1xN, 2
				%.fca.0.0.insert = insertvalue %struct.bfloat16x8x3_t undef, <8 x bfloat> %vld1xN.fca.0.extract, 0, 0
				%.fca.0.1.insert = insertvalue %struct.bfloat16x8x3_t %.fca.0.0.insert, <8 x bfloat> %vld1xN.fca.1.extract, 0, 1
				%.fca.0.2.insert = insertvalue %struct.bfloat16x8x3_t %.fca.0.1.insert, <8 x bfloat> %vld1xN.fca.2.extract, 0, 2
				ret %struct.bfloat16x8x3_t %.fca.0.2.insert
				}

				; Function Attrs: argmemonly nounwind readonly
				declare { <8 x bfloat>, <8 x bfloat>, <8 x bfloat> } @llvm.aarch64.neon.ld1x3.v8bf16.p0bf16(bfloat*) #3

				; CHECK-LABEL: test_vld1_bf16_x4
				; CHECK: ld1 { v0.4h, v1.4h, v2.4h, v3.4h }, [x0]
				define %struct.bfloat16x4x4_t @test_vld1_bf16_x4(bfloat* %ptr) local_unnamed_addr #2 {
				entry:
				%vld1xN = tail call { <4 x bfloat>, <4 x bfloat>, <4 x bfloat>, <4 x bfloat> } @llvm.aarch64.neon.ld1x4.v4bf16.p0bf16(bfloat* %ptr)
				%vld1xN.fca.0.extract = extractvalue { <4 x bfloat>, <4 x bfloat>, <4 x bfloat>, <4 x bfloat> } %vld1xN, 0
				%vld1xN.fca.1.extract = extractvalue { <4 x bfloat>, <4 x bfloat>, <4 x bfloat>, <4 x bfloat> } %vld1xN, 1
				%vld1xN.fca.2.extract = extractvalue { <4 x bfloat>, <4 x bfloat>, <4 x bfloat>, <4 x bfloat> } %vld1xN, 2
				%vld1xN.fca.3.extract = extractvalue { <4 x bfloat>, <4 x bfloat>, <4 x bfloat>, <4 x bfloat> } %vld1xN, 3
				%.fca.0.0.insert = insertvalue %struct.bfloat16x4x4_t undef, <4 x bfloat> %vld1xN.fca.0.extract, 0, 0
				%.fca.0.1.insert = insertvalue %struct.bfloat16x4x4_t %.fca.0.0.insert, <4 x bfloat> %vld1xN.fca.1.extract, 0, 1
				%.fca.0.2.insert = insertvalue %struct.bfloat16x4x4_t %.fca.0.1.insert, <4 x bfloat> %vld1xN.fca.2.extract, 0, 2
				%.fca.0.3.insert = insertvalue %struct.bfloat16x4x4_t %.fca.0.2.insert, <4 x bfloat> %vld1xN.fca.3.extract, 0, 3
				ret %struct.bfloat16x4x4_t %.fca.0.3.insert
				}

				; Function Attrs: argmemonly nounwind readonly
				declare { <4 x bfloat>, <4 x bfloat>, <4 x bfloat>, <4 x bfloat> } @llvm.aarch64.neon.ld1x4.v4bf16.p0bf16(bfloat*) #3

				; CHECK-LABEL: test_vld1q_bf16_x4
				; CHECK: ld1 { v0.8h, v1.8h, v2.8h, v3.8h }, [x0]
				define %struct.bfloat16x8x4_t @test_vld1q_bf16_x4(bfloat* %ptr) local_unnamed_addr #2 {
				entry:
				%vld1xN = tail call { <8 x bfloat>, <8 x bfloat>, <8 x bfloat>, <8 x bfloat> } @llvm.aarch64.neon.ld1x4.v8bf16.p0bf16(bfloat* %ptr)
				%vld1xN.fca.0.extract = extractvalue { <8 x bfloat>, <8 x bfloat>, <8 x bfloat>, <8 x bfloat> } %vld1xN, 0
				%vld1xN.fca.1.extract = extractvalue { <8 x bfloat>, <8 x bfloat>, <8 x bfloat>, <8 x bfloat> } %vld1xN, 1
				%vld1xN.fca.2.extract = extractvalue { <8 x bfloat>, <8 x bfloat>, <8 x bfloat>, <8 x bfloat> } %vld1xN, 2
				%vld1xN.fca.3.extract = extractvalue { <8 x bfloat>, <8 x bfloat>, <8 x bfloat>, <8 x bfloat> } %vld1xN, 3
				%.fca.0.0.insert = insertvalue %struct.bfloat16x8x4_t undef, <8 x bfloat> %vld1xN.fca.0.extract, 0, 0
				%.fca.0.1.insert = insertvalue %struct.bfloat16x8x4_t %.fca.0.0.insert, <8 x bfloat> %vld1xN.fca.1.extract, 0, 1
				%.fca.0.2.insert = insertvalue %struct.bfloat16x8x4_t %.fca.0.1.insert, <8 x bfloat> %vld1xN.fca.2.extract, 0, 2
				%.fca.0.3.insert = insertvalue %struct.bfloat16x8x4_t %.fca.0.2.insert, <8 x bfloat> %vld1xN.fca.3.extract, 0, 3
				ret %struct.bfloat16x8x4_t %.fca.0.3.insert
				}

				; Function Attrs: argmemonly nounwind readonly
				declare { <8 x bfloat>, <8 x bfloat>, <8 x bfloat>, <8 x bfloat> } @llvm.aarch64.neon.ld1x4.v8bf16.p0bf16(bfloat*) #3

				; CHECK-LABEL: test_vld1q_dup_bf16
				; CHECK: ld1r { v0.8h }, [x0]
				define <8 x bfloat> @test_vld1q_dup_bf16(bfloat* nocapture readonly %ptr) local_unnamed_addr #1 {
				entry:
				%0 = load bfloat, bfloat* %ptr, align 2
				%1 = insertelement <8 x bfloat> undef, bfloat %0, i32 0
				%lane = shufflevector <8 x bfloat> %1, <8 x bfloat> undef, <8 x i32> zeroinitializer
				ret <8 x bfloat> %lane
				}

				; CHECK-LABEL: test_vld2_bf16
				; CHECK: ld2 { v0.4h, v1.4h }, [x0]
				define %struct.bfloat16x4x2_t @test_vld2_bf16(bfloat* %ptr) local_unnamed_addr #2 {
				entry:
				%0 = bitcast bfloat* %ptr to <4 x bfloat>*
				%vld2 = tail call { <4 x bfloat>, <4 x bfloat> } @llvm.aarch64.neon.ld2.v4bf16.p0v4bf16(<4 x bfloat>* %0)
				%vld2.fca.0.extract = extractvalue { <4 x bfloat>, <4 x bfloat> } %vld2, 0
				%vld2.fca.1.extract = extractvalue { <4 x bfloat>, <4 x bfloat> } %vld2, 1
				%.fca.0.0.insert = insertvalue %struct.bfloat16x4x2_t undef, <4 x bfloat> %vld2.fca.0.extract, 0, 0
				%.fca.0.1.insert = insertvalue %struct.bfloat16x4x2_t %.fca.0.0.insert, <4 x bfloat> %vld2.fca.1.extract, 0, 1
				ret %struct.bfloat16x4x2_t %.fca.0.1.insert
				}

				; Function Attrs: argmemonly nounwind readonly
				declare { <4 x bfloat>, <4 x bfloat> } @llvm.aarch64.neon.ld2.v4bf16.p0v4bf16(<4 x bfloat>*) #3

				; CHECK-LABEL: test_vld2q_bf16
				; CHECK: ld2 { v0.8h, v1.8h }, [x0]
				define %struct.bfloat16x8x2_t @test_vld2q_bf16(bfloat* %ptr) local_unnamed_addr #2 {
				entry:
				%0 = bitcast bfloat* %ptr to <8 x bfloat>*
				%vld2 = tail call { <8 x bfloat>, <8 x bfloat> } @llvm.aarch64.neon.ld2.v8bf16.p0v8bf16(<8 x bfloat>* %0)
				%vld2.fca.0.extract = extractvalue { <8 x bfloat>, <8 x bfloat> } %vld2, 0
				%vld2.fca.1.extract = extractvalue { <8 x bfloat>, <8 x bfloat> } %vld2, 1
				%.fca.0.0.insert = insertvalue %struct.bfloat16x8x2_t undef, <8 x bfloat> %vld2.fca.0.extract, 0, 0
				%.fca.0.1.insert = insertvalue %struct.bfloat16x8x2_t %.fca.0.0.insert, <8 x bfloat> %vld2.fca.1.extract, 0, 1
				ret %struct.bfloat16x8x2_t %.fca.0.1.insert
				}

				; Function Attrs: argmemonly nounwind readonly
				declare { <8 x bfloat>, <8 x bfloat> } @llvm.aarch64.neon.ld2.v8bf16.p0v8bf16(<8 x bfloat>*) #3

				; CHECK-LABEL: test_vld2_lane_bf16
				; CHECK: ld2 { v0.h, v1.h }[1], [x0]
				define %struct.bfloat16x4x2_t @test_vld2_lane_bf16(bfloat* %ptr, [2 x <4 x bfloat>] %src.coerce) local_unnamed_addr #2 {
				entry:
				%src.coerce.fca.0.extract = extractvalue [2 x <4 x bfloat>] %src.coerce, 0
				%src.coerce.fca.1.extract = extractvalue [2 x <4 x bfloat>] %src.coerce, 1
				%0 = bitcast bfloat* %ptr to i8*
				%vld2_lane = tail call { <4 x bfloat>, <4 x bfloat> } @llvm.aarch64.neon.ld2lane.v4bf16.p0i8(<4 x bfloat> %src.coerce.fca.0.extract, <4 x bfloat> %src.coerce.fca.1.extract, i64 1, i8* %0)
				%vld2_lane.fca.0.extract = extractvalue { <4 x bfloat>, <4 x bfloat> } %vld2_lane, 0
				%vld2_lane.fca.1.extract = extractvalue { <4 x bfloat>, <4 x bfloat> } %vld2_lane, 1
				%.fca.0.0.insert = insertvalue %struct.bfloat16x4x2_t undef, <4 x bfloat> %vld2_lane.fca.0.extract, 0, 0
				%.fca.0.1.insert = insertvalue %struct.bfloat16x4x2_t %.fca.0.0.insert, <4 x bfloat> %vld2_lane.fca.1.extract, 0, 1
				ret %struct.bfloat16x4x2_t %.fca.0.1.insert
				}

				; Function Attrs: argmemonly nounwind readonly
				declare { <4 x bfloat>, <4 x bfloat> } @llvm.aarch64.neon.ld2lane.v4bf16.p0i8(<4 x bfloat>, <4 x bfloat>, i64, i8*) #3

				; CHECK-LABEL: test_vld2q_lane_bf16
				; CHECK: ld2 { v0.h, v1.h }[7], [x0]
				define %struct.bfloat16x8x2_t @test_vld2q_lane_bf16(bfloat* %ptr, [2 x <8 x bfloat>] %src.coerce) local_unnamed_addr #2 {
				entry:
				%src.coerce.fca.0.extract = extractvalue [2 x <8 x bfloat>] %src.coerce, 0
				%src.coerce.fca.1.extract = extractvalue [2 x <8 x bfloat>] %src.coerce, 1
				%0 = bitcast bfloat* %ptr to i8*
				%vld2_lane = tail call { <8 x bfloat>, <8 x bfloat> } @llvm.aarch64.neon.ld2lane.v8bf16.p0i8(<8 x bfloat> %src.coerce.fca.0.extract, <8 x bfloat> %src.coerce.fca.1.extract, i64 7, i8* %0)
				%vld2_lane.fca.0.extract = extractvalue { <8 x bfloat>, <8 x bfloat> } %vld2_lane, 0
				%vld2_lane.fca.1.extract = extractvalue { <8 x bfloat>, <8 x bfloat> } %vld2_lane, 1
				%.fca.0.0.insert = insertvalue %struct.bfloat16x8x2_t undef, <8 x bfloat> %vld2_lane.fca.0.extract, 0, 0
				%.fca.0.1.insert = insertvalue %struct.bfloat16x8x2_t %.fca.0.0.insert, <8 x bfloat> %vld2_lane.fca.1.extract, 0, 1
				ret %struct.bfloat16x8x2_t %.fca.0.1.insert
				}

				; Function Attrs: argmemonly nounwind readonly
				declare { <8 x bfloat>, <8 x bfloat> } @llvm.aarch64.neon.ld2lane.v8bf16.p0i8(<8 x bfloat>, <8 x bfloat>, i64, i8*) #3

				; CHECK-LABEL: test_vld3_bf16
				; CHECK: ld3 { v0.4h, v1.4h, v2.4h }, [x0]
				define %struct.bfloat16x4x3_t @test_vld3_bf16(bfloat* %ptr) local_unnamed_addr #2 {
				entry:
				%0 = bitcast bfloat* %ptr to <4 x bfloat>*
				%vld3 = tail call { <4 x bfloat>, <4 x bfloat>, <4 x bfloat> } @llvm.aarch64.neon.ld3.v4bf16.p0v4bf16(<4 x bfloat>* %0)
				%vld3.fca.0.extract = extractvalue { <4 x bfloat>, <4 x bfloat>, <4 x bfloat> } %vld3, 0
				%vld3.fca.1.extract = extractvalue { <4 x bfloat>, <4 x bfloat>, <4 x bfloat> } %vld3, 1
				%vld3.fca.2.extract = extractvalue { <4 x bfloat>, <4 x bfloat>, <4 x bfloat> } %vld3, 2
				%.fca.0.0.insert = insertvalue %struct.bfloat16x4x3_t undef, <4 x bfloat> %vld3.fca.0.extract, 0, 0
				%.fca.0.1.insert = insertvalue %struct.bfloat16x4x3_t %.fca.0.0.insert, <4 x bfloat> %vld3.fca.1.extract, 0, 1
				%.fca.0.2.insert = insertvalue %struct.bfloat16x4x3_t %.fca.0.1.insert, <4 x bfloat> %vld3.fca.2.extract, 0, 2
				ret %struct.bfloat16x4x3_t %.fca.0.2.insert
				}

				; Function Attrs: argmemonly nounwind readonly
				declare { <4 x bfloat>, <4 x bfloat>, <4 x bfloat> } @llvm.aarch64.neon.ld3.v4bf16.p0v4bf16(<4 x bfloat>*) #3

				; CHECK-LABEL: test_vld3q_bf16
				; CHECK: ld3 { v0.8h, v1.8h, v2.8h }, [x0]
				define %struct.bfloat16x8x3_t @test_vld3q_bf16(bfloat* %ptr) local_unnamed_addr #2 {
				entry:
				%0 = bitcast bfloat* %ptr to <8 x bfloat>*
				%vld3 = tail call { <8 x bfloat>, <8 x bfloat>, <8 x bfloat> } @llvm.aarch64.neon.ld3.v8bf16.p0v8bf16(<8 x bfloat>* %0)
				%vld3.fca.0.extract = extractvalue { <8 x bfloat>, <8 x bfloat>, <8 x bfloat> } %vld3, 0
				%vld3.fca.1.extract = extractvalue { <8 x bfloat>, <8 x bfloat>, <8 x bfloat> } %vld3, 1
				%vld3.fca.2.extract = extractvalue { <8 x bfloat>, <8 x bfloat>, <8 x bfloat> } %vld3, 2
				%.fca.0.0.insert = insertvalue %struct.bfloat16x8x3_t undef, <8 x bfloat> %vld3.fca.0.extract, 0, 0
				%.fca.0.1.insert = insertvalue %struct.bfloat16x8x3_t %.fca.0.0.insert, <8 x bfloat> %vld3.fca.1.extract, 0, 1
				%.fca.0.2.insert = insertvalue %struct.bfloat16x8x3_t %.fca.0.1.insert, <8 x bfloat> %vld3.fca.2.extract, 0, 2
				ret %struct.bfloat16x8x3_t %.fca.0.2.insert
				arsenm: Why is the IR type name bfloat and not bfloat16?
				LukeGeeson (author): The naming for the IR type was agreed upon here after quite a big discussion. https://reviews.llvm.org/D78190
				SjoerdMeijer: I regret very much that I didn't notice this earlier... I.e., I noticed this in D76077 and wrote that I am relatively unhappy about this (I think I mentioned this on another ticket too). Because, like @arsenm, I would expect the IR type name to be bfloat16. Correct me if I am wrong, but I don't see a big discussion about this in D78190. I only see 1 or 2 comments about `BFloat` vs `Bfloat`.
				LukeGeeson (author): I cannot see a discussion about the IR type name per se, but I can see you were both involved in the discussion more generally. I am concerned that this patch is the wrong place to discuss such issues, and that we should bring this up in a more appropriate place, as you mention, so that this patch isn't held back.
				chill: I don't see a compelling reason for the name to be `bfloat16` or `bfloat3`, etc. Like other floating-point types (`float`, `double`, and `half`), the name denotes a specific externally defined format, unlike `iN`.
				SjoerdMeijer: > Like other floating-point types (float, double, and half), the name denotes a specific externally defined format,
				SjoerdMeijer: Is the defined format not called bfloat16?
				chill: Indeed, people use the name "bfloat16". But then the `half`, `float`, and `double` also differ from the official `binary16`, `binary32`, and `binary64`. IMHO `bfloat` fits better in the LLVM IR naming convention.
				SjoerdMeijer: Yeah, so that's exactly why I don't follow your logic. If there's any logic in the names here, the mapping from source-language type to IR type seems the most plausible one. And I just don't see the benefit of dropping the 16, and how that would fit better in some naming scheme or how that makes things clearer here.
				chill: What source language? That said, I'm resigning from the bikeshedding here.
				stuij: Just as a house-keeping note: if we were to change the naming, I think we can all agree that this ticket itself shouldn't be the place where we want to do this. I'm happy for the conversation to carry on here, but I think we can move the ticket forward at the same time.
				}

				; Function Attrs: argmemonly nounwind readonly
				declare { <8 x bfloat>, <8 x bfloat>, <8 x bfloat> } @llvm.aarch64.neon.ld3.v8bf16.p0v8bf16(<8 x bfloat>*) #3

				; CHECK-LABEL: test_vld3_lane_bf16
				; CHECK: ld3 { v0.h, v1.h, v2.h }[1], [x0]
				define %struct.bfloat16x4x3_t @test_vld3_lane_bf16(bfloat* %ptr, [3 x <4 x bfloat>] %src.coerce) local_unnamed_addr #2 {
				entry:
				%src.coerce.fca.0.extract = extractvalue [3 x <4 x bfloat>] %src.coerce, 0
				%src.coerce.fca.1.extract = extractvalue [3 x <4 x bfloat>] %src.coerce, 1
				%src.coerce.fca.2.extract = extractvalue [3 x <4 x bfloat>] %src.coerce, 2
				%0 = bitcast bfloat* %ptr to i8*
				%vld3_lane = tail call { <4 x bfloat>, <4 x bfloat>, <4 x bfloat> } @llvm.aarch64.neon.ld3lane.v4bf16.p0i8(<4 x bfloat> %src.coerce.fca.0.extract, <4 x bfloat> %src.coerce.fca.1.extract, <4 x bfloat> %src.coerce.fca.2.extract, i64 1, i8* %0)
				%vld3_lane.fca.0.extract = extractvalue { <4 x bfloat>, <4 x bfloat>, <4 x bfloat> } %vld3_lane, 0
				%vld3_lane.fca.1.extract = extractvalue { <4 x bfloat>, <4 x bfloat>, <4 x bfloat> } %vld3_lane, 1
				%vld3_lane.fca.2.extract = extractvalue { <4 x bfloat>, <4 x bfloat>, <4 x bfloat> } %vld3_lane, 2
				%.fca.0.0.insert = insertvalue %struct.bfloat16x4x3_t undef, <4 x bfloat> %vld3_lane.fca.0.extract, 0, 0
				%.fca.0.1.insert = insertvalue %struct.bfloat16x4x3_t %.fca.0.0.insert, <4 x bfloat> %vld3_lane.fca.1.extract, 0, 1
				%.fca.0.2.insert = insertvalue %struct.bfloat16x4x3_t %.fca.0.1.insert, <4 x bfloat> %vld3_lane.fca.2.extract, 0, 2
				ret %struct.bfloat16x4x3_t %.fca.0.2.insert
				}

				; Function Attrs: argmemonly nounwind readonly
				declare { <4 x bfloat>, <4 x bfloat>, <4 x bfloat> } @llvm.aarch64.neon.ld3lane.v4bf16.p0i8(<4 x bfloat>, <4 x bfloat>, <4 x bfloat>, i64, i8*) #3

				; CHECK-LABEL: test_vld3q_lane_bf16
				; CHECK: ld3 { v0.h, v1.h, v2.h }[7], [x0]
				define %struct.bfloat16x8x3_t @test_vld3q_lane_bf16(bfloat* %ptr, [3 x <8 x bfloat>] %src.coerce) local_unnamed_addr #2 {
				entry:
				%src.coerce.fca.0.extract = extractvalue [3 x <8 x bfloat>] %src.coerce, 0
				%src.coerce.fca.1.extract = extractvalue [3 x <8 x bfloat>] %src.coerce, 1
				%src.coerce.fca.2.extract = extractvalue [3 x <8 x bfloat>] %src.coerce, 2
				%0 = bitcast bfloat* %ptr to i8*
				%vld3_lane = tail call { <8 x bfloat>, <8 x bfloat>, <8 x bfloat> } @llvm.aarch64.neon.ld3lane.v8bf16.p0i8(<8 x bfloat> %src.coerce.fca.0.extract, <8 x bfloat> %src.coerce.fca.1.extract, <8 x bfloat> %src.coerce.fca.2.extract, i64 7, i8* %0)
				%vld3_lane.fca.0.extract = extractvalue { <8 x bfloat>, <8 x bfloat>, <8 x bfloat> } %vld3_lane, 0
				%vld3_lane.fca.1.extract = extractvalue { <8 x bfloat>, <8 x bfloat>, <8 x bfloat> } %vld3_lane, 1
				%vld3_lane.fca.2.extract = extractvalue { <8 x bfloat>, <8 x bfloat>, <8 x bfloat> } %vld3_lane, 2
				%.fca.0.0.insert = insertvalue %struct.bfloat16x8x3_t undef, <8 x bfloat> %vld3_lane.fca.0.extract, 0, 0
				%.fca.0.1.insert = insertvalue %struct.bfloat16x8x3_t %.fca.0.0.insert, <8 x bfloat> %vld3_lane.fca.1.extract, 0, 1
				%.fca.0.2.insert = insertvalue %struct.bfloat16x8x3_t %.fca.0.1.insert, <8 x bfloat> %vld3_lane.fca.2.extract, 0, 2
				ret %struct.bfloat16x8x3_t %.fca.0.2.insert
				}

				; Function Attrs: argmemonly nounwind readonly
				declare { <8 x bfloat>, <8 x bfloat>, <8 x bfloat> } @llvm.aarch64.neon.ld3lane.v8bf16.p0i8(<8 x bfloat>, <8 x bfloat>, <8 x bfloat>, i64, i8*) #3

				; CHECK-LABEL: test_vld4_bf16
				; CHECK: ld4 { v0.4h, v1.4h, v2.4h, v3.4h }, [x0]
				define %struct.bfloat16x4x4_t @test_vld4_bf16(bfloat* %ptr) local_unnamed_addr #2 {
				entry:
				%0 = bitcast bfloat* %ptr to <4 x bfloat>*
				%vld4 = tail call { <4 x bfloat>, <4 x bfloat>, <4 x bfloat>, <4 x bfloat> } @llvm.aarch64.neon.ld4.v4bf16.p0v4bf16(<4 x bfloat>* %0)
				%vld4.fca.0.extract = extractvalue { <4 x bfloat>, <4 x bfloat>, <4 x bfloat>, <4 x bfloat> } %vld4, 0
				%vld4.fca.1.extract = extractvalue { <4 x bfloat>, <4 x bfloat>, <4 x bfloat>, <4 x bfloat> } %vld4, 1
				%vld4.fca.2.extract = extractvalue { <4 x bfloat>, <4 x bfloat>, <4 x bfloat>, <4 x bfloat> } %vld4, 2
				%vld4.fca.3.extract = extractvalue { <4 x bfloat>, <4 x bfloat>, <4 x bfloat>, <4 x bfloat> } %vld4, 3
				%.fca.0.0.insert = insertvalue %struct.bfloat16x4x4_t undef, <4 x bfloat> %vld4.fca.0.extract, 0, 0
				%.fca.0.1.insert = insertvalue %struct.bfloat16x4x4_t %.fca.0.0.insert, <4 x bfloat> %vld4.fca.1.extract, 0, 1
				%.fca.0.2.insert = insertvalue %struct.bfloat16x4x4_t %.fca.0.1.insert, <4 x bfloat> %vld4.fca.2.extract, 0, 2
				%.fca.0.3.insert = insertvalue %struct.bfloat16x4x4_t %.fca.0.2.insert, <4 x bfloat> %vld4.fca.3.extract, 0, 3
				ret %struct.bfloat16x4x4_t %.fca.0.3.insert
				}

				; Function Attrs: argmemonly nounwind readonly
				declare { <4 x bfloat>, <4 x bfloat>, <4 x bfloat>, <4 x bfloat> } @llvm.aarch64.neon.ld4.v4bf16.p0v4bf16(<4 x bfloat>*) #3

				; CHECK-LABEL: test_vld4q_bf16
				; CHECK: ld4 { v0.8h, v1.8h, v2.8h, v3.8h }, [x0]
				define %struct.bfloat16x8x4_t @test_vld4q_bf16(bfloat* %ptr) local_unnamed_addr #2 {
				entry:
				%0 = bitcast bfloat* %ptr to <8 x bfloat>*
				%vld4 = tail call { <8 x bfloat>, <8 x bfloat>, <8 x bfloat>, <8 x bfloat> } @llvm.aarch64.neon.ld4.v8bf16.p0v8bf16(<8 x bfloat>* %0)
				%vld4.fca.0.extract = extractvalue { <8 x bfloat>, <8 x bfloat>, <8 x bfloat>, <8 x bfloat> } %vld4, 0
				%vld4.fca.1.extract = extractvalue { <8 x bfloat>, <8 x bfloat>, <8 x bfloat>, <8 x bfloat> } %vld4, 1
				%vld4.fca.2.extract = extractvalue { <8 x bfloat>, <8 x bfloat>, <8 x bfloat>, <8 x bfloat> } %vld4, 2
				%vld4.fca.3.extract = extractvalue { <8 x bfloat>, <8 x bfloat>, <8 x bfloat>, <8 x bfloat> } %vld4, 3
				%.fca.0.0.insert = insertvalue %struct.bfloat16x8x4_t undef, <8 x bfloat> %vld4.fca.0.extract, 0, 0
				%.fca.0.1.insert = insertvalue %struct.bfloat16x8x4_t %.fca.0.0.insert, <8 x bfloat> %vld4.fca.1.extract, 0, 1
				%.fca.0.2.insert = insertvalue %struct.bfloat16x8x4_t %.fca.0.1.insert, <8 x bfloat> %vld4.fca.2.extract, 0, 2
				%.fca.0.3.insert = insertvalue %struct.bfloat16x8x4_t %.fca.0.2.insert, <8 x bfloat> %vld4.fca.3.extract, 0, 3
				ret %struct.bfloat16x8x4_t %.fca.0.3.insert
				}

				; Function Attrs: argmemonly nounwind readonly
				declare { <8 x bfloat>, <8 x bfloat>, <8 x bfloat>, <8 x bfloat> } @llvm.aarch64.neon.ld4.v8bf16.p0v8bf16(<8 x bfloat>*) #3

				; CHECK-LABEL: test_vld4_lane_bf16
				; CHECK: ld4 { v0.h, v1.h, v2.h, v3.h }[1], [x0]
				define %struct.bfloat16x4x4_t @test_vld4_lane_bf16(bfloat* %ptr, [4 x <4 x bfloat>] %src.coerce) local_unnamed_addr #2 {
				entry:
				%src.coerce.fca.0.extract = extractvalue [4 x <4 x bfloat>] %src.coerce, 0
				%src.coerce.fca.1.extract = extractvalue [4 x <4 x bfloat>] %src.coerce, 1
				%src.coerce.fca.2.extract = extractvalue [4 x <4 x bfloat>] %src.coerce, 2
				%src.coerce.fca.3.extract = extractvalue [4 x <4 x bfloat>] %src.coerce, 3
				%0 = bitcast bfloat* %ptr to i8*
				%vld4_lane = tail call { <4 x bfloat>, <4 x bfloat>, <4 x bfloat>, <4 x bfloat> } @llvm.aarch64.neon.ld4lane.v4bf16.p0i8(<4 x bfloat> %src.coerce.fca.0.extract, <4 x bfloat> %src.coerce.fca.1.extract, <4 x bfloat> %src.coerce.fca.2.extract, <4 x bfloat> %src.coerce.fca.3.extract, i64 1, i8* %0)
				%vld4_lane.fca.0.extract = extractvalue { <4 x bfloat>, <4 x bfloat>, <4 x bfloat>, <4 x bfloat> } %vld4_lane, 0
				%vld4_lane.fca.1.extract = extractvalue { <4 x bfloat>, <4 x bfloat>, <4 x bfloat>, <4 x bfloat> } %vld4_lane, 1
				%vld4_lane.fca.2.extract = extractvalue { <4 x bfloat>, <4 x bfloat>, <4 x bfloat>, <4 x bfloat> } %vld4_lane, 2
				%vld4_lane.fca.3.extract = extractvalue { <4 x bfloat>, <4 x bfloat>, <4 x bfloat>, <4 x bfloat> } %vld4_lane, 3
				%.fca.0.0.insert = insertvalue %struct.bfloat16x4x4_t undef, <4 x bfloat> %vld4_lane.fca.0.extract, 0, 0
				%.fca.0.1.insert = insertvalue %struct.bfloat16x4x4_t %.fca.0.0.insert, <4 x bfloat> %vld4_lane.fca.1.extract, 0, 1
				%.fca.0.2.insert = insertvalue %struct.bfloat16x4x4_t %.fca.0.1.insert, <4 x bfloat> %vld4_lane.fca.2.extract, 0, 2
				%.fca.0.3.insert = insertvalue %struct.bfloat16x4x4_t %.fca.0.2.insert, <4 x bfloat> %vld4_lane.fca.3.extract, 0, 3
				ret %struct.bfloat16x4x4_t %.fca.0.3.insert
				}

				; Function Attrs: argmemonly nounwind readonly
				declare { <4 x bfloat>, <4 x bfloat>, <4 x bfloat>, <4 x bfloat> } @llvm.aarch64.neon.ld4lane.v4bf16.p0i8(<4 x bfloat>, <4 x bfloat>, <4 x bfloat>, <4 x bfloat>, i64, i8*) #3

				; CHECK-LABEL: test_vld4q_lane_bf16
				; CHECK: ld4 { v0.h, v1.h, v2.h, v3.h }[7], [x0]
				define %struct.bfloat16x8x4_t @test_vld4q_lane_bf16(bfloat* %ptr, [4 x <8 x bfloat>] %src.coerce) local_unnamed_addr #2 {
				entry:
				%src.coerce.fca.0.extract = extractvalue [4 x <8 x bfloat>] %src.coerce, 0
				%src.coerce.fca.1.extract = extractvalue [4 x <8 x bfloat>] %src.coerce, 1
				%src.coerce.fca.2.extract = extractvalue [4 x <8 x bfloat>] %src.coerce, 2
				%src.coerce.fca.3.extract = extractvalue [4 x <8 x bfloat>] %src.coerce, 3
				%0 = bitcast bfloat* %ptr to i8*
				%vld4_lane = tail call { <8 x bfloat>, <8 x bfloat>, <8 x bfloat>, <8 x bfloat> } @llvm.aarch64.neon.ld4lane.v8bf16.p0i8(<8 x bfloat> %src.coerce.fca.0.extract, <8 x bfloat> %src.coerce.fca.1.extract, <8 x bfloat> %src.coerce.fca.2.extract, <8 x bfloat> %src.coerce.fca.3.extract, i64 7, i8* %0)
				%vld4_lane.fca.0.extract = extractvalue { <8 x bfloat>, <8 x bfloat>, <8 x bfloat>, <8 x bfloat> } %vld4_lane, 0
				%vld4_lane.fca.1.extract = extractvalue { <8 x bfloat>, <8 x bfloat>, <8 x bfloat>, <8 x bfloat> } %vld4_lane, 1
				%vld4_lane.fca.2.extract = extractvalue { <8 x bfloat>, <8 x bfloat>, <8 x bfloat>, <8 x bfloat> } %vld4_lane, 2
				%vld4_lane.fca.3.extract = extractvalue { <8 x bfloat>, <8 x bfloat>, <8 x bfloat>, <8 x bfloat> } %vld4_lane, 3
				%.fca.0.0.insert = insertvalue %struct.bfloat16x8x4_t undef, <8 x bfloat> %vld4_lane.fca.0.extract, 0, 0
				%.fca.0.1.insert = insertvalue %struct.bfloat16x8x4_t %.fca.0.0.insert, <8 x bfloat> %vld4_lane.fca.1.extract, 0, 1
				%.fca.0.2.insert = insertvalue %struct.bfloat16x8x4_t %.fca.0.1.insert, <8 x bfloat> %vld4_lane.fca.2.extract, 0, 2
				%.fca.0.3.insert = insertvalue %struct.bfloat16x8x4_t %.fca.0.2.insert, <8 x bfloat> %vld4_lane.fca.3.extract, 0, 3
				ret %struct.bfloat16x8x4_t %.fca.0.3.insert
				}

				; Function Attrs: argmemonly nounwind readonly
				declare { <8 x bfloat>, <8 x bfloat>, <8 x bfloat>, <8 x bfloat> } @llvm.aarch64.neon.ld4lane.v8bf16.p0i8(<8 x bfloat>, <8 x bfloat>, <8 x bfloat>, <8 x bfloat>, i64, i8*) #3

				; CHECK-LABEL: test_vld2_dup_bf16
				; CHECK: ld2r { v0.4h, v1.4h }, [x0]
				define %struct.bfloat16x4x2_t @test_vld2_dup_bf16(bfloat* %ptr) local_unnamed_addr #2 {
				entry:
				%vld2 = tail call { <4 x bfloat>, <4 x bfloat> } @llvm.aarch64.neon.ld2r.v4bf16.p0bf16(bfloat* %ptr)
				%vld2.fca.0.extract = extractvalue { <4 x bfloat>, <4 x bfloat> } %vld2, 0
				%vld2.fca.1.extract = extractvalue { <4 x bfloat>, <4 x bfloat> } %vld2, 1
				%.fca.0.0.insert = insertvalue %struct.bfloat16x4x2_t undef, <4 x bfloat> %vld2.fca.0.extract, 0, 0
				%.fca.0.1.insert = insertvalue %struct.bfloat16x4x2_t %.fca.0.0.insert, <4 x bfloat> %vld2.fca.1.extract, 0, 1
				ret %struct.bfloat16x4x2_t %.fca.0.1.insert
				}

				; Function Attrs: argmemonly nounwind readonly
				declare { <4 x bfloat>, <4 x bfloat> } @llvm.aarch64.neon.ld2r.v4bf16.p0bf16(bfloat*) #3

				; CHECK-LABEL: test_vld2q_dup_bf16
				; CHECK: ld2r { v0.8h, v1.8h }, [x0]
				define %struct.bfloat16x8x2_t @test_vld2q_dup_bf16(bfloat* %ptr) local_unnamed_addr #2 {
				entry:
				%vld2 = tail call { <8 x bfloat>, <8 x bfloat> } @llvm.aarch64.neon.ld2r.v8bf16.p0bf16(bfloat* %ptr)
				%vld2.fca.0.extract = extractvalue { <8 x bfloat>, <8 x bfloat> } %vld2, 0
				%vld2.fca.1.extract = extractvalue { <8 x bfloat>, <8 x bfloat> } %vld2, 1
				%.fca.0.0.insert = insertvalue %struct.bfloat16x8x2_t undef, <8 x bfloat> %vld2.fca.0.extract, 0, 0
				%.fca.0.1.insert = insertvalue %struct.bfloat16x8x2_t %.fca.0.0.insert, <8 x bfloat> %vld2.fca.1.extract, 0, 1
				ret %struct.bfloat16x8x2_t %.fca.0.1.insert
				}

				; Function Attrs: argmemonly nounwind readonly
				declare { <8 x bfloat>, <8 x bfloat> } @llvm.aarch64.neon.ld2r.v8bf16.p0bf16(bfloat*) #3

				; CHECK-LABEL: test_vld3_dup_bf16
				; CHECK: ld3r { v0.4h, v1.4h, v2.4h }, [x0]
				define %struct.bfloat16x4x3_t @test_vld3_dup_bf16(bfloat* %ptr) local_unnamed_addr #2 {
				entry:
				%vld3 = tail call { <4 x bfloat>, <4 x bfloat>, <4 x bfloat> } @llvm.aarch64.neon.ld3r.v4bf16.p0bf16(bfloat* %ptr)
				%vld3.fca.0.extract = extractvalue { <4 x bfloat>, <4 x bfloat>, <4 x bfloat> } %vld3, 0
				%vld3.fca.1.extract = extractvalue { <4 x bfloat>, <4 x bfloat>, <4 x bfloat> } %vld3, 1
				%vld3.fca.2.extract = extractvalue { <4 x bfloat>, <4 x bfloat>, <4 x bfloat> } %vld3, 2
				%.fca.0.0.insert = insertvalue %struct.bfloat16x4x3_t undef, <4 x bfloat> %vld3.fca.0.extract, 0, 0
				%.fca.0.1.insert = insertvalue %struct.bfloat16x4x3_t %.fca.0.0.insert, <4 x bfloat> %vld3.fca.1.extract, 0, 1
				%.fca.0.2.insert = insertvalue %struct.bfloat16x4x3_t %.fca.0.1.insert, <4 x bfloat> %vld3.fca.2.extract, 0, 2
				ret %struct.bfloat16x4x3_t %.fca.0.2.insert
				}

				; Function Attrs: argmemonly nounwind readonly
				declare { <4 x bfloat>, <4 x bfloat>, <4 x bfloat> } @llvm.aarch64.neon.ld3r.v4bf16.p0bf16(bfloat*) #3

				; CHECK-LABEL: test_vld3q_dup_bf16
				; CHECK: ld3r { v0.8h, v1.8h, v2.8h }, [x0]
				define %struct.bfloat16x8x3_t @test_vld3q_dup_bf16(bfloat* %ptr) local_unnamed_addr #2 {
				entry:
				%vld3 = tail call { <8 x bfloat>, <8 x bfloat>, <8 x bfloat> } @llvm.aarch64.neon.ld3r.v8bf16.p0bf16(bfloat* %ptr)
				%vld3.fca.0.extract = extractvalue { <8 x bfloat>, <8 x bfloat>, <8 x bfloat> } %vld3, 0
				%vld3.fca.1.extract = extractvalue { <8 x bfloat>, <8 x bfloat>, <8 x bfloat> } %vld3, 1
				%vld3.fca.2.extract = extractvalue { <8 x bfloat>, <8 x bfloat>, <8 x bfloat> } %vld3, 2
				%.fca.0.0.insert = insertvalue %struct.bfloat16x8x3_t undef, <8 x bfloat> %vld3.fca.0.extract, 0, 0
				%.fca.0.1.insert = insertvalue %struct.bfloat16x8x3_t %.fca.0.0.insert, <8 x bfloat> %vld3.fca.1.extract, 0, 1
				%.fca.0.2.insert = insertvalue %struct.bfloat16x8x3_t %.fca.0.1.insert, <8 x bfloat> %vld3.fca.2.extract, 0, 2
				ret %struct.bfloat16x8x3_t %.fca.0.2.insert
				}

				; Function Attrs: argmemonly nounwind readonly
				declare { <8 x bfloat>, <8 x bfloat>, <8 x bfloat> } @llvm.aarch64.neon.ld3r.v8bf16.p0bf16(bfloat*) #3

				; CHECK-LABEL: test_vld4_dup_bf16
				; CHECK: ld4r { v0.4h, v1.4h, v2.4h, v3.4h }, [x0]
				define %struct.bfloat16x4x4_t @test_vld4_dup_bf16(bfloat* %ptr) local_unnamed_addr #2 {
				entry:
				%vld4 = tail call { <4 x bfloat>, <4 x bfloat>, <4 x bfloat>, <4 x bfloat> } @llvm.aarch64.neon.ld4r.v4bf16.p0bf16(bfloat* %ptr)
				%vld4.fca.0.extract = extractvalue { <4 x bfloat>, <4 x bfloat>, <4 x bfloat>, <4 x bfloat> } %vld4, 0
				%vld4.fca.1.extract = extractvalue { <4 x bfloat>, <4 x bfloat>, <4 x bfloat>, <4 x bfloat> } %vld4, 1
				%vld4.fca.2.extract = extractvalue { <4 x bfloat>, <4 x bfloat>, <4 x bfloat>, <4 x bfloat> } %vld4, 2
				%vld4.fca.3.extract = extractvalue { <4 x bfloat>, <4 x bfloat>, <4 x bfloat>, <4 x bfloat> } %vld4, 3
				%.fca.0.0.insert = insertvalue %struct.bfloat16x4x4_t undef, <4 x bfloat> %vld4.fca.0.extract, 0, 0
				%.fca.0.1.insert = insertvalue %struct.bfloat16x4x4_t %.fca.0.0.insert, <4 x bfloat> %vld4.fca.1.extract, 0, 1
				%.fca.0.2.insert = insertvalue %struct.bfloat16x4x4_t %.fca.0.1.insert, <4 x bfloat> %vld4.fca.2.extract, 0, 2
				%.fca.0.3.insert = insertvalue %struct.bfloat16x4x4_t %.fca.0.2.insert, <4 x bfloat> %vld4.fca.3.extract, 0, 3
				ret %struct.bfloat16x4x4_t %.fca.0.3.insert
				}

				; Function Attrs: argmemonly nounwind readonly
				declare { <4 x bfloat>, <4 x bfloat>, <4 x bfloat>, <4 x bfloat> } @llvm.aarch64.neon.ld4r.v4bf16.p0bf16(bfloat*) #3

				; CHECK-LABEL: test_vld4q_dup_bf16
				; CHECK: ld4r { v0.8h, v1.8h, v2.8h, v3.8h }, [x0]
				define %struct.bfloat16x8x4_t @test_vld4q_dup_bf16(bfloat* %ptr) local_unnamed_addr #2 {
				entry:
				%vld4 = tail call { <8 x bfloat>, <8 x bfloat>, <8 x bfloat>, <8 x bfloat> } @llvm.aarch64.neon.ld4r.v8bf16.p0bf16(bfloat* %ptr)
				%vld4.fca.0.extract = extractvalue { <8 x bfloat>, <8 x bfloat>, <8 x bfloat>, <8 x bfloat> } %vld4, 0
				%vld4.fca.1.extract = extractvalue { <8 x bfloat>, <8 x bfloat>, <8 x bfloat>, <8 x bfloat> } %vld4, 1
				%vld4.fca.2.extract = extractvalue { <8 x bfloat>, <8 x bfloat>, <8 x bfloat>, <8 x bfloat> } %vld4, 2
				%vld4.fca.3.extract = extractvalue { <8 x bfloat>, <8 x bfloat>, <8 x bfloat>, <8 x bfloat> } %vld4, 3
				%.fca.0.0.insert = insertvalue %struct.bfloat16x8x4_t undef, <8 x bfloat> %vld4.fca.0.extract, 0, 0
				%.fca.0.1.insert = insertvalue %struct.bfloat16x8x4_t %.fca.0.0.insert, <8 x bfloat> %vld4.fca.1.extract, 0, 1
				%.fca.0.2.insert = insertvalue %struct.bfloat16x8x4_t %.fca.0.1.insert, <8 x bfloat> %vld4.fca.2.extract, 0, 2
				%.fca.0.3.insert = insertvalue %struct.bfloat16x8x4_t %.fca.0.2.insert, <8 x bfloat> %vld4.fca.3.extract, 0, 3
				ret %struct.bfloat16x8x4_t %.fca.0.3.insert
				}

				; Function Attrs: argmemonly nounwind readonly
				declare { <8 x bfloat>, <8 x bfloat>, <8 x bfloat>, <8 x bfloat> } @llvm.aarch64.neon.ld4r.v8bf16.p0bf16(bfloat*) #3

				; CHECK-LABEL: test_vst1_bf16
				; CHECK: str d0, [x0]
				define void @test_vst1_bf16(bfloat* nocapture %ptr, <4 x bfloat> %val) local_unnamed_addr #4 {
				entry:
				%0 = bitcast bfloat* %ptr to <4 x bfloat>*
				store <4 x bfloat> %val, <4 x bfloat>* %0, align 8
				ret void
				}

				; CHECK-LABEL: test_vst1q_bf16
				; CHECK: str q0, [x0]
				define void @test_vst1q_bf16(bfloat* nocapture %ptr, <8 x bfloat> %val) local_unnamed_addr #5 {
				entry:
				%0 = bitcast bfloat* %ptr to <8 x bfloat>*
				store <8 x bfloat> %val, <8 x bfloat>* %0, align 16
				ret void
				}

				; CHECK-LABEL: test_vst1_lane_bf16
				; CHECK: st1 { v0.h }[1], [x0]
				define void @test_vst1_lane_bf16(bfloat* nocapture %ptr, <4 x bfloat> %val) local_unnamed_addr #4 {
				entry:
				%0 = extractelement <4 x bfloat> %val, i32 1
				store bfloat %0, bfloat* %ptr, align 2
				ret void
				}

				; CHECK-LABEL: test_vst1q_lane_bf16
				; CHECK: st1 { v0.h }[7], [x0]
				define void @test_vst1q_lane_bf16(bfloat* nocapture %ptr, <8 x bfloat> %val) local_unnamed_addr #5 {
				entry:
				%0 = extractelement <8 x bfloat> %val, i32 7
				store bfloat %0, bfloat* %ptr, align 2
				ret void
				}

				; CHECK-LABEL: test_vst1_bf16_x2
				; CHECK: st1 { v0.4h, v1.4h }, [x0]
				define void @test_vst1_bf16_x2(bfloat* nocapture %ptr, [2 x <4 x bfloat>] %val.coerce) local_unnamed_addr #6 {
				entry:
				%val.coerce.fca.0.extract = extractvalue [2 x <4 x bfloat>] %val.coerce, 0
				%val.coerce.fca.1.extract = extractvalue [2 x <4 x bfloat>] %val.coerce, 1
				tail call void @llvm.aarch64.neon.st1x2.v4bf16.p0bf16(<4 x bfloat> %val.coerce.fca.0.extract, <4 x bfloat> %val.coerce.fca.1.extract, bfloat* %ptr)
				ret void
				}

				; Function Attrs: argmemonly nounwind
				declare void @llvm.aarch64.neon.st1x2.v4bf16.p0bf16(<4 x bfloat>, <4 x bfloat>, bfloat* nocapture) #7

				; CHECK-LABEL: test_vst1q_bf16_x2
				; CHECK: st1 { v0.8h, v1.8h }, [x0]
				define void @test_vst1q_bf16_x2(bfloat* nocapture %ptr, [2 x <8 x bfloat>] %val.coerce) local_unnamed_addr #6 {
				entry:
				%val.coerce.fca.0.extract = extractvalue [2 x <8 x bfloat>] %val.coerce, 0
				%val.coerce.fca.1.extract = extractvalue [2 x <8 x bfloat>] %val.coerce, 1
				tail call void @llvm.aarch64.neon.st1x2.v8bf16.p0bf16(<8 x bfloat> %val.coerce.fca.0.extract, <8 x bfloat> %val.coerce.fca.1.extract, bfloat* %ptr)
				ret void
				}

				; Function Attrs: argmemonly nounwind
				declare void @llvm.aarch64.neon.st1x2.v8bf16.p0bf16(<8 x bfloat>, <8 x bfloat>, bfloat* nocapture) #7

				; CHECK-LABEL: test_vst1_bf16_x3
				; CHECK: st1 { v0.4h, v1.4h, v2.4h }, [x0]
				define void @test_vst1_bf16_x3(bfloat* nocapture %ptr, [3 x <4 x bfloat>] %val.coerce) local_unnamed_addr #6 {
				entry:
				%val.coerce.fca.0.extract = extractvalue [3 x <4 x bfloat>] %val.coerce, 0
				%val.coerce.fca.1.extract = extractvalue [3 x <4 x bfloat>] %val.coerce, 1
				%val.coerce.fca.2.extract = extractvalue [3 x <4 x bfloat>] %val.coerce, 2
				tail call void @llvm.aarch64.neon.st1x3.v4bf16.p0bf16(<4 x bfloat> %val.coerce.fca.0.extract, <4 x bfloat> %val.coerce.fca.1.extract, <4 x bfloat> %val.coerce.fca.2.extract, bfloat* %ptr)
				ret void
				}

				; Function Attrs: argmemonly nounwind
				declare void @llvm.aarch64.neon.st1x3.v4bf16.p0bf16(<4 x bfloat>, <4 x bfloat>, <4 x bfloat>, bfloat* nocapture) #7

				; CHECK-LABEL: test_vst1q_bf16_x3
				; CHECK: st1 { v0.8h, v1.8h, v2.8h }, [x0]
				define void @test_vst1q_bf16_x3(bfloat* nocapture %ptr, [3 x <8 x bfloat>] %val.coerce) local_unnamed_addr #6 {
				entry:
				%val.coerce.fca.0.extract = extractvalue [3 x <8 x bfloat>] %val.coerce, 0
				%val.coerce.fca.1.extract = extractvalue [3 x <8 x bfloat>] %val.coerce, 1
				%val.coerce.fca.2.extract = extractvalue [3 x <8 x bfloat>] %val.coerce, 2
				tail call void @llvm.aarch64.neon.st1x3.v8bf16.p0bf16(<8 x bfloat> %val.coerce.fca.0.extract, <8 x bfloat> %val.coerce.fca.1.extract, <8 x bfloat> %val.coerce.fca.2.extract, bfloat* %ptr)
				ret void
				}

				; Function Attrs: argmemonly nounwind
				declare void @llvm.aarch64.neon.st1x3.v8bf16.p0bf16(<8 x bfloat>, <8 x bfloat>, <8 x bfloat>, bfloat* nocapture) #7

				; Function Attrs: nounwind
				; CHECK-LABEL: test_vst1_bf16_x4
				; CHECK: st1 { v0.4h, v1.4h, v2.4h, v3.4h }, [x0]
				define void @test_vst1_bf16_x4(bfloat* nocapture %ptr, [4 x <4 x bfloat>] %val.coerce) local_unnamed_addr #6 {
				entry:
				%val.coerce.fca.0.extract = extractvalue [4 x <4 x bfloat>] %val.coerce, 0
				%val.coerce.fca.1.extract = extractvalue [4 x <4 x bfloat>] %val.coerce, 1
				%val.coerce.fca.2.extract = extractvalue [4 x <4 x bfloat>] %val.coerce, 2
				%val.coerce.fca.3.extract = extractvalue [4 x <4 x bfloat>] %val.coerce, 3
				tail call void @llvm.aarch64.neon.st1x4.v4bf16.p0bf16(<4 x bfloat> %val.coerce.fca.0.extract, <4 x bfloat> %val.coerce.fca.1.extract, <4 x bfloat> %val.coerce.fca.2.extract, <4 x bfloat> %val.coerce.fca.3.extract, bfloat* %ptr)
				ret void
				}

				; Function Attrs: argmemonly nounwind
				declare void @llvm.aarch64.neon.st1x4.v4bf16.p0bf16(<4 x bfloat>, <4 x bfloat>, <4 x bfloat>, <4 x bfloat>, bfloat* nocapture) #7

				; CHECK-LABEL: test_vst1q_bf16_x4
				; CHECK: st1 { v0.8h, v1.8h, v2.8h, v3.8h }, [x0]
				define void @test_vst1q_bf16_x4(bfloat* nocapture %ptr, [4 x <8 x bfloat>] %val.coerce) local_unnamed_addr #6 {
				entry:
				%val.coerce.fca.0.extract = extractvalue [4 x <8 x bfloat>] %val.coerce, 0
				%val.coerce.fca.1.extract = extractvalue [4 x <8 x bfloat>] %val.coerce, 1
				%val.coerce.fca.2.extract = extractvalue [4 x <8 x bfloat>] %val.coerce, 2
				%val.coerce.fca.3.extract = extractvalue [4 x <8 x bfloat>] %val.coerce, 3
				tail call void @llvm.aarch64.neon.st1x4.v8bf16.p0bf16(<8 x bfloat> %val.coerce.fca.0.extract, <8 x bfloat> %val.coerce.fca.1.extract, <8 x bfloat> %val.coerce.fca.2.extract, <8 x bfloat> %val.coerce.fca.3.extract, bfloat* %ptr)
				ret void
				}

				; Function Attrs: argmemonly nounwind
				declare void @llvm.aarch64.neon.st1x4.v8bf16.p0bf16(<8 x bfloat>, <8 x bfloat>, <8 x bfloat>, <8 x bfloat>, bfloat* nocapture) #7

				; CHECK-LABEL: test_vst2_bf16
				; CHECK: st2 { v0.4h, v1.4h }, [x0]
				define void @test_vst2_bf16(bfloat* nocapture %ptr, [2 x <4 x bfloat>] %val.coerce) local_unnamed_addr #6 {
				entry:
				%val.coerce.fca.0.extract = extractvalue [2 x <4 x bfloat>] %val.coerce, 0
				%val.coerce.fca.1.extract = extractvalue [2 x <4 x bfloat>] %val.coerce, 1
				%0 = bitcast bfloat* %ptr to i8*
				tail call void @llvm.aarch64.neon.st2.v4bf16.p0i8(<4 x bfloat> %val.coerce.fca.0.extract, <4 x bfloat> %val.coerce.fca.1.extract, i8* %0)
				ret void
				}

				; Function Attrs: argmemonly nounwind
				declare void @llvm.aarch64.neon.st2.v4bf16.p0i8(<4 x bfloat>, <4 x bfloat>, i8* nocapture) #7

				; CHECK-LABEL: test_vst2q_bf16
				; CHECK: st2 { v0.8h, v1.8h }, [x0]
				define void @test_vst2q_bf16(bfloat* nocapture %ptr, [2 x <8 x bfloat>] %val.coerce) local_unnamed_addr #6 {
				entry:
				%val.coerce.fca.0.extract = extractvalue [2 x <8 x bfloat>] %val.coerce, 0
				%val.coerce.fca.1.extract = extractvalue [2 x <8 x bfloat>] %val.coerce, 1
				%0 = bitcast bfloat* %ptr to i8*
				tail call void @llvm.aarch64.neon.st2.v8bf16.p0i8(<8 x bfloat> %val.coerce.fca.0.extract, <8 x bfloat> %val.coerce.fca.1.extract, i8* %0)
				ret void
				}

				; Function Attrs: argmemonly nounwind
				declare void @llvm.aarch64.neon.st2.v8bf16.p0i8(<8 x bfloat>, <8 x bfloat>, i8* nocapture) #7

				; CHECK-LABEL: test_vst2_lane_bf16
				; CHECK: st2 { v0.h, v1.h }[1], [x0]
				define void @test_vst2_lane_bf16(bfloat* nocapture %ptr, [2 x <4 x bfloat>] %val.coerce) local_unnamed_addr #6 {
				entry:
				%val.coerce.fca.0.extract = extractvalue [2 x <4 x bfloat>] %val.coerce, 0
				%val.coerce.fca.1.extract = extractvalue [2 x <4 x bfloat>] %val.coerce, 1
				%0 = bitcast bfloat* %ptr to i8*
				tail call void @llvm.aarch64.neon.st2lane.v4bf16.p0i8(<4 x bfloat> %val.coerce.fca.0.extract, <4 x bfloat> %val.coerce.fca.1.extract, i64 1, i8* %0)
				ret void
				}

				; Function Attrs: argmemonly nounwind
				declare void @llvm.aarch64.neon.st2lane.v4bf16.p0i8(<4 x bfloat>, <4 x bfloat>, i64, i8* nocapture) #7

				; Function Attrs: nounwind
				; CHECK-LABEL: test_vst2q_lane_bf16
				; CHECK: st2 { v0.h, v1.h }[7], [x0]
				define void @test_vst2q_lane_bf16(bfloat* nocapture %ptr, [2 x <8 x bfloat>] %val.coerce) local_unnamed_addr #6 {
				entry:
				%val.coerce.fca.0.extract = extractvalue [2 x <8 x bfloat>] %val.coerce, 0
				%val.coerce.fca.1.extract = extractvalue [2 x <8 x bfloat>] %val.coerce, 1
				%0 = bitcast bfloat* %ptr to i8*
				tail call void @llvm.aarch64.neon.st2lane.v8bf16.p0i8(<8 x bfloat> %val.coerce.fca.0.extract, <8 x bfloat> %val.coerce.fca.1.extract, i64 7, i8* %0)
				ret void
				}

				; Function Attrs: argmemonly nounwind
				declare void @llvm.aarch64.neon.st2lane.v8bf16.p0i8(<8 x bfloat>, <8 x bfloat>, i64, i8* nocapture) #7

				; Function Attrs: nounwind
				; CHECK-LABEL: test_vst3_bf16
				; CHECK: st3 { v0.4h, v1.4h, v2.4h }, [x0]
				define void @test_vst3_bf16(bfloat* nocapture %ptr, [3 x <4 x bfloat>] %val.coerce) local_unnamed_addr #6 {
				entry:
				%val.coerce.fca.0.extract = extractvalue [3 x <4 x bfloat>] %val.coerce, 0
				%val.coerce.fca.1.extract = extractvalue [3 x <4 x bfloat>] %val.coerce, 1
				%val.coerce.fca.2.extract = extractvalue [3 x <4 x bfloat>] %val.coerce, 2
				%0 = bitcast bfloat* %ptr to i8*
				tail call void @llvm.aarch64.neon.st3.v4bf16.p0i8(<4 x bfloat> %val.coerce.fca.0.extract, <4 x bfloat> %val.coerce.fca.1.extract, <4 x bfloat> %val.coerce.fca.2.extract, i8* %0)
				ret void
				}

				; Function Attrs: argmemonly nounwind
				declare void @llvm.aarch64.neon.st3.v4bf16.p0i8(<4 x bfloat>, <4 x bfloat>, <4 x bfloat>, i8* nocapture) #7

				; Function Attrs: nounwind
				; CHECK-LABEL: test_vst3q_bf16
				; CHECK: st3 { v0.8h, v1.8h, v2.8h }, [x0]
				define void @test_vst3q_bf16(bfloat* nocapture %ptr, [3 x <8 x bfloat>] %val.coerce) local_unnamed_addr #6 {
				entry:
				%val.coerce.fca.0.extract = extractvalue [3 x <8 x bfloat>] %val.coerce, 0
				%val.coerce.fca.1.extract = extractvalue [3 x <8 x bfloat>] %val.coerce, 1
				%val.coerce.fca.2.extract = extractvalue [3 x <8 x bfloat>] %val.coerce, 2
				%0 = bitcast bfloat* %ptr to i8*
				tail call void @llvm.aarch64.neon.st3.v8bf16.p0i8(<8 x bfloat> %val.coerce.fca.0.extract, <8 x bfloat> %val.coerce.fca.1.extract, <8 x bfloat> %val.coerce.fca.2.extract, i8* %0)
				ret void
				}

				; Function Attrs: argmemonly nounwind
				declare void @llvm.aarch64.neon.st3.v8bf16.p0i8(<8 x bfloat>, <8 x bfloat>, <8 x bfloat>, i8* nocapture) #7

				; Function Attrs: nounwind
				; CHECK-LABEL: test_vst3_lane_bf16
				; CHECK: st3 { v0.h, v1.h, v2.h }[1], [x0]
				define void @test_vst3_lane_bf16(bfloat* nocapture %ptr, [3 x <4 x bfloat>] %val.coerce) local_unnamed_addr #6 {
				entry:
				%val.coerce.fca.0.extract = extractvalue [3 x <4 x bfloat>] %val.coerce, 0
				%val.coerce.fca.1.extract = extractvalue [3 x <4 x bfloat>] %val.coerce, 1
				%val.coerce.fca.2.extract = extractvalue [3 x <4 x bfloat>] %val.coerce, 2
				%0 = bitcast bfloat* %ptr to i8*
				tail call void @llvm.aarch64.neon.st3lane.v4bf16.p0i8(<4 x bfloat> %val.coerce.fca.0.extract, <4 x bfloat> %val.coerce.fca.1.extract, <4 x bfloat> %val.coerce.fca.2.extract, i64 1, i8* %0)
				ret void
				}

				; Function Attrs: argmemonly nounwind
				declare void @llvm.aarch64.neon.st3lane.v4bf16.p0i8(<4 x bfloat>, <4 x bfloat>, <4 x bfloat>, i64, i8* nocapture) #7

				; Function Attrs: nounwind
				; CHECK-LABEL: test_vst3q_lane_bf16
				; CHECK: st3 { v0.h, v1.h, v2.h }[7], [x0]
				define void @test_vst3q_lane_bf16(bfloat* nocapture %ptr, [3 x <8 x bfloat>] %val.coerce) local_unnamed_addr #6 {
				entry:
				%val.coerce.fca.0.extract = extractvalue [3 x <8 x bfloat>] %val.coerce, 0
				%val.coerce.fca.1.extract = extractvalue [3 x <8 x bfloat>] %val.coerce, 1
				%val.coerce.fca.2.extract = extractvalue [3 x <8 x bfloat>] %val.coerce, 2
				%0 = bitcast bfloat* %ptr to i8*
				tail call void @llvm.aarch64.neon.st3lane.v8bf16.p0i8(<8 x bfloat> %val.coerce.fca.0.extract, <8 x bfloat> %val.coerce.fca.1.extract, <8 x bfloat> %val.coerce.fca.2.extract, i64 7, i8* %0)
				ret void
				}

				; Function Attrs: argmemonly nounwind
				declare void @llvm.aarch64.neon.st3lane.v8bf16.p0i8(<8 x bfloat>, <8 x bfloat>, <8 x bfloat>, i64, i8* nocapture) #7

				; Function Attrs: nounwind
				; CHECK-LABEL: test_vst4_bf16
				; CHECK: st4 { v0.4h, v1.4h, v2.4h, v3.4h }, [x0]
				define void @test_vst4_bf16(bfloat* nocapture %ptr, [4 x <4 x bfloat>] %val.coerce) local_unnamed_addr #6 {
				entry:
				%val.coerce.fca.0.extract = extractvalue [4 x <4 x bfloat>] %val.coerce, 0
				%val.coerce.fca.1.extract = extractvalue [4 x <4 x bfloat>] %val.coerce, 1
				%val.coerce.fca.2.extract = extractvalue [4 x <4 x bfloat>] %val.coerce, 2
				%val.coerce.fca.3.extract = extractvalue [4 x <4 x bfloat>] %val.coerce, 3
				%0 = bitcast bfloat* %ptr to i8*
				tail call void @llvm.aarch64.neon.st4.v4bf16.p0i8(<4 x bfloat> %val.coerce.fca.0.extract, <4 x bfloat> %val.coerce.fca.1.extract, <4 x bfloat> %val.coerce.fca.2.extract, <4 x bfloat> %val.coerce.fca.3.extract, i8* %0)
				ret void
				}

				; Function Attrs: argmemonly nounwind
				declare void @llvm.aarch64.neon.st4.v4bf16.p0i8(<4 x bfloat>, <4 x bfloat>, <4 x bfloat>, <4 x bfloat>, i8* nocapture) #7

				; Function Attrs: nounwind
				; CHECK-LABEL: test_vst4q_bf16
				; CHECK: st4 { v0.8h, v1.8h, v2.8h, v3.8h }, [x0]
				define void @test_vst4q_bf16(bfloat* nocapture %ptr, [4 x <8 x bfloat>] %val.coerce) local_unnamed_addr #6 {
				entry:
				%val.coerce.fca.0.extract = extractvalue [4 x <8 x bfloat>] %val.coerce, 0
				%val.coerce.fca.1.extract = extractvalue [4 x <8 x bfloat>] %val.coerce, 1
				%val.coerce.fca.2.extract = extractvalue [4 x <8 x bfloat>] %val.coerce, 2
				%val.coerce.fca.3.extract = extractvalue [4 x <8 x bfloat>] %val.coerce, 3
				%0 = bitcast bfloat* %ptr to i8*
				tail call void @llvm.aarch64.neon.st4.v8bf16.p0i8(<8 x bfloat> %val.coerce.fca.0.extract, <8 x bfloat> %val.coerce.fca.1.extract, <8 x bfloat> %val.coerce.fca.2.extract, <8 x bfloat> %val.coerce.fca.3.extract, i8* %0)
				ret void
				}

				; Function Attrs: argmemonly nounwind
				declare void @llvm.aarch64.neon.st4.v8bf16.p0i8(<8 x bfloat>, <8 x bfloat>, <8 x bfloat>, <8 x bfloat>, i8* nocapture) #7

				; Function Attrs: nounwind
				; CHECK-LABEL: test_vst4_lane_bf16
				; CHECK: st4 { v0.h, v1.h, v2.h, v3.h }[1], [x0]
				define void @test_vst4_lane_bf16(bfloat* nocapture %ptr, [4 x <4 x bfloat>] %val.coerce) local_unnamed_addr #6 {
				entry:
				%val.coerce.fca.0.extract = extractvalue [4 x <4 x bfloat>] %val.coerce, 0
				%val.coerce.fca.1.extract = extractvalue [4 x <4 x bfloat>] %val.coerce, 1
				%val.coerce.fca.2.extract = extractvalue [4 x <4 x bfloat>] %val.coerce, 2
				%val.coerce.fca.3.extract = extractvalue [4 x <4 x bfloat>] %val.coerce, 3
				%0 = bitcast bfloat* %ptr to i8*
				tail call void @llvm.aarch64.neon.st4lane.v4bf16.p0i8(<4 x bfloat> %val.coerce.fca.0.extract, <4 x bfloat> %val.coerce.fca.1.extract, <4 x bfloat> %val.coerce.fca.2.extract, <4 x bfloat> %val.coerce.fca.3.extract, i64 1, i8* %0)
				ret void
				}

				; Function Attrs: argmemonly nounwind
				declare void @llvm.aarch64.neon.st4lane.v4bf16.p0i8(<4 x bfloat>, <4 x bfloat>, <4 x bfloat>, <4 x bfloat>, i64, i8* nocapture) #7

				; Function Attrs: nounwind
				; CHECK-LABEL: test_vst4q_lane_bf16
				; CHECK: st4 { v0.h, v1.h, v2.h, v3.h }[7], [x0]
				define void @test_vst4q_lane_bf16(bfloat* nocapture %ptr, [4 x <8 x bfloat>] %val.coerce) local_unnamed_addr #6 {
				entry:
				%val.coerce.fca.0.extract = extractvalue [4 x <8 x bfloat>] %val.coerce, 0
				%val.coerce.fca.1.extract = extractvalue [4 x <8 x bfloat>] %val.coerce, 1
				%val.coerce.fca.2.extract = extractvalue [4 x <8 x bfloat>] %val.coerce, 2
				%val.coerce.fca.3.extract = extractvalue [4 x <8 x bfloat>] %val.coerce, 3
				%0 = bitcast bfloat* %ptr to i8*
				tail call void @llvm.aarch64.neon.st4lane.v8bf16.p0i8(<8 x bfloat> %val.coerce.fca.0.extract, <8 x bfloat> %val.coerce.fca.1.extract, <8 x bfloat> %val.coerce.fca.2.extract, <8 x bfloat> %val.coerce.fca.3.extract, i64 7, i8* %0)
				ret void
				}

				; Function Attrs: argmemonly nounwind
				declare void @llvm.aarch64.neon.st4lane.v8bf16.p0i8(<8 x bfloat>, <8 x bfloat>, <8 x bfloat>, <8 x bfloat>, i64, i8* nocapture) #7

				attributes #0 = { norecurse nounwind readonly "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="none" "less-precise-fpmad"="false" "min-legal-vector-width"="64" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-features"="+bf16,+neon" "unsafe-fp-math"="false" "use-soft-float"="false" }
				attributes #1 = { norecurse nounwind readonly "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="none" "less-precise-fpmad"="false" "min-legal-vector-width"="128" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-features"="+bf16,+neon" "unsafe-fp-math"="false" "use-soft-float"="false" }
				attributes #2 = { nounwind readonly "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="none" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-features"="+bf16,+neon" "unsafe-fp-math"="false" "use-soft-float"="false" }
				attributes #3 = { argmemonly nounwind readonly }
				attributes #4 = { nofree norecurse nounwind writeonly "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="none" "less-precise-fpmad"="false" "min-legal-vector-width"="64" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-features"="+bf16,+neon" "unsafe-fp-math"="false" "use-soft-float"="false" }
				attributes #5 = { nofree norecurse nounwind writeonly "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="none" "less-precise-fpmad"="false" "min-legal-vector-width"="128" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-features"="+bf16,+neon" "unsafe-fp-math"="false" "use-soft-float"="false" }
				attributes #6 = { nounwind "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="none" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-features"="+bf16,+neon" "unsafe-fp-math"="false" "use-soft-float"="false" }
				attributes #7 = { argmemonly nounwind }

				!llvm.module.flags = !{!0}
				!llvm.ident = !{!1}

				!0 = !{i32 1, !"wchar_size", i32 4}
				!1 = !{!"clang version 11.0.0 (https://git.research.arm.com/corstu01/llvm-project.git bbc7a9e9d4ef536605fc70136adfe9d2b5809c4e)"}
stuij: You should be able to do without all these big blocks of attributes, which I guess were generated by the C -> IR conversion. Just remove them and the `#x`s after the function declarations (maybe replace them with `nounwind`).
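A minimal sketch of what the suggestion above could look like for two of the store tests, assuming the `attributes #N` groups are dropped and each `#N` reference is replaced with a plain `nounwind`. The RUN line is an assumption, since the file's real RUN line is not shown in this part of the diff. Once the `"target-features"="+bf16,+neon"` strings are gone, the features would have to be passed to `llc` some other way, for example with `-mattr`.

```llvm
; Assumed RUN line (hypothetical): supplies the features that the removed
; attribute groups used to carry.
; RUN: llc -mtriple=aarch64-none-eabi -mattr=+neon,+bf16 %s -o - | FileCheck %s

; CHECK-LABEL: test_vst1_bf16
; CHECK: str d0, [x0]
define void @test_vst1_bf16(bfloat* nocapture %ptr, <4 x bfloat> %val) nounwind {
entry:
  %0 = bitcast bfloat* %ptr to <4 x bfloat>*
  store <4 x bfloat> %val, <4 x bfloat>* %0, align 8
  ret void
}

; CHECK-LABEL: test_vst2_bf16
; CHECK: st2 { v0.4h, v1.4h }, [x0]
define void @test_vst2_bf16(bfloat* nocapture %ptr, [2 x <4 x bfloat>] %val.coerce) nounwind {
entry:
  %val.coerce.fca.0.extract = extractvalue [2 x <4 x bfloat>] %val.coerce, 0
  %val.coerce.fca.1.extract = extractvalue [2 x <4 x bfloat>] %val.coerce, 1
  %0 = bitcast bfloat* %ptr to i8*
  tail call void @llvm.aarch64.neon.st2.v4bf16.p0i8(<4 x bfloat> %val.coerce.fca.0.extract, <4 x bfloat> %val.coerce.fca.1.extract, i8* %0)
  ret void
}

; The "#7" group reference becomes an inline attribute on the declaration.
declare void @llvm.aarch64.neon.st2.v4bf16.p0i8(<4 x bfloat>, <4 x bfloat>, i8* nocapture) nounwind
```

The CHECK lines are unchanged from the diff. Only the attribute plumbing differs, so the expected instruction selection should stay the same provided the features reach the backend.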

Diff 267908
