This is an archive of the discontinued LLVM Phabricator instance.

Differential D83137

[SVE][CodeGen] Legalisation of masked loads and stores
ClosedPublic

Authored by kmclaughlin on Jul 3 2020, 10:05 AM.

Details

Reviewers

sdesmalen
efriedma
david-arm

Commits

rG2762da0a16a7: [SVE][CodeGen] Legalisation of masked loads and stores

Summary

This patch modifies IncrementMemoryAddress to scale the increment by vscale
when calculating the new address if the data type is scalable.

It also adds TableGen patterns which match an extract_subvector of the low or
high half of a legal predicate type and select zip1/zip2 instructions.
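
To illustrate the first point of the summary, here is a minimal standalone sketch (not LLVM code) of how the address increment differs between fixed-width and scalable types. `StoreSize` and `incrementBytes` are hypothetical names introduced only for this illustration; the patch itself emits a `DAG.getVScale` node rather than multiplying by a known value, because vscale is only known at run time.

```cpp
#include <cstdint>

// Hypothetical stand-in for a vector type's store size: a known minimum
// number of bytes, plus a flag saying whether it is multiplied by vscale.
struct StoreSize {
  std::uint64_t KnownMinBytes;
  bool Scalable;
};

// Byte offset from the low half of a split memory access to the high half.
// Fixed-width vectors advance by a constant; scalable vectors advance by
// KnownMinBytes * vscale.
constexpr std::uint64_t incrementBytes(StoreSize Size, std::uint64_t VScale) {
  return Size.Scalable ? Size.KnownMinBytes * VScale : Size.KnownMinBytes;
}

// <4 x i32>: always 16 bytes, whatever the hardware vector length.
static_assert(incrementBytes({16, false}, 4) == 16, "fixed-width");
// <vscale x 4 x i32> with vscale = 4 (512-bit SVE registers): 64 bytes.
static_assert(incrementBytes({16, true}, 4) == 64, "scalable");
```

This is the quantity that appears as `#1, mul vl` in the expected output of the split load and store tests.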

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

kmclaughlin created this revision. Jul 3 2020, 10:05 AM

Herald added a project: Restricted Project. Jul 3 2020, 10:05 AM

Herald added subscribers: llvm-commits, psnobl, hiraditya, tschuett.

Harbormaster failed remote builds in B62861: Diff 275424! Jul 3 2020, 10:13 AM

The patch overall looks good to me - just a question about the assert!

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7156	Do we know if this is something we catch earlier, and hence should never get here? I wonder whether this isn't really an assert that something went wrong in the code, but rather a case we don't support yet. If it's just because we don't support it yet, instead of asserting we could do: `if (DataVT.isScalableVector()) report_fatal_error("Cannot currently handle compressed memory with scalable vectors");`

sdesmalen added inline comments. Jul 7 2020, 1:59 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7177	Given that the type of VScale will be `AddrVT`, it's clearer to use `AddrVT.getSizeInBits().getFixedSize()`, which avoids the types being different.
7178	Should this be using `DataVT.getStoreSize()` instead of `DataVT.getSizeInBits()` ?

Changes to IncrementMemoryAddress:

Changed the assert added for scalable vectors to a report_fatal_error
Replaced Addr.getValueSizeInBits().getFixedSize() with AddrVT.getSizeInBits().getFixedSize()
Use DataVT.getStoreSize() instead of DataVT.getSizeInBits()
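
The last change matters because an address increment is measured in bytes, not bits. A minimal sketch of the relationship, assuming the store size is the bit size rounded up to whole bytes (`storeSizeBytes` is a hypothetical helper, not an LLVM API):

```cpp
#include <cstdint>

// Round a size in bits up to whole bytes, as a store size must be.
constexpr std::uint64_t storeSizeBytes(std::uint64_t SizeInBits) {
  return (SizeInBits + 7) / 8;
}

// <vscale x 16 x i8> has a known minimum size of 128 bits, i.e. 16 bytes.
static_assert(storeSizeBytes(128) == 16, "nxv16i8 minimum store size");
// Sizes that are not a multiple of 8 bits round up.
static_assert(storeSizeBytes(12) == 2, "12 bits need two bytes");
```

Passing the bit count (128) where a byte count (16) is expected would advance the pointer eight times too far.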

kmclaughlin marked 3 inline comments as done. Jul 7 2020, 11:10 AM

kmclaughlin added inline comments.

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7156	I think this is something that we just don't support yet, so I've changed this to `report_fatal_error` as suggested

efriedma added inline comments. Jul 7 2020, 11:53 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7156	This is part of the support for llvm.masked.expandload/llvm.masked.compressstore. There isn't a native instruction for that in SVE, but it's still a reasonable operation with scalable vectors.
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
1112	Do we need to support extracting, for example, an nxv2i1 from an nxv16i1?
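
For context on the compressed-memory discussion at line 7156: the fixed-width path in the diff counts the set bits of the mask (CTPOP) and multiplies by the element size in bytes. A minimal standalone sketch of that arithmetic follows; `compressedIncrementBytes` is a hypothetical name, and the limit of 32 lanes is an assumption made only to keep the example small.

```cpp
#include <bitset>
#include <cstdint>

// Bytes to advance after a compressed store or expanding load of a
// fixed-width vector: one element for every set bit in the mask.
// Assumes at most 32 lanes and an element size that is a multiple of 8 bits.
inline std::uint64_t compressedIncrementBytes(std::uint32_t Mask,
                                              unsigned ElemBits) {
  const std::uint64_t ActiveLanes = std::bitset<32>(Mask).count();
  return ActiveLanes * (ElemBits / 8);
}
```

For example, a mask of `0xB` (three active lanes) with 32-bit elements gives an increment of 12 bytes. For scalable vectors the patch reports a fatal error on this path instead.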

LGTM!

This revision is now accepted and ready to land. Jul 10 2020, 12:25 AM

kmclaughlin added inline comments. Jul 10 2020, 9:42 AM

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
1112	We may need to support extracting an nxv2i1 from an nxv16i1, etc. at some point, though I don't believe there are any code paths which would require this just now. At least, for the purposes of this patch I think we just need those patterns where the index is either 0 or half the number of elements.

LGTM

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
1112	We do have a DAGCombine for EXTRACT_SUBVECTOR of an EXTRACT_SUBVECTOR; it isn't triggering here for some reason? I guess that's okay.

Thanks for reviewing this patch, @efriedma & @david-arm

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
1112	I think the reason that DAGCombine doesn't trigger here is because it checks to make sure that the operand of the extract has only one use, which isn't the case in this patch as the original predicate will have multiple uses.
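
As background for the patterns discussed at line 1112, here is a rough standalone model of why zipping a predicate with an all-false predicate extracts one half of it. It assumes vscale = 1 (16 predicate bits, one per vector byte) and models only the byte-granule case, i.e. extracting an nxv8i1 from an nxv16i1. `Pred`, `zip` and `halfwordLane` are hypothetical names for illustration, not LLVM or ACLE APIs.

```cpp
#include <array>
#include <cstddef>

// One predicate bit per vector byte, for vscale = 1 (a 128-bit register).
using Pred = std::array<bool, 16>;

// Model of ZIP1 (High = false) and ZIP2 (High = true) on byte-granule
// predicate elements: interleave the chosen half of A with that of B.
inline Pred zip(const Pred &A, const Pred &B, bool High) {
  Pred R{};
  const std::size_t Base = High ? A.size() / 2 : 0;
  for (std::size_t I = 0; I < A.size() / 2; ++I) {
    R[2 * I] = A[Base + I];
    R[2 * I + 1] = B[Base + I];
  }
  return R;
}

// Read lane I of an nxv8i1 value: a halfword lane is governed by the
// predicate bit of its lowest byte, which is bit 2 * I.
inline bool halfwordLane(const Pred &P, std::size_t I) { return P[2 * I]; }
```

With `B` all false, `halfwordLane(zip(A, B, false), I)` equals `A[I]` and `halfwordLane(zip(A, B, true), I)` equals `A[8 + I]`, which corresponds to the subvector indices 0 and 8 used by the nxv16i1 patterns.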

Closed by commit rG2762da0a16a7: [SVE][CodeGen] Legalisation of masked loads and stores (authored by kmclaughlin). Jul 16 2020, 2:57 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path: Size
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp: 7 lines
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td: 14 lines
llvm/test/CodeGen/AArch64/sve-split-load.ll: 85 lines
llvm/test/CodeGen/AArch64/sve-split-store.ll: 81 lines

Diff 278413

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp

[7,147 lines collapsed; hunk context: TargetLowering::IncrementMemoryAddress(SDValue Addr, SDValue Mask,]

                                        SelectionDAG &DAG,
                                        bool IsCompressedMemory) const {
   SDValue Increment;
   EVT AddrVT = Addr.getValueType();
   EVT MaskVT = Mask.getValueType();
   assert(DataVT.getVectorNumElements() == MaskVT.getVectorNumElements() &&
          "Incompatible types of Data and Mask");
   if (IsCompressedMemory) {
+    if (DataVT.isScalableVector())
+      report_fatal_error(
+          "Cannot currently handle compressed memory with scalable vectors");
     // Incrementing the pointer according to number of '1's in the mask.
     EVT MaskIntVT = EVT::getIntegerVT(*DAG.getContext(), MaskVT.getSizeInBits());
     SDValue MaskInIntReg = DAG.getBitcast(MaskIntVT, Mask);
     if (MaskIntVT.getSizeInBits() < 32) {
       MaskInIntReg = DAG.getNode(ISD::ZERO_EXTEND, DL, MVT::i32, MaskInIntReg);
       MaskIntVT = MVT::i32;
     }

     // Count '1's with POPCNT.
     Increment = DAG.getNode(ISD::CTPOP, DL, MaskIntVT, MaskInIntReg);
     Increment = DAG.getZExtOrTrunc(Increment, DL, AddrVT);
     // Scale is an element size in bytes.
     SDValue Scale = DAG.getConstant(DataVT.getScalarSizeInBits() / 8, DL,
                                     AddrVT);
     Increment = DAG.getNode(ISD::MUL, DL, AddrVT, Increment, Scale);
+  } else if (DataVT.isScalableVector()) {
+    Increment = DAG.getVScale(DL, AddrVT,
+                              APInt(AddrVT.getSizeInBits().getFixedSize(),
+                                    DataVT.getStoreSize().getKnownMinSize()));
   } else
     Increment = DAG.getConstant(DataVT.getStoreSize(), DL, AddrVT);

   return DAG.getNode(ISD::ADD, DL, AddrVT, Addr, Increment);
 }

 static SDValue clampDynamicVectorIndex(SelectionDAG &DAG,
                                        SDValue Idx,
                                        EVT VecVT,

[706 lines collapsed]

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td

[1,103 lines collapsed; hunk context: multiclass sve_prefetch<SDPatternOperator prefetch, ValueType PredTy, Instruction RegImmInst, Instruction RegRegInst, int scale, ComplexPattern AddrCP> {]

   defm ZIP1_PPP : sve_int_perm_bin_perm_pp<0b000, "zip1", AArch64zip1>;
   defm ZIP2_PPP : sve_int_perm_bin_perm_pp<0b001, "zip2", AArch64zip2>;
   defm UZP1_PPP : sve_int_perm_bin_perm_pp<0b010, "uzp1", AArch64uzp1>;
   defm UZP2_PPP : sve_int_perm_bin_perm_pp<0b011, "uzp2", AArch64uzp2>;
   defm TRN1_PPP : sve_int_perm_bin_perm_pp<0b100, "trn1", AArch64trn1>;
   defm TRN2_PPP : sve_int_perm_bin_perm_pp<0b101, "trn2", AArch64trn2>;

+  // Extract lo/hi halves of legal predicate types.
+  def : Pat<(nxv2i1 (extract_subvector (nxv4i1 PPR:$Ps), (i64 0))),
+            (ZIP1_PPP_S PPR:$Ps, (PFALSE))>;
+  def : Pat<(nxv2i1 (extract_subvector (nxv4i1 PPR:$Ps), (i64 2))),
+            (ZIP2_PPP_S PPR:$Ps, (PFALSE))>;
+  def : Pat<(nxv4i1 (extract_subvector (nxv8i1 PPR:$Ps), (i64 0))),
+            (ZIP1_PPP_H PPR:$Ps, (PFALSE))>;
+  def : Pat<(nxv4i1 (extract_subvector (nxv8i1 PPR:$Ps), (i64 4))),
+            (ZIP2_PPP_H PPR:$Ps, (PFALSE))>;
+  def : Pat<(nxv8i1 (extract_subvector (nxv16i1 PPR:$Ps), (i64 0))),
+            (ZIP1_PPP_B PPR:$Ps, (PFALSE))>;
+  def : Pat<(nxv8i1 (extract_subvector (nxv16i1 PPR:$Ps), (i64 8))),
+            (ZIP2_PPP_B PPR:$Ps, (PFALSE))>;
+
   defm CMPHS_PPzZZ : sve_int_cmp_0<0b000, "cmphs", SETUGE, SETULE>;
   defm CMPHI_PPzZZ : sve_int_cmp_0<0b001, "cmphi", SETUGT, SETULT>;
   defm CMPGE_PPzZZ : sve_int_cmp_0<0b100, "cmpge", SETGE, SETLE>;
   defm CMPGT_PPzZZ : sve_int_cmp_0<0b101, "cmpgt", SETGT, SETLT>;
   defm CMPEQ_PPzZZ : sve_int_cmp_0<0b110, "cmpeq", SETEQ, SETEQ>;
   defm CMPNE_PPzZZ : sve_int_cmp_0<0b111, "cmpne", SETNE, SETNE>;

   defm CMPEQ_WIDE_PPzZZ : sve_int_cmp_0_wide<0b010, "cmpeq", int_aarch64_sve_cmpeq_wide>;

[1,464 lines collapsed]

llvm/test/CodeGen/AArch64/sve-split-load.ll

 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s

-; LOAD
+; UNPREDICATED

-define <vscale x 4 x i16> @load_promote_4i8(<vscale x 4 x i16>* %a) {
-; CHECK-LABEL: load_promote_4i8:
+define <vscale x 4 x i16> @load_promote_4i16(<vscale x 4 x i16>* %a) {
+; CHECK-LABEL: load_promote_4i16:
 ; CHECK: // %bb.0:
 ; CHECK-NEXT: ptrue p0.s
 ; CHECK-NEXT: ld1h { z0.s }, p0/z, [x0]
 ; CHECK-NEXT: ret
   %load = load <vscale x 4 x i16>, <vscale x 4 x i16>* %a
   ret <vscale x 4 x i16> %load
 }

[32 lines collapsed]
 ; CHECK-NEXT: ld1d { z4.d }, p0/z, [x0, #4, mul vl]
 ; CHECK-NEXT: ld1d { z5.d }, p0/z, [x0, #5, mul vl]
 ; CHECK-NEXT: ld1d { z6.d }, p0/z, [x0, #6, mul vl]
 ; CHECK-NEXT: ld1d { z7.d }, p0/z, [x0, #7, mul vl]
 ; CHECK-NEXT: ret
   %load = load <vscale x 16 x i64>, <vscale x 16 x i64>* %a
   ret <vscale x 16 x i64> %load
 }
+
+; MASKED
+
+define <vscale x 2 x i32> @masked_load_promote_2i32(<vscale x 2 x i32> *%a, <vscale x 2 x i1> %pg) {
+; CHECK-LABEL: masked_load_promote_2i32:
+; CHECK: // %bb.0:
+; CHECK-NEXT: ld1sw { z0.d }, p0/z, [x0]
+; CHECK-NEXT: ret
+  %load = call <vscale x 2 x i32> @llvm.masked.load.nxv2i32(<vscale x 2 x i32> *%a, i32 1, <vscale x 2 x i1> %pg, <vscale x 2 x i32> undef)
+  ret <vscale x 2 x i32> %load
+}
+
+define <vscale x 32 x i8> @masked_load_split_32i8(<vscale x 32 x i8> *%a, <vscale x 32 x i1> %pg) {
+; CHECK-LABEL: masked_load_split_32i8:
+; CHECK: // %bb.0:
+; CHECK-NEXT: ld1b { z0.b }, p0/z, [x0]
+; CHECK-NEXT: ld1b { z1.b }, p1/z, [x0, #1, mul vl]
+; CHECK-NEXT: ret
+  %load = call <vscale x 32 x i8> @llvm.masked.load.nxv32i8(<vscale x 32 x i8> *%a, i32 1, <vscale x 32 x i1> %pg, <vscale x 32 x i8> undef)
+  ret <vscale x 32 x i8> %load
+}
+
+define <vscale x 32 x i16> @masked_load_split_32i16(<vscale x 32 x i16> *%a, <vscale x 32 x i1> %pg) {
+; CHECK-LABEL: masked_load_split_32i16:
+; CHECK: // %bb.0:
+; CHECK-NEXT: pfalse p2.b
+; CHECK-NEXT: zip1 p3.b, p0.b, p2.b
+; CHECK-NEXT: zip2 p0.b, p0.b, p2.b
+; CHECK-NEXT: ld1h { z0.h }, p3/z, [x0]
+; CHECK-NEXT: zip1 p3.b, p1.b, p2.b
+; CHECK-NEXT: ld1h { z1.h }, p0/z, [x0, #1, mul vl]
+; CHECK-NEXT: zip2 p0.b, p1.b, p2.b
+; CHECK-NEXT: ld1h { z2.h }, p3/z, [x0, #2, mul vl]
+; CHECK-NEXT: ld1h { z3.h }, p0/z, [x0, #3, mul vl]
+; CHECK-NEXT: ret
+  %load = call <vscale x 32 x i16> @llvm.masked.load.nxv32i16(<vscale x 32 x i16> *%a, i32 1, <vscale x 32 x i1> %pg, <vscale x 32 x i16> undef)
+  ret <vscale x 32 x i16> %load
+}
+
+define <vscale x 8 x i32> @masked_load_split_8i32(<vscale x 8 x i32> *%a, <vscale x 8 x i1> %pg) {
+; CHECK-LABEL: masked_load_split_8i32:
+; CHECK: // %bb.0:
+; CHECK-NEXT: pfalse p1.b
+; CHECK-NEXT: zip1 p2.h, p0.h, p1.h
+; CHECK-NEXT: zip2 p0.h, p0.h, p1.h
+; CHECK-NEXT: ld1w { z0.s }, p2/z, [x0]
+; CHECK-NEXT: ld1w { z1.s }, p0/z, [x0, #1, mul vl]
+; CHECK-NEXT: ret
+  %load = call <vscale x 8 x i32> @llvm.masked.load.nxv8i32(<vscale x 8 x i32> *%a, i32 1, <vscale x 8 x i1> %pg, <vscale x 8 x i32> undef)
+  ret <vscale x 8 x i32> %load
+}
+
+define <vscale x 8 x i64> @masked_load_split_8i64(<vscale x 8 x i64> *%a, <vscale x 8 x i1> %pg) {
+; CHECK-LABEL: masked_load_split_8i64:
+; CHECK: // %bb.0:
+; CHECK-NEXT: pfalse p1.b
+; CHECK-NEXT: zip1 p2.h, p0.h, p1.h
+; CHECK-NEXT: zip2 p0.h, p0.h, p1.h
+; CHECK-NEXT: zip1 p3.s, p2.s, p1.s
+; CHECK-NEXT: zip2 p2.s, p2.s, p1.s
+; CHECK-NEXT: ld1d { z0.d }, p3/z, [x0]
+; CHECK-NEXT: ld1d { z1.d }, p2/z, [x0, #1, mul vl]
+; CHECK-NEXT: zip1 p2.s, p0.s, p1.s
+; CHECK-NEXT: zip2 p0.s, p0.s, p1.s
+; CHECK-NEXT: ld1d { z2.d }, p2/z, [x0, #2, mul vl]
+; CHECK-NEXT: ld1d { z3.d }, p0/z, [x0, #3, mul vl]
+; CHECK-NEXT: ret
+  %load = call <vscale x 8 x i64> @llvm.masked.load.nxv8i64(<vscale x 8 x i64> *%a, i32 1, <vscale x 8 x i1> %pg, <vscale x 8 x i64> undef)
+  ret <vscale x 8 x i64> %load
+}
+
+declare <vscale x 32 x i8> @llvm.masked.load.nxv32i8(<vscale x 32 x i8>*, i32, <vscale x 32 x i1>, <vscale x 32 x i8>)
+
+declare <vscale x 32 x i16> @llvm.masked.load.nxv32i16(<vscale x 32 x i16>*, i32, <vscale x 32 x i1>, <vscale x 32 x i16>)
+
+declare <vscale x 2 x i32> @llvm.masked.load.nxv2i32(<vscale x 2 x i32>*, i32, <vscale x 2 x i1>, <vscale x 2 x i32>)
+declare <vscale x 8 x i32> @llvm.masked.load.nxv8i32(<vscale x 8 x i32>*, i32, <vscale x 8 x i1>, <vscale x 8 x i32>)
+
+declare <vscale x 8 x i64> @llvm.masked.load.nxv8i64(<vscale x 8 x i64>*, i32, <vscale x 8 x i1>, <vscale x 8 x i64>)

llvm/test/CodeGen/AArch64/sve-split-store.ll

 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s

+; UNPREDICATED
+
 define void @store_promote_4i8(<vscale x 4 x i8> %data, <vscale x 4 x i8>* %a) {
 ; CHECK-LABEL: store_promote_4i8:
 ; CHECK: // %bb.0:
 ; CHECK-NEXT: ptrue p0.s
 ; CHECK-NEXT: st1b { z0.s }, p0, [x0]
 ; CHECK-NEXT: ret
   store <vscale x 4 x i8> %data, <vscale x 4 x i8>* %a
   ret void
[34 lines collapsed]
 ; CHECK-NEXT: st1d { z3.d }, p0, [x0, #3, mul vl]
 ; CHECK-NEXT: st1d { z2.d }, p0, [x0, #2, mul vl]
 ; CHECK-NEXT: st1d { z1.d }, p0, [x0, #1, mul vl]
 ; CHECK-NEXT: st1d { z0.d }, p0, [x0]
 ; CHECK-NEXT: ret
   store <vscale x 16 x i64> %data, <vscale x 16 x i64>* %a
   ret void
 }
+
+; MASKED
+
+define void @masked_store_promote_2i8(<vscale x 2 x i8> %data, <vscale x 2 x i8> *%a, <vscale x 2 x i1> %pg) {
+; CHECK-LABEL: masked_store_promote_2i8:
+; CHECK: // %bb.0:
+; CHECK-NEXT: st1b { z0.d }, p0, [x0]
+; CHECK-NEXT: ret
+  call void @llvm.masked.store.nxv2i8(<vscale x 2 x i8> %data, <vscale x 2 x i8> *%a, i32 1, <vscale x 2 x i1> %pg)
+  ret void
+}
+
+define void @masked_store_split_32i8(<vscale x 32 x i8> %data, <vscale x 32 x i8> *%a, <vscale x 32 x i1> %pg) {
+; CHECK-LABEL: masked_store_split_32i8:
+; CHECK: // %bb.0:
+; CHECK-NEXT: st1b { z1.b }, p1, [x0, #1, mul vl]
+; CHECK-NEXT: st1b { z0.b }, p0, [x0]
+; CHECK-NEXT: ret
+  call void @llvm.masked.store.nxv32i8(<vscale x 32 x i8> %data, <vscale x 32 x i8> *%a, i32 1, <vscale x 32 x i1> %pg)
+  ret void
+}
+
+define void @masked_store_split_32i16(<vscale x 32 x i16> %data, <vscale x 32 x i16> *%a, <vscale x 32 x i1> %pg) {
+; CHECK-LABEL: masked_store_split_32i16:
+; CHECK: // %bb.0:
+; CHECK-NEXT: pfalse p2.b
+; CHECK-NEXT: zip2 p3.b, p1.b, p2.b
+; CHECK-NEXT: zip1 p1.b, p1.b, p2.b
+; CHECK-NEXT: st1h { z3.h }, p3, [x0, #3, mul vl]
+; CHECK-NEXT: zip2 p3.b, p0.b, p2.b
+; CHECK-NEXT: zip1 p0.b, p0.b, p2.b
+; CHECK-NEXT: st1h { z2.h }, p1, [x0, #2, mul vl]
+; CHECK-NEXT: st1h { z1.h }, p3, [x0, #1, mul vl]
+; CHECK-NEXT: st1h { z0.h }, p0, [x0]
+; CHECK-NEXT: ret
+  call void @llvm.masked.store.nxv32i16(<vscale x 32 x i16> %data, <vscale x 32 x i16> *%a, i32 1, <vscale x 32 x i1> %pg)
+  ret void
+}
+
+define void @masked_store_split_8i32(<vscale x 8 x i32> %data, <vscale x 8 x i32> *%a, <vscale x 8 x i1> %pg) {
+; CHECK-LABEL: masked_store_split_8i32:
+; CHECK: // %bb.0:
+; CHECK-NEXT: pfalse p1.b
+; CHECK-NEXT: zip2 p2.h, p0.h, p1.h
+; CHECK-NEXT: zip1 p0.h, p0.h, p1.h
+; CHECK-NEXT: st1w { z1.s }, p2, [x0, #1, mul vl]
+; CHECK-NEXT: st1w { z0.s }, p0, [x0]
+; CHECK-NEXT: ret
+  call void @llvm.masked.store.nxv8i32(<vscale x 8 x i32> %data, <vscale x 8 x i32> *%a, i32 1, <vscale x 8 x i1> %pg)
+  ret void
+}
+
+define void @masked_store_split_8i64(<vscale x 8 x i64> %data, <vscale x 8 x i64> *%a, <vscale x 8 x i1> %pg) {
+; CHECK-LABEL: masked_store_split_8i64:
+; CHECK: // %bb.0:
+; CHECK-NEXT: pfalse p1.b
+; CHECK-NEXT: zip2 p2.h, p0.h, p1.h
+; CHECK-NEXT: zip1 p0.h, p0.h, p1.h
+; CHECK-NEXT: zip2 p3.s, p2.s, p1.s
+; CHECK-NEXT: zip1 p2.s, p2.s, p1.s
+; CHECK-NEXT: st1d { z2.d }, p2, [x0, #2, mul vl]
+; CHECK-NEXT: zip2 p2.s, p0.s, p1.s
+; CHECK-NEXT: zip1 p0.s, p0.s, p1.s
+; CHECK-NEXT: st1d { z3.d }, p3, [x0, #3, mul vl]
+; CHECK-NEXT: st1d { z1.d }, p2, [x0, #1, mul vl]
+; CHECK-NEXT: st1d { z0.d }, p0, [x0]
+; CHECK-NEXT: ret
+  call void @llvm.masked.store.nxv8i64(<vscale x 8 x i64> %data, <vscale x 8 x i64> *%a, i32 1, <vscale x 8 x i1> %pg)
+  ret void
+}
+
+declare void @llvm.masked.store.nxv2i8(<vscale x 2 x i8>, <vscale x 2 x i8>*, i32, <vscale x 2 x i1>)
+declare void @llvm.masked.store.nxv32i8(<vscale x 32 x i8>, <vscale x 32 x i8>*, i32, <vscale x 32 x i1>)
+
+declare void @llvm.masked.store.nxv32i16(<vscale x 32 x i16>, <vscale x 32 x i16>*, i32, <vscale x 32 x i1>)
+
+declare void @llvm.masked.store.nxv8i32(<vscale x 8 x i32>, <vscale x 8 x i32>*, i32, <vscale x 8 x i1>)
+
+declare void @llvm.masked.store.nxv8i64(<vscale x 8 x i64>, <vscale x 8 x i64>*, i32, <vscale x 8 x i1>)