This is an archive of the discontinued LLVM Phabricator instance.


Differential D120953

[AArch64][SelectionDAG] Supports unpklo/hi instructions to reduce the number of loads
Closed · Public

Authored by Allen on Mar 3 2022, 6:19 PM.


Details

Reviewers

ab
arsenm
sdesmalen
david-arm
paulwalker-arm

Commits

rG828b89bc0bb1: [AArch64][SelectionDAG] Supports unpklo/hi instructions to reduce the number of…

Summary

This patch tries to reduce the number of masked loads in favour of more unpklo/hi
instructions. Both ISD::ZEXTLOAD and ISD::SEXTLOAD are supported for extensions
from legal types.

Test cases for both normal and masked loads are added to guard against a compile crash.

Diff Detail

Unit Tests Failed

	Time	Test
	270 ms	x64 debian > LLVM.CodeGen/RISCV/rvv::extload-truncstore.ll
	360 ms	x64 debian > LLVM.CodeGen/RISCV/rvv::mscatter-sdnode.ll
	400 ms	x64 debian > LLVM.CodeGen/RISCV/rvv::vpscatter-sdnode.ll

Event Timeline

Allen created this revision. Mar 3 2022, 6:19 PM

Herald added a project: Restricted Project. Mar 3 2022, 6:19 PM

Herald added subscribers: ecnelises, steven.zhang, hiraditya, kristof.beyls.

Allen requested review of this revision. Mar 3 2022, 6:19 PM

Herald added a project: Restricted Project. Mar 3 2022, 6:19 PM

Herald added subscribers: llvm-commits, wdng.

Harbormaster completed remote builds in B152501: Diff 412887. Mar 3 2022, 7:03 PM

dmgreen added reviewers: sdesmalen, david-arm, paulwalker-arm. Mar 3 2022, 11:53 PM

david-arm added inline comments. Mar 4 2022, 3:04 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1246	So this is really just an optimisation where you're trying to reduce the number of loads we perform, in favour of more unpklo/hi instructions. It seems to make sense and looks like an improvement, but this change is very specific to one set of types and one extension type. What about ISD::ZEXTLOAD and other extensions from legal types, e.g.

%wide.masked.load = call <vscale x 8 x i16> @llvm.masked.load.nxv8i16.p0nxv8i16(<vscale x 8 x i16>* %base, i32 2, <vscale x 8 x i1> %mask, <vscale x 8 x i16> undef)
%res = sext <vscale x 8 x i16> %wide.masked.load to <vscale x 8 x i64>

I think if we go down this route it's worth adding all other extension types from legal inputs too.

paulwalker-arm added inline comments. Mar 4 2022, 3:12 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1246	It seems we're currently marking all the scalable floating-point types as Expand, so we should do the same for all the scalable integer types and then selectively enable the ones that make sense. Although I'd actually prefer the common default to be Expand rather than forcing all targets to do the initialisation, but I guess that's outside the scope of this patch.

Update to support more types and zero extension.

Allen retitled this revision from [AArch64][SelectionDAG] Prevent legality of extloads nxv4i64 from nxv4i32 to [AArch64][SelectionDAG] Supports unpklo/hi instructions to reduce the number of loads. Mar 4 2022, 11:38 PM

Allen edited the summary of this revision.

Harbormaster completed remote builds in B152714: Diff 413200. Mar 4 2022, 11:58 PM

Allen updated this revision to Diff 413346. Mar 6 2022, 11:22 PM

Allen marked an inline comment as done.

Harbormaster completed remote builds in B152839: Diff 413346. Mar 7 2022, 12:35 AM

Add a test case in file sve-intrinsics-mask-loads.ll.

Harbormaster completed remote builds in B152937: Diff 413486. Mar 7 2022, 9:02 AM

Ping?

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1246	Thanks very much; I've also updated the comment. 1. ISD::ZEXTLOAD is added. 2. The types <vscale x 16 x i8>, <vscale x 8 x i16> and <vscale x 4 x i32> are considered; please let me know if more types need to be added.
1246	Thanks very much. As not all of the scalable integer types benefit from disabling the extending loads, I only chose some of them; is that right? For example, `ld1h { z0.s }, p0/z, [x0]` is already fine today, and disabling the extending loads adds an extra `and z0.s, z0.s, #0xffff` for:

%wide.masked.load = call <vscale x 4 x i16> @llvm.masked.load.nxv4i16(<vscale x 4 x i16>* %a, i32 4, <vscale x 4 x i1> %mask, <vscale x 4 x i16> undef)
%res = zext <vscale x 4 x i16> %wide.masked.load to <vscale x 4 x i32>

Hi @Allen, the codegen for the tests looks good to me and the optimisation seems sensible. However, I think perhaps what @paulwalker-arm meant was that we should set all expand types for all integer combinations to be Expand, then selectively mark some as Legal. Can you confirm this is what you meant @paulwalker-arm?

In D120953#3369440, @david-arm wrote:

Hi @Allen, the codegen for the tests looks good to me and the optimisation seems sensible. However, I think perhaps what @paulwalker-arm meant was that we should set all expand types for all integer combinations to be Expand, then selectively mark some as Legal. Can you confirm this is what you meant @paulwalker-arm?

OK, thanks very much. I'll wait for @paulwalker-arm to confirm.

Sorry for the delay. Yes @david-arm that is what I meant. The current set of exclusions looks incomplete, for example, based on the intent of the patch I'd suggest nxv16i8 -> nxv16i32 also wants to be prevented. So it seems safest to exclude all scalable vector extending loads/truncating stores and then selectively enable those which we directly support. The first part of this is currently done for floating point scalable vector types so I think we can just replace the fp_scalable_vector_valuetypes iterators it uses with scalable_vector_valuetypes and remove the floating point related comments.

With that all said, I now have a bigger concern with this patch. Although the code quality for masked loads and stores is improved, the same is unlikely to hold for the normal loads and stores to which this change also applies. I say this because for those cases no explicit unpacking is required (for either the data or the predicate), but after this patch it will start to be generated.

For example

ptrue	p0.d
ld1sw	{ z0.d }, p0/z, [x0]
ld1sw	{ z1.d }, p0/z, [x0, #1, mul vl]

will become

ptrue	p0.s
ld1w	{ z1.s }, p0/z, [x0]
sunpklo	z0.d, z1.s
sunpkhi	z1.d, z1.s
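As a rough check that the two sequences above produce the same values (the concern is cost: two loads versus one load plus two unpacks), here is a scalar C++ model of both. It assumes an all-true predicate and a vector length `VLBits` that is a multiple of 128; `ld1sw_part`, `ld1w` and `sunpk` are hypothetical helpers standing in for the instructions, not LLVM or ACLE APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar model: all-true predicate, vector length VLBits bits.

// ld1sw { zt.d }, p0/z, [x0, #K, mul vl]: load VLBits/64 consecutive 32-bit
// elements starting at element K * (VLBits/64), sign-extending each to 64 bits.
std::vector<std::int64_t> ld1sw_part(const std::vector<std::int32_t>& Mem,
                                     unsigned VLBits, unsigned K) {
  const unsigned Lanes = VLBits / 64;
  std::vector<std::int64_t> Z(Lanes);
  for (unsigned I = 0; I < Lanes; ++I)
    Z[I] = static_cast<std::int64_t>(Mem[K * Lanes + I]);
  return Z;
}

// ld1w { zt.s }, p0/z, [x0]: load VLBits/32 consecutive 32-bit elements.
std::vector<std::int32_t> ld1w(const std::vector<std::int32_t>& Mem,
                               unsigned VLBits) {
  return std::vector<std::int32_t>(Mem.begin(), Mem.begin() + VLBits / 32);
}

// sunpklo (High = false) / sunpkhi (High = true) zd.d, zn.s: sign-extend the
// low or high half of the 32-bit lanes into 64-bit lanes.
std::vector<std::int64_t> sunpk(const std::vector<std::int32_t>& Zn, bool High) {
  const std::size_t Half = Zn.size() / 2;
  std::vector<std::int64_t> Zd(Half);
  for (std::size_t I = 0; I < Half; ++I)
    Zd[I] = static_cast<std::int64_t>(Zn[(High ? Half : 0) + I]);
  return Zd;
}
```

For example, with `Mem = {1, -2, 3, -4}` and `VLBits = 128`, `sunpk(ld1w(Mem, 128), true)` and `ld1sw_part(Mem, 128, 1)` both yield `{3, -4}`.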

I've just tried this patch and there's an even bigger problem: marking these operations as Expand triggers fixed-length-specific code in DAGCombine that results in a compiler assert/crash when passed:

%wide.load = load <vscale x 4 x i32>, <vscale x 4 x i32>* %base
%res = sext <vscale x 4 x i32> %wide.load to <vscale x 4 x i64>

This makes me think we first need to tighten up the handling of normal loads and stores to maintain the existing code quality and then apply the restrictions necessary for the masked varieties.

@paulwalker-arm

Thanks very much. I'm happy to try to fix the above assert/crash first in a separate commit. Do you have any advice on how to work out all the combinations we directly support?

By directly supported I mean those which are legal and thus have isel patterns. Which I think boils down to:

for (auto Op : {ISD::ZEXTLOAD, ISD::SEXTLOAD}) {
  setLoadExtAction(Op, MVT::nxv2i64, MVT::nxv2i8, Legal);
  setLoadExtAction(Op, MVT::nxv2i64, MVT::nxv2i16, Legal);
  setLoadExtAction(Op, MVT::nxv2i64, MVT::nxv2i32, Legal);
  setLoadExtAction(Op, MVT::nxv4i32, MVT::nxv4i8, Legal);
  setLoadExtAction(Op, MVT::nxv4i32, MVT::nxv4i16, Legal);
  setLoadExtAction(Op, MVT::nxv8i16, MVT::nxv8i8, Legal);
}
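The "Expand by default, then enable what has isel patterns" policy can be pictured as a small standalone table. This is a hypothetical model, not LLVM's `TargetLoweringBase` API: types are encoded as (minimum element count, element width) pairs instead of `MVT`s, the extension kind (ZEXTLOAD vs SEXTLOAD) is ignored because the loop above treats both identically, and only the six pairs listed above are Legal.

```cpp
#include <cassert>
#include <map>
#include <utility>

// Hypothetical encoding of a scalable integer vector type: minimum element
// count and element width in bits, e.g. nxv2i32 is {2, 32}.
struct ScalableVT {
  unsigned MinElts;
  unsigned EltBits;
  bool operator<(const ScalableVT& O) const {
    if (MinElts != O.MinElts)
      return MinElts < O.MinElts;
    return EltBits < O.EltBits;
  }
};

enum class Action { Legal, Expand };

// Model of the policy: every (result type, memory type) extending load is
// Expand unless it is one of the combinations with direct isel support.
class LoadExtTable {
  std::map<std::pair<ScalableVT, ScalableVT>, Action> Table;

public:
  LoadExtTable() {
    const std::pair<ScalableVT, ScalableVT> Supported[] = {
        {{2, 64}, {2, 8}}, {{2, 64}, {2, 16}}, {{2, 64}, {2, 32}},
        {{4, 32}, {4, 8}}, {{4, 32}, {4, 16}}, {{8, 16}, {8, 8}}};
    for (const auto& P : Supported)
      Table[P] = Action::Legal;
  }

  // Anything not explicitly enabled falls back to Expand.
  Action get(ScalableVT ValVT, ScalableVT MemVT) const {
    const auto It = Table.find({ValVT, MemVT});
    return It == Table.end() ? Action::Expand : It->second;
  }
};
```

With this table, `get({16, 32}, {16, 8})` (nxv16i32 from nxv16i8) returns `Action::Expand` without needing its own entry, while `get({2, 64}, {2, 32})` returns `Action::Legal`.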

I should also mention that downstream we had started work to use separate legalisation tables for each of LOAD, MLOAD and MGATHER but parked it until there was a real need. Perhaps this is that need and we should pick that work back up, but let's see how this work plays out first.

Fix the compile crash of normal loads

In D120953#3374916, @paulwalker-arm wrote:
By directly supported I mean those which are legal and thus have isel patterns. Which I think boils down to:

Thanks very much. In this version I first fix the crash you mentioned; it is only intended as the first step of this optimisation.

If you agree, I'll refactor following your advice in the next version (it is safest to exclude all scalable vector extending loads/truncating stores and then selectively enable those which we directly support) and add more test cases to guard them.

Harbormaster completed remote builds in B153918: Diff 414841. Mar 12 2022, 8:38 AM

Matt added a subscriber: Matt. Mar 17 2022, 5:55 PM

paulwalker-arm added inline comments. Mar 18 2022, 5:39 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11497	Sorry for the delay but I've been pondering this on and off. I cannot shake the feeling that any "is vector" check here is artificial, and so limiting it to only fixed-length vectors also seems artificial. Personally I think `isVector()` should be removed, with any negative effects on code generation most likely the fault of that target's implementation of `isVectorLoadExtDesirable()`. That said, I understand this might be above and beyond the work you want to carry out, so I guess making the current restriction slightly less artificial is a nice step in the right direction.
11506	However, the `isVectorLoadExtDesirable()` question will always be pertinent regardless of vector type and so you shouldn't need this change.
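The effect of relaxing `VT.isVector()` to `VT.isFixedLengthVector()` in the early-exit guard of `tryToFoldExtOfLoad` can be sketched as a pure predicate. This simplified C++ model (`FoldQuery`, `bailsBefore` and `bailsAfter` are hypothetical names) keeps only the clause the diff changes and assumes the non-extending and unindexed-load checks have already passed. Presumably, when the fold now proceeds for a scalable result, type legalization later splits the resulting extending load into legal parts, which would keep the existing per-part codegen for normal loads; this is an interpretation, not something the model checks.

```cpp
#include <cassert>

// Hypothetical, simplified model of the clause in tryToFoldExtOfLoad that the
// diff changes; the isNON_EXTLoad/isUNINDEXEDLoad checks are assumed to pass.
enum class VecKind { Scalar, FixedLength, Scalable };

struct FoldQuery {
  bool LegalOperations;  // combining after operation legalization
  VecKind Kind;          // what kind of type the extended result VT is
  bool LoadIsSimple;     // cast<LoadSDNode>(N0)->isSimple()
  bool ExtLoadLegal;     // TLI.isLoadExtLegal(ExtLoadType, VT, N0.getValueType())
};

// Old guard: any vector result requires a legal extending load.
bool bailsBefore(const FoldQuery& Q) {
  const bool IsVector = Q.Kind != VecKind::Scalar;
  return (Q.LegalOperations || IsVector || !Q.LoadIsSimple) && !Q.ExtLoadLegal;
}

// New guard: only fixed-length vector results are restricted this way.
bool bailsAfter(const FoldQuery& Q) {
  const bool IsFixed = Q.Kind == VecKind::FixedLength;
  return (Q.LegalOperations || IsFixed || !Q.LoadIsSimple) && !Q.ExtLoadLegal;
}
```

For a simple load feeding a scalable result before operation legalization, with the extending load not legal, `bailsBefore` returns true and `bailsAfter` returns false; fixed-length results behave as before.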

Add a TODO comment, as 2 tests fail if I remove the isFixedLengthVector() condition directly:

LLVM :: CodeGen/AArch64/insert-subvector-res-legalization.ll
LLVM :: CodeGen/AArch64/sve-fixed-length-ext-loads.ll

Harbormaster completed remote builds in B155181: Diff 416675. Mar 19 2022, 3:47 AM

Thanks @Allen this works for me. Before accepting I just wanted to double check your previous comment:

In D120953#3377136, @Allen wrote:

Thanks very much. In this version I first fix the crash you mentioned; it is only intended as the first step of this optimisation.

If you agree, I'll refactor following your advice in the next version (it is safest to exclude all scalable vector extending loads/truncating stores and then selectively enable those which we directly support) and add more test cases to guard them.

Is this something you can do for this patch? Or are you wanting to save that work for a follow on patch?

In D120953#3396109, @paulwalker-arm wrote:

Is this something you can do for this patch? Or are you wanting to save that work for a follow on patch?

Hi @paulwalker-arm,

I hope this patch can be accepted first. Then I'll start another patch to finish that refactor.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11497	@paulwalker-arm, do you think it is OK to fix the crash of normal loads?
11497	Thanks very much. If you agree, I can add a comment "TODO: isFixedLengthVector() should be removed and any negative effects on code generation being the result of that target's implementation of isVectorLoadExtDesirable()", and try to refactor that in a separate commit.
11506	Done, thanks

paulwalker-arm accepted this revision. Mar 21 2022, 8:38 AM

paulwalker-arm added inline comments.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11497	@Allen I'm happy with this change. It doesn't actually fix the bug that caused the crash, but it does maintain existing behaviour so that the buggy code path is not hit (at least with the current set of tests). The bug itself is within `DAGCombiner::CombineExtLoad`, which has a `!DstVT.isScalableVector()` assert that appears after a call to `getVectorNumElements()`, so the code will fail before ever hitting that assert. The real fix is to remove the assert and just add `if (DstVT.isScalableVector()) return SDValue();` to the top of `DAGCombiner::CombineExtLoad`. As you've worked around the issue it's not strictly necessary, but if you could add that fix to this patch it'll stop anybody else from hitting it.
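The ordering problem described in this comment, and the suggested early return, can be illustrated with a toy type. `MiniVT` and `combineExtLoadSketch` are hypothetical stand-ins for `EVT` and `DAGCombiner::CombineExtLoad`, and `std::nullopt` plays the role of an empty `SDValue()`. The only point is that the scalable check must run before anything that asks for a fixed element count.

```cpp
#include <cassert>
#include <optional>

// Hypothetical miniature of EVT: a (minimum) element count and a scalable flag.
struct MiniVT {
  unsigned MinElts;
  bool Scalable;

  // Models the failure described in the review: asking a scalable type for a
  // fixed element count is an error, and it fires before any later assert.
  unsigned getVectorNumElements() const {
    assert(!Scalable && "fixed element count requested for a scalable vector");
    return MinElts;
  }
};

// Sketch of the suggested fix: report "no combine" for scalable vectors before
// any fixed-length-only query. std::nullopt stands in for SDValue().
std::optional<unsigned> combineExtLoadSketch(MiniVT DstVT) {
  if (DstVT.Scalable)
    return std::nullopt;
  // Stand-in for the fixed-length splitting logic, which needs a known count.
  return DstVT.getVectorNumElements();
}
```

`combineExtLoadSketch(MiniVT{4, true})` returns `std::nullopt` without reaching the assert, whereas a fixed `MiniVT{8, false}` yields 8.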

This revision is now accepted and ready to land. Mar 21 2022, 8:38 AM

This revision was landed with ongoing or failed builds. Mar 21 2022, 8:48 AM

Closed by commit rG828b89bc0bb1: [AArch64][SelectionDAG] Supports unpklo/hi instructions to reduce the number of… (authored by Allen).

This revision was automatically updated to reflect the committed changes.

Allen marked 2 inline comments as done.

Allen added a commit: rG828b89bc0bb1: [AArch64][SelectionDAG] Supports unpklo/hi instructions to reduce the number of….

Allen mentioned this in D122281: [AArch64][SelectionDAG] Refactor to support more scalable vector extending loads. Mar 22 2022, 8:24 PM

Allen mentioned this in rGc3fe025bd4a1: [AArch64][SelectionDAG] Refactor to support more scalable vector extending loads. Mar 27 2022, 6:20 AM

Revision Contents

	Path	Size
	llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp	4 lines
	llvm/lib/Target/AArch64/AArch64ISelLowering.cpp	7 lines
	llvm/test/CodeGen/AArch64/sve-intrinsics-ldst-ext.ll	102 lines
	llvm/test/CodeGen/AArch64/sve-intrinsics-mask-ldst-ext.ll	123 lines
	llvm/test/CodeGen/AArch64/sve-masked-ldst-zext.ll	19 lines

Diff 414841

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp


[11,488 unchanged lines not shown]
	// deemed desirable by the target.			// deemed desirable by the target.
	static SDValue tryToFoldExtOfLoad(SelectionDAG &DAG, DAGCombiner &Combiner,			static SDValue tryToFoldExtOfLoad(SelectionDAG &DAG, DAGCombiner &Combiner,
	const TargetLowering &TLI, EVT VT,			const TargetLowering &TLI, EVT VT,
	bool LegalOperations, SDNode *N, SDValue N0,			bool LegalOperations, SDNode *N, SDValue N0,
	ISD::LoadExtType ExtLoadType,			ISD::LoadExtType ExtLoadType,
	ISD::NodeType ExtOpc) {			ISD::NodeType ExtOpc) {
	if (!ISD::isNON_EXTLoad(N0.getNode()) ||			if (!ISD::isNON_EXTLoad(N0.getNode()) ||
	!ISD::isUNINDEXEDLoad(N0.getNode()) ||			!ISD::isUNINDEXEDLoad(N0.getNode()) ||
	((LegalOperations || VT.isVector() ||			((LegalOperations || VT.isFixedLengthVector() ||
	!cast<LoadSDNode>(N0)->isSimple()) &&			!cast<LoadSDNode>(N0)->isSimple()) &&
	!TLI.isLoadExtLegal(ExtLoadType, VT, N0.getValueType())))			!TLI.isLoadExtLegal(ExtLoadType, VT, N0.getValueType())))
	return {};			return {};

	bool DoXform = true;			bool DoXform = true;
	SmallVector<SDNode *, 4> SetCCs;			SmallVector<SDNode *, 4> SetCCs;
	if (!N0.hasOneUse())			if (!N0.hasOneUse())
	DoXform = ExtendUsesToFormExtLoad(VT, N, N0, ExtOpc, SetCCs, TLI);			DoXform = ExtendUsesToFormExtLoad(VT, N, N0, ExtOpc, SetCCs, TLI);
	if (VT.isVector())			if (VT.isFixedLengthVector())
	DoXform &= TLI.isVectorLoadExtDesirable(SDValue(N, 0));			DoXform &= TLI.isVectorLoadExtDesirable(SDValue(N, 0));
	if (!DoXform)			if (!DoXform)
	return {};			return {};

	LoadSDNode *LN0 = cast<LoadSDNode>(N0);			LoadSDNode *LN0 = cast<LoadSDNode>(N0);
	SDValue ExtLoad = DAG.getExtLoad(ExtLoadType, SDLoc(LN0), VT, LN0->getChain(),			SDValue ExtLoad = DAG.getExtLoad(ExtLoadType, SDLoc(LN0), VT, LN0->getChain(),
	LN0->getBasePtr(), N0.getValueType(),			LN0->getBasePtr(), N0.getValueType(),
	LN0->getMemOperand());			LN0->getMemOperand());
[12,937 unchanged lines not shown]

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp


[1,223 unchanged lines not shown]
for (MVT VT : MVT::fp_scalable_vector_valuetypes()) {
setTruncStoreAction(VT, InnerVT, Expand);		setTruncStoreAction(VT, InnerVT, Expand);
// SVE does not have floating-point extending loads.		// SVE does not have floating-point extending loads.
setLoadExtAction(ISD::SEXTLOAD, VT, InnerVT, Expand);		setLoadExtAction(ISD::SEXTLOAD, VT, InnerVT, Expand);
setLoadExtAction(ISD::ZEXTLOAD, VT, InnerVT, Expand);		setLoadExtAction(ISD::ZEXTLOAD, VT, InnerVT, Expand);
setLoadExtAction(ISD::EXTLOAD, VT, InnerVT, Expand);		setLoadExtAction(ISD::EXTLOAD, VT, InnerVT, Expand);
}		}
}		}

		// SVE supports unpklo/hi instructions to reduce the number of loads.
		for (auto Op : {ISD::SEXTLOAD, ISD::ZEXTLOAD, ISD::EXTLOAD}) {
		setLoadExtAction(Op, MVT::nxv16i64, MVT::nxv16i8, Expand);
		setLoadExtAction(Op, MVT::nxv8i64, MVT::nxv8i16, Expand);
		setLoadExtAction(Op, MVT::nxv4i64, MVT::nxv4i32, Expand);
		}

// SVE supports truncating stores of 64 and 128-bit vectors		// SVE supports truncating stores of 64 and 128-bit vectors
setTruncStoreAction(MVT::v2i64, MVT::v2i8, Custom);		setTruncStoreAction(MVT::v2i64, MVT::v2i8, Custom);
setTruncStoreAction(MVT::v2i64, MVT::v2i16, Custom);		setTruncStoreAction(MVT::v2i64, MVT::v2i16, Custom);
setTruncStoreAction(MVT::v2i64, MVT::v2i32, Custom);		setTruncStoreAction(MVT::v2i64, MVT::v2i32, Custom);
setTruncStoreAction(MVT::v2i32, MVT::v2i8, Custom);		setTruncStoreAction(MVT::v2i32, MVT::v2i8, Custom);
setTruncStoreAction(MVT::v2i32, MVT::v2i16, Custom);		setTruncStoreAction(MVT::v2i32, MVT::v2i16, Custom);

for (auto VT : {MVT::nxv2f16, MVT::nxv4f16, MVT::nxv8f16, MVT::nxv2f32,		for (auto VT : {MVT::nxv2f16, MVT::nxv4f16, MVT::nxv8f16, MVT::nxv2f32,
MVT::nxv4f32, MVT::nxv2f64}) {		MVT::nxv4f32, MVT::nxv2f64}) {
setOperationAction(ISD::CONCAT_VECTORS, VT, Custom);		setOperationAction(ISD::CONCAT_VECTORS, VT, Custom);
setOperationAction(ISD::INSERT_SUBVECTOR, VT, Custom);		setOperationAction(ISD::INSERT_SUBVECTOR, VT, Custom);
setOperationAction(ISD::MGATHER, VT, Custom);		setOperationAction(ISD::MGATHER, VT, Custom);
setOperationAction(ISD::MSCATTER, VT, Custom);		setOperationAction(ISD::MSCATTER, VT, Custom);
setOperationAction(ISD::MLOAD, VT, Custom);		setOperationAction(ISD::MLOAD, VT, Custom);
setOperationAction(ISD::SPLAT_VECTOR, VT, Custom);		setOperationAction(ISD::SPLAT_VECTOR, VT, Custom);
setOperationAction(ISD::SELECT, VT, Custom);		setOperationAction(ISD::SELECT, VT, Custom);
[19,415 unchanged lines not shown]

llvm/test/CodeGen/AArch64/sve-intrinsics-ldst-ext.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -asm-verbose=1 < %s | FileCheck %s

				;
				; LD1B
				;

				define <vscale x 16 x i64> @ld1b_i8_sext(<vscale x 16 x i8> *%base) {
				; CHECK-LABEL: ld1b_i8_sext:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: ld1sb { z0.d }, p0/z, [x0]
				; CHECK-NEXT: ld1sb { z1.d }, p0/z, [x0, #1, mul vl]
				; CHECK-NEXT: ld1sb { z2.d }, p0/z, [x0, #2, mul vl]
				; CHECK-NEXT: ld1sb { z3.d }, p0/z, [x0, #3, mul vl]
				; CHECK-NEXT: ld1sb { z4.d }, p0/z, [x0, #4, mul vl]
				; CHECK-NEXT: ld1sb { z5.d }, p0/z, [x0, #5, mul vl]
				; CHECK-NEXT: ld1sb { z6.d }, p0/z, [x0, #6, mul vl]
				; CHECK-NEXT: ld1sb { z7.d }, p0/z, [x0, #7, mul vl]
				; CHECK-NEXT: ret
				%wide.load = load <vscale x 16 x i8>, <vscale x 16 x i8>* %base
				%res = sext <vscale x 16 x i8> %wide.load to <vscale x 16 x i64>
				ret <vscale x 16 x i64> %res
				}

				define <vscale x 16 x i64> @ld1b_i8_zext(<vscale x 16 x i8> *%base) {
				; CHECK-LABEL: ld1b_i8_zext:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: ld1b { z0.d }, p0/z, [x0]
				; CHECK-NEXT: ld1b { z1.d }, p0/z, [x0, #1, mul vl]
				; CHECK-NEXT: ld1b { z2.d }, p0/z, [x0, #2, mul vl]
				; CHECK-NEXT: ld1b { z3.d }, p0/z, [x0, #3, mul vl]
				; CHECK-NEXT: ld1b { z4.d }, p0/z, [x0, #4, mul vl]
				; CHECK-NEXT: ld1b { z5.d }, p0/z, [x0, #5, mul vl]
				; CHECK-NEXT: ld1b { z6.d }, p0/z, [x0, #6, mul vl]
				; CHECK-NEXT: ld1b { z7.d }, p0/z, [x0, #7, mul vl]
				; CHECK-NEXT: ret
				%wide.load = load <vscale x 16 x i8>, <vscale x 16 x i8>* %base
				%res = zext <vscale x 16 x i8> %wide.load to <vscale x 16 x i64>
				ret <vscale x 16 x i64> %res
				}

				;
				; LD1H
				;

				define <vscale x 8 x i64> @ld1h_i16_sext(<vscale x 8 x i16> *%base) {
				; CHECK-LABEL: ld1h_i16_sext:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: ld1sh { z0.d }, p0/z, [x0]
				; CHECK-NEXT: ld1sh { z1.d }, p0/z, [x0, #1, mul vl]
				; CHECK-NEXT: ld1sh { z2.d }, p0/z, [x0, #2, mul vl]
				; CHECK-NEXT: ld1sh { z3.d }, p0/z, [x0, #3, mul vl]
				; CHECK-NEXT: ret
				%wide.load = load <vscale x 8 x i16>, <vscale x 8 x i16>* %base
				%res = sext <vscale x 8 x i16> %wide.load to <vscale x 8 x i64>
				ret <vscale x 8 x i64> %res
				}

				define <vscale x 8 x i64> @ld1h_i16_zext(<vscale x 8 x i16> *%base) {
				; CHECK-LABEL: ld1h_i16_zext:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: ld1h { z0.d }, p0/z, [x0]
				; CHECK-NEXT: ld1h { z1.d }, p0/z, [x0, #1, mul vl]
				; CHECK-NEXT: ld1h { z2.d }, p0/z, [x0, #2, mul vl]
				; CHECK-NEXT: ld1h { z3.d }, p0/z, [x0, #3, mul vl]
				; CHECK-NEXT: ret
				%wide.load = load <vscale x 8 x i16>, <vscale x 8 x i16>* %base
				%res = zext <vscale x 8 x i16> %wide.load to <vscale x 8 x i64>
				ret <vscale x 8 x i64> %res
				}

				;
				; LD1W
				;

				define <vscale x 4 x i64> @ld1w_i32_sext(<vscale x 4 x i32> *%base) {
				; CHECK-LABEL: ld1w_i32_sext:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: ld1sw { z0.d }, p0/z, [x0]
				; CHECK-NEXT: ld1sw { z1.d }, p0/z, [x0, #1, mul vl]
				; CHECK-NEXT: ret
				%wide.load = load <vscale x 4 x i32>, <vscale x 4 x i32>* %base
				%res = sext <vscale x 4 x i32> %wide.load to <vscale x 4 x i64>
				ret <vscale x 4 x i64> %res
				}

				define <vscale x 4 x i64> @ld1w_i32_zext(<vscale x 4 x i32> *%base) {
				; CHECK-LABEL: ld1w_i32_zext:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: ld1w { z0.d }, p0/z, [x0]
				; CHECK-NEXT: ld1w { z1.d }, p0/z, [x0, #1, mul vl]
				; CHECK-NEXT: ret
				%wide.load = load <vscale x 4 x i32>, <vscale x 4 x i32>* %base
				%res = zext <vscale x 4 x i32> %wide.load to <vscale x 4 x i64>
				ret <vscale x 4 x i64> %res
				}
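The `#K, mul vl` immediates in the checks above scale with the vector length. Assuming the SVE scalar-plus-immediate rule that the immediate is multiplied by the number of bytes one such load transfers (number of 64-bit lanes times the memory element size), the C++ sketch below computes the byte offset of each part; the function names are hypothetical. It also shows why these tests need 8, 4 and 2 loads for i8, i16 and i32 sources: together the parts cover exactly one register's worth of bytes.

```cpp
#include <cassert>
#include <vector>

// Byte offset addressed by `ld1{s}{b,h,w} { zt.d }, p0/z, [x0, #K, mul vl]`,
// assuming the immediate is scaled by the bytes one such load transfers:
// (number of 64-bit lanes) * (memory element size in bytes).
unsigned partByteOffset(unsigned VLBits, unsigned MemEltBytes, unsigned K) {
  const unsigned Lanes = VLBits / 64;
  return K * Lanes * MemEltBytes;
}

// Offsets of all the per-part loads needed to cover one full source vector
// (VLBits / 8 bytes) of MemEltBytes-sized elements extended to 64 bits.
std::vector<unsigned> allPartOffsets(unsigned VLBits, unsigned MemEltBytes) {
  const unsigned Parts = 8 / MemEltBytes;  // 8 for i8, 4 for i16, 2 for i32
  std::vector<unsigned> Offsets;
  for (unsigned K = 0; K < Parts; ++K)
    Offsets.push_back(partByteOffset(VLBits, MemEltBytes, K));
  return Offsets;
}
```

For a 256-bit vector length, `allPartOffsets(256, 1)` gives 0, 4, 8, ..., 28, so the eight `ld1sb` loads in `ld1b_i8_sext` read 32 bytes, one full register.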

llvm/test/CodeGen/AArch64/sve-intrinsics-mask-ldst-ext.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -asm-verbose=1 < %s | FileCheck %s

				;
				; LD1B
				;

				define <vscale x 16 x i64> @masked_ld1b_i8_sext(<vscale x 16 x i8> *%base, <vscale x 16 x i1> %mask) {
				; CHECK-LABEL: masked_ld1b_i8_sext:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ld1b { z0.b }, p0/z, [x0]
				; CHECK-NEXT: sunpklo z1.h, z0.b
				; CHECK-NEXT: sunpkhi z0.h, z0.b
				; CHECK-NEXT: sunpklo z2.s, z1.h
				; CHECK-NEXT: sunpkhi z3.s, z1.h
				; CHECK-NEXT: sunpklo z5.s, z0.h
				; CHECK-NEXT: sunpkhi z7.s, z0.h
				; CHECK-NEXT: sunpklo z0.d, z2.s
				; CHECK-NEXT: sunpkhi z1.d, z2.s
				; CHECK-NEXT: sunpklo z2.d, z3.s
				; CHECK-NEXT: sunpkhi z3.d, z3.s
				; CHECK-NEXT: sunpklo z4.d, z5.s
				; CHECK-NEXT: sunpkhi z5.d, z5.s
				; CHECK-NEXT: sunpklo z6.d, z7.s
				; CHECK-NEXT: sunpkhi z7.d, z7.s
				; CHECK-NEXT: ret
				%wide.masked.load = call <vscale x 16 x i8> @llvm.masked.load.nxv16i8.p0nxv16i8(<vscale x 16 x i8>* %base, i32 2, <vscale x 16 x i1> %mask, <vscale x 16 x i8> undef)
				%res = sext <vscale x 16 x i8> %wide.masked.load to <vscale x 16 x i64>
				ret <vscale x 16 x i64> %res
				}

				define <vscale x 16 x i64> @masked_ld1b_i8_zext(<vscale x 16 x i8> *%base, <vscale x 16 x i1> %mask) {
				; CHECK-LABEL: masked_ld1b_i8_zext:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ld1b { z0.b }, p0/z, [x0]
				; CHECK-NEXT: uunpklo z1.h, z0.b
				; CHECK-NEXT: uunpkhi z0.h, z0.b
				; CHECK-NEXT: uunpklo z2.s, z1.h
				; CHECK-NEXT: uunpkhi z3.s, z1.h
				; CHECK-NEXT: uunpklo z5.s, z0.h
				; CHECK-NEXT: uunpkhi z7.s, z0.h
				; CHECK-NEXT: uunpklo z0.d, z2.s
				; CHECK-NEXT: uunpkhi z1.d, z2.s
				; CHECK-NEXT: uunpklo z2.d, z3.s
				; CHECK-NEXT: uunpkhi z3.d, z3.s
				; CHECK-NEXT: uunpklo z4.d, z5.s
				; CHECK-NEXT: uunpkhi z5.d, z5.s
				; CHECK-NEXT: uunpklo z6.d, z7.s
				; CHECK-NEXT: uunpkhi z7.d, z7.s
				; CHECK-NEXT: ret
				%wide.masked.load = call <vscale x 16 x i8> @llvm.masked.load.nxv16i8.p0nxv16i8(<vscale x 16 x i8>* %base, i32 2, <vscale x 16 x i1> %mask, <vscale x 16 x i8> undef)
				%res = zext <vscale x 16 x i8> %wide.masked.load to <vscale x 16 x i64>
				ret <vscale x 16 x i64> %res
				}

				;
				; LD1H
				;

				define <vscale x 8 x i64> @masked_ld1h_i16_sext(<vscale x 8 x i16> *%base, <vscale x 8 x i1> %mask) {
				; CHECK-LABEL: masked_ld1h_i16_sext:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ld1h { z0.h }, p0/z, [x0]
				; CHECK-NEXT: sunpklo z1.s, z0.h
				; CHECK-NEXT: sunpkhi z3.s, z0.h
				; CHECK-NEXT: sunpklo z0.d, z1.s
				; CHECK-NEXT: sunpkhi z1.d, z1.s
				; CHECK-NEXT: sunpklo z2.d, z3.s
				; CHECK-NEXT: sunpkhi z3.d, z3.s
				; CHECK-NEXT: ret
				%wide.masked.load = call <vscale x 8 x i16> @llvm.masked.load.nxv8i16.p0nxv8i16(<vscale x 8 x i16>* %base, i32 2, <vscale x 8 x i1> %mask, <vscale x 8 x i16> undef)
				%res = sext <vscale x 8 x i16> %wide.masked.load to <vscale x 8 x i64>
				ret <vscale x 8 x i64> %res
				}

				define <vscale x 8 x i64> @masked_ld1h_i16_zext(<vscale x 8 x i16> *%base, <vscale x 8 x i1> %mask) {
				; CHECK-LABEL: masked_ld1h_i16_zext:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ld1h { z0.h }, p0/z, [x0]
				; CHECK-NEXT: uunpklo z1.s, z0.h
				; CHECK-NEXT: uunpkhi z3.s, z0.h
				; CHECK-NEXT: uunpklo z0.d, z1.s
				; CHECK-NEXT: uunpkhi z1.d, z1.s
				; CHECK-NEXT: uunpklo z2.d, z3.s
				; CHECK-NEXT: uunpkhi z3.d, z3.s
				; CHECK-NEXT: ret
				%wide.masked.load = call <vscale x 8 x i16> @llvm.masked.load.nxv8i16.p0nxv8i16(<vscale x 8 x i16>* %base, i32 2, <vscale x 8 x i1> %mask, <vscale x 8 x i16> undef)
				%res = zext <vscale x 8 x i16> %wide.masked.load to <vscale x 8 x i64>
				ret <vscale x 8 x i64> %res
				}

				;
				; LD1W
				;

				define <vscale x 4 x i64> @masked_ld1w_i32_sext(<vscale x 4 x i32> *%base, <vscale x 4 x i1> %mask) {
				; CHECK-LABEL: masked_ld1w_i32_sext:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ld1w { z1.s }, p0/z, [x0]
				; CHECK-NEXT: sunpklo z0.d, z1.s
				; CHECK-NEXT: sunpkhi z1.d, z1.s
				; CHECK-NEXT: ret
				%wide.masked.load = call <vscale x 4 x i32> @llvm.masked.load.nxv4i32.p0nxv4i32(<vscale x 4 x i32>* %base, i32 4, <vscale x 4 x i1> %mask, <vscale x 4 x i32> undef)
				%res = sext <vscale x 4 x i32> %wide.masked.load to <vscale x 4 x i64>
				ret <vscale x 4 x i64> %res
				}

				define <vscale x 4 x i64> @masked_ld1w_i32_zext(<vscale x 4 x i32> *%base, <vscale x 4 x i1> %mask) {
				; CHECK-LABEL: masked_ld1w_i32_zext:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ld1w { z1.s }, p0/z, [x0]
				; CHECK-NEXT: uunpklo z0.d, z1.s
				; CHECK-NEXT: uunpkhi z1.d, z1.s
				; CHECK-NEXT: ret
				%wide.masked.load = call <vscale x 4 x i32> @llvm.masked.load.nxv4i32.p0nxv4i32(<vscale x 4 x i32>* %base, i32 4, <vscale x 4 x i1> %mask, <vscale x 4 x i32> undef)
				%res = zext <vscale x 4 x i32> %wide.masked.load to <vscale x 4 x i64>
				ret <vscale x 4 x i64> %res
				}

				declare <vscale x 16 x i8> @llvm.masked.load.nxv16i8.p0nxv16i8(<vscale x 16 x i8>*, i32 immarg, <vscale x 16 x i1>, <vscale x 16 x i8>)
				declare <vscale x 8 x i16> @llvm.masked.load.nxv8i16.p0nxv8i16(<vscale x 8 x i16>*, i32 immarg, <vscale x 8 x i1>, <vscale x 8 x i16>)
				declare <vscale x 4 x i32> @llvm.masked.load.nxv4i32.p0nxv4i32(<vscale x 4 x i32>*, i32 immarg, <vscale x 4 x i1>, <vscale x 4 x i32>)
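In these masked tests, the single `ld1b`/`ld1h`/`ld1w` is followed by a tree of unpacks whose outputs land in z0, z1, ... in the same order as the separate per-part loads of the unmasked tests. The C++ model below checks that ordering for the i8 to i64 sign-extending case under simplifying assumptions: all lanes active, bytes sign-extended as they are read (so `sunpk` only selects lanes), and hypothetical helper names. The register names mirror the CHECK lines of `masked_ld1b_i8_sext`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Lanes = std::vector<std::int64_t>;

// sunpklo (High = false) / sunpkhi (High = true), modelled on values that are
// already sign-extended, so only the lane selection remains.
Lanes sunpk(const Lanes& Src, bool High) {
  const std::ptrdiff_t Half = static_cast<std::ptrdiff_t>(Src.size() / 2);
  const auto First = Src.begin() + (High ? Half : 0);
  return Lanes(First, First + Half);
}

// masked_ld1b_i8_sext with all lanes active: one ld1b of VLBits/8 bytes, then
// three levels of unpacks, named after the registers in the CHECK lines.
std::vector<Lanes> unpackTree(const std::vector<std::int8_t>& Mem, unsigned VLBits) {
  // Converting int8_t to int64_t sign-extends each byte up front.
  const Lanes Z0b(Mem.begin(), Mem.begin() + VLBits / 8);
  const Lanes Z1h = sunpk(Z0b, false), Z0h = sunpk(Z0b, true);
  const Lanes Z2s = sunpk(Z1h, false), Z3s = sunpk(Z1h, true);
  const Lanes Z5s = sunpk(Z0h, false), Z7s = sunpk(Z0h, true);
  return {sunpk(Z2s, false), sunpk(Z2s, true), sunpk(Z3s, false),
          sunpk(Z3s, true),  sunpk(Z5s, false), sunpk(Z5s, true),
          sunpk(Z7s, false), sunpk(Z7s, true)};
}

// ld1b_i8_sext: eight ld1sb { zK.d } loads at [x0, #K, mul vl], each reading
// VLBits/64 bytes.
std::vector<Lanes> partLoads(const std::vector<std::int8_t>& Mem, unsigned VLBits) {
  const unsigned N = VLBits / 64;
  std::vector<Lanes> Parts;
  for (unsigned K = 0; K < 8; ++K)
    Parts.emplace_back(Mem.begin() + K * N, Mem.begin() + (K + 1) * N);
  return Parts;
}
```

With 16 bytes and `VLBits = 128`, both functions return the eight two-lane groups {b0, b1}, {b2, b3}, ..., {b14, b15}.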

llvm/test/CodeGen/AArch64/sve-masked-ldst-zext.ll

[73 unchanged lines not shown]
; CHECK-NEXT: ret
%load = call <vscale x 2 x i32> @llvm.masked.load.nxv2i32(<vscale x 2 x i32>* %src, i32 1, <vscale x 2 x i1> %mask, <vscale x 2 x i32> %passthru)		%load = call <vscale x 2 x i32> @llvm.masked.load.nxv2i32(<vscale x 2 x i32>* %src, i32 1, <vscale x 2 x i1> %mask, <vscale x 2 x i32> %passthru)
%ext = zext <vscale x 2 x i32> %load to <vscale x 2 x i64>		%ext = zext <vscale x 2 x i32> %load to <vscale x 2 x i64>
ret <vscale x 2 x i64> %ext		ret <vscale x 2 x i64> %ext
}		}

; Return type requires splitting		; Return type requires splitting
define <vscale x 8 x i64> @masked_zload_nxv8i16(<vscale x 8 x i16>* %a, <vscale x 8 x i1> %mask) {		define <vscale x 8 x i64> @masked_zload_nxv8i16(<vscale x 8 x i16>* %a, <vscale x 8 x i1> %mask) {
; CHECK-LABEL: masked_zload_nxv8i16:		; CHECK-LABEL: masked_zload_nxv8i16:
; CHECK: punpklo p1.h, p0.b		; CHECK: ld1h { z0.h }, p0/z, [x0]
; CHECK-NEXT: punpkhi p0.h, p0.b		; CHECK-NEXT: uunpklo z1.s, z0.h
; CHECK-NEXT: punpklo p2.h, p1.b		; CHECK-NEXT: uunpkhi z3.s, z0.h
; CHECK-NEXT: punpkhi p1.h, p1.b		; CHECK-NEXT: uunpklo z0.d, z1.s
; CHECK-NEXT: ld1h { z0.d }, p2/z, [x0]		; CHECK-NEXT: uunpkhi z1.d, z1.s
; CHECK-NEXT: punpklo p2.h, p0.b		; CHECK-NEXT: uunpklo z2.d, z3.s
; CHECK-NEXT: punpkhi p0.h, p0.b		; CHECK-NEXT: uunpkhi z3.d, z3.s
; CHECK-NEXT: ld1h { z1.d }, p1/z, [x0, #1, mul vl]
; CHECK-NEXT: ld1h { z2.d }, p2/z, [x0, #2, mul vl]
; CHECK-NEXT: ld1h { z3.d }, p0/z, [x0, #3, mul vl]
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%load = call <vscale x 8 x i16> @llvm.masked.load.nxv8i16(<vscale x 8 x i16>* %a, i32 2, <vscale x 8 x i1> %mask, <vscale x 8 x i16> undef)		%load = call <vscale x 8 x i16> @llvm.masked.load.nxv8i16(<vscale x 8 x i16>* %a, i32 2, <vscale x 8 x i1> %mask, <vscale x 8 x i16> undef)
%ext = zext <vscale x 8 x i16> %load to <vscale x 8 x i64>		%ext = zext <vscale x 8 x i16> %load to <vscale x 8 x i64>
ret <vscale x 8 x i64> %ext		ret <vscale x 8 x i64> %ext
}		}

; Masked load requires promotion		; Masked load requires promotion
define <vscale x 2 x double> @masked_zload_2i16_2f64(<vscale x 2 x i16>* noalias %in, <vscale x 2 x i1> %mask) {		define <vscale x 2 x double> @masked_zload_2i16_2f64(<vscale x 2 x i16>* noalias %in, <vscale x 2 x i1> %mask) {
; CHECK-LABEL: masked_zload_2i16_2f64:		; CHECK-LABEL: masked_zload_2i16_2f64:
[17 unchanged lines not shown]