This is an archive of the discontinued LLVM Phabricator instance.

Differential D74254

[llvm][aarch64] SVE addressing modes.
ClosedPublic

Authored by fpetrogalli on Feb 7 2020, 1:35 PM.


Details

Reviewers

andwar
efriedma
sdesmalen

Commits

rGe2ed1d14d6c2: [llvm][aarch64] SVE addressing modes.

Summary

Added register + immediate and register + register addressing modes for the following intrinsics:

Masked load and stores:
- Sign and zero extended load and truncated stores.
- No extension or truncation.
Masked non-temporal load and store.

Diff Detail

Repository

rG LLVM Github Monorepo

Build Status

Buildable 46354
Build 48770: arc lint + arc unit

Event Timeline

fpetrogalli created this revision. Feb 7 2020, 1:35 PM

Herald added a project: Restricted Project. Feb 7 2020, 1:35 PM

Herald added subscribers: llvm-commits, hiraditya, kristof.beyls, tschuett.

sdesmalen added a subscriber: sdesmalen. Feb 10 2020, 8:30 AM

The patch now covers all the LLVM intrinsics listed in the summary:

Masked load and stores:
- Sign and zero extended load and truncated stores.
- No extension or truncation.
Non-temporal load and store.

Notice that this patch is based (and depends) on https://reviews.llvm.org/D73602.

fpetrogalli added reviewers: andwar, efriedma. Feb 11 2020, 9:02 PM

Hi @fpetrogalli, thank you for working on this.

IMHO every test function (e.g. test_masked_ldst_sv2i8) should test either a contiguous load or a store (i.e. only one thing at a time). That will help triaging potential bugs in the future and would also be consistent with other test files in this folder.
[nit] AFAIK, + is never used in file names (e.g. sve-pred-non-temporal-ldst-addressing-mode-reg+reg.ll). I couldn't find any specific rule for that though.
[nit] The idea of %sv2i1 = type <vscale x 2 x i1> is very neat, but I'm slightly worried that that will complicate searching through test files (we are quite familiar with <vscale x 2 x i1>, not so much with %sv2i1, so you might miss cases when using grep).
Are the DAG Combine rules required for this patch? If so, could you add tests for them?

llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
4420	[nit] Wouldn't assert be more suitable?
4439	`Base` and `OffImm` are only needed when building in Debug mode and running e.g. `llc` with `-debug`. Maybe move that inside `LLVM_DEBUG`?
4441	[nit] `Match found` is very generic (and you use it twice). Maybe sth more specific?
4454	Used only once (so no need for a dedicated variable)
4455	Not used.
4464	Hm, why? Maybe document this method like you did for `SelectAddrModeIndexedSVE`?
llvm/lib/Target/AArch64/SVEInstrFormats.td
6982	Inconsistent naming with the records that follow this one.
llvm/test/CodeGen/AArch64/sve-gep.ll
7–8	Unrelated changes.

cameron.mcinally added a subscriber: cameron.mcinally. Feb 12 2020, 7:12 AM

@andwar , thank you for the review.

I'll update the patch asap.

IMHO every test function (e.g. test_masked_ldst_sv2i8) should test either a contiguous load or a store (i.e. only one thing at a time). That will help triaging potential bugs in the future and would also be consistent with other test files in this folder.

I see your point, but the tests that use merge and store at the same time are using exactly the same addressing modes; it is not that they are using something different. So if something fails in the addressing mode of the load, it fails in the addressing mode of the store. Having them merged together saves quite some typing, and has no disadvantages in terms of unit testing.

[nit] AFAIK, + is never used in file names (e.g. sve-pred-non-temporal-ldst-addressing-mode-reg+reg.ll). I couldn't find any specific rule for that though.

Fair point, I'll remove the plus.

[nit] The idea of %sv2i1 = type <vscale x 2 x i1> is very neat, but I'm slightly worried that that will complicate searching through test files (we are quite familiar with <vscale x 2 x i1>, not so much with %sv2i1, so you might miss cases when using grep).

Very good point. The type redefinition saved me a lot of typing, but now that the tests are there it is better to remove it.

Are the DAG Combine rules required for this patch? If so, could you add tests for them?

They are necessary for the reg+imm tests. They are needed to make sure that the ADD node of the base address is always in the format (ADD %BASE (VSCALE CONST)). I'll see if I can add some unit tests specifically for the combiner changes.
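
As a rough illustration of why these folds matter for the reg+imm form, the sketch below models a `vscale * C` term as a plain struct and applies the four rewrites discussed in this review (add, mul, shl, and the sub-to-add canonicalization). `VScaleTerm` and the `fold*`/`negateForSub` helpers are hypothetical names used for this example only, not LLVM APIs; the real combines work on `ISD::VSCALE` nodes with `APInt` constants.

```cpp
#include <cstdint>

// Toy stand-in for an ISD::VSCALE node: represents the value vscale * C.
// Hypothetical type for illustration; not part of LLVM.
struct VScaleTerm {
  int64_t C;
};

// (add (vscale * C0), (vscale * C1)) -> vscale * (C0 + C1)
constexpr VScaleTerm foldAdd(VScaleTerm A, VScaleTerm B) {
  return VScaleTerm{A.C + B.C};
}

// (mul (vscale * C0), C1) -> vscale * (C0 * C1)
constexpr VScaleTerm foldMul(VScaleTerm A, int64_t C1) {
  return VScaleTerm{A.C * C1};
}

// (shl (vscale * C0), C1) -> vscale * (C0 << C1)
// Written as a multiplication so that negative C0 stays well defined.
constexpr VScaleTerm foldShl(VScaleTerm A, unsigned C1) {
  return VScaleTerm{A.C * (int64_t{1} << C1)};
}

// (sub X, (vscale * C)) -> (add X, (vscale * -C)); returns the new addend.
constexpr VScaleTerm negateForSub(VScaleTerm A) {
  return VScaleTerm{-A.C};
}
```

With these rewrites, an address such as `base - (vscale << 4)` ends up as `base + vscale * -16`, i.e. a single ADD whose second operand is a VSCALE term, which is the shape the addressing-mode matcher looks for.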

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
1305	This is to be removed.
1318	This will be removed too.
llvm/test/CodeGen/AArch64/sve-gep.ll
7–8	No, I had to fix this because of the DAG combine changes.

fpetrogalli added inline comments. Feb 12 2020, 8:14 AM

llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
4439	No, they are output parameters, passed by reference. They are not just for debug purpose.
4441	I will remove these debug messages, they are not very helpful.

fpetrogalli marked 10 inline comments as done. Feb 12 2020, 1:29 PM

fpetrogalli added inline comments.

llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
4464	Good catch, it is not needed.
llvm/lib/Target/AArch64/SVEInstrFormats.td
6982	Used the file convention (lower case, separated with `_`, starts with `am`).

fpetrogalli added a parent revision: D73602: [SVE] Add support for lowering GEPs involving scalable vectors. Feb 12 2020, 1:33 PM

Address code review from @andwar.

I have added the tests for the DAG Combine changes.

Harbormaster completed remote builds in B46353: Diff 244268. Feb 12 2020, 1:38 PM

I have uploaded PDF renderings of the DAGS before and after the changes. I am not sure the tests for the combiner are explicit about the fact that the combiner is run on this pattern, but the graphs attached here show the effect of my changes on the code. I think that some of the patterns that are needed to show the changes in the output code are missing.

before.dag.combine_sub_vscale_i64-bdd206.dot.pdf (22 KB)
after.dag.combine_sub_vscale_i64-7f807e.dot.pdf (22 KB)
before.dag.combine_mul_vscale_i64-f4259d.dot.pdf (21 KB)
before.dag.combine_add_vscale_i64-62fd59.dot.pdf (20 KB)
after.dag.combine_mul_vscale_i64-975c2a.dot.pdf (20 KB)
after.dag.combine_add_vscale_i64-35445c.dot.pdf (20 KB)

fpetrogalli added a comment. Feb 12 2020, 1:43 PM

This comment was removed by fpetrogalli.

Cosmetic changes in the comments of the non-temporal reg+reg test.

Harbormaster completed remote builds in B46354: Diff 244273. Feb 12 2020, 1:56 PM

In D74254#1872363, @fpetrogalli wrote:

IMHO every test function (e.g. test_masked_ldst_sv2i8) should test either a contiguous load or a store (i.e. only one thing at a time). That will help triaging potential bugs in the future and would also be consistent with other test files in this folder.

I see your point, but the tests that use merge and store at the same time are using exactly the same addressing modes; it is not that they are using something different. So if something fails in the addressing mode of the load, it fails in the addressing mode of the store. Having them merged together saves quite some typing, and has no disadvantages in terms of unit testing.

Ah, I've realised that you split the files per addressing modes. As the name of the patch would suggest :-) OK, keep it as it is.

Thank you for adding DAG diagrams - they are very helpful!

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2330	`vscale * C1` and `vscale * (C0 + C1)`?
3260	I don't quite understand the benefit of this transformation. The selection dag before and after are almost identical.
3601	`vscale * (C0 + C1)`?
7730	`vscale * (C0 * C1)`?
llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
4447	Missing doxstring.
llvm/test/CodeGen/AArch64/sve-pred-contiguous-ldst-addressing-mode-reg-imm.ll
11	`; CHECK-NEXT`? Here and in the following examples.
16	Could you format this line (and similar lines elsewhere)? E.g.: https://github.com/llvm/llvm-project/blob/318d0ede572080f18d0106dbc354e11c88329a84/llvm/test/CodeGen/AArch64/sve-intrinsics-stores.ll#L11 Makes it easier to parse for humans :) And will be consistent with other files too!
llvm/test/CodeGen/AArch64/sve-vscale-combine.ll
6	`vscale * C1`? and `vscale * (C0 + C1)`?
9	Perhaps `; CHECK-NOT: add`?
25	`vscale * (C0 * C1)`? What's `C0` and what is `C1` in this example?
33	[nit] Align with the following line (missing space)
38	Is `nounwind` needed in these examples?
55	Perhaps `; CHECK-NOT: sub`?
77	`; CHECK-NOT: shl`?
80	Hm, since it's `6` here, shouldn't line 77 be `; CHECK-NEXT: rdvl x0, #6`?

fpetrogalli marked 21 inline comments as done. Feb 13 2020, 11:21 AM

fpetrogalli added inline comments.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
3260	It is needed to avoid having to deal with SUB when computing the addressing mode.
3601	Almost: `vscale * (C0 * C1)`.
7730	Almost: `(vscale * (C0 << C1))`.
llvm/test/CodeGen/AArch64/sve-pred-contiguous-ldst-addressing-mode-reg-imm.ll
16	I prefer keeping all parameters in one line. The "consistency with other files" argument doesn't work: most of the tests for SVE use the convention in this patch:
    frapet01@man-08:~/projects/upstream-clang/llvm-project/llvm/test/CodeGen/AArch64 (master)$ grep "@llvm.*)$" sve*.ll | grep -v "()" | wc -l
    2076
    frapet01@man-08:~/projects/upstream-clang/llvm-project/llvm/test/CodeGen/AArch64 (master)$ grep "@llvm.*,$" sve*.ll | grep -v "()" | wc -l
    1433
The result becomes even more unbalanced if you look at the totality of tests in the folder:
    frapet01@man-08:~/projects/upstream-clang/llvm-project/llvm/test/CodeGen/AArch64 (master)$ grep "@llvm.*)$" *.ll | grep -v "()" | wc -l
    7260
    frapet01@man-08:~/projects/upstream-clang/llvm-project/llvm/test/CodeGen/AArch64 (master)$ grep "@llvm.*,$" *.ll | grep -v "()" | wc -l
    1435
I ran grep on master, @ `a062a3ed7fd82c277812d80fb83dc6f05b939a84`, _not_ on my dev branch :).
llvm/test/CodeGen/AArch64/sve-vscale-combine.ll
38	Yes, to be able to use CHECK-NEXT after CHECK-LABEL
80	At IR level, it is %shl = 2^6 * VSCALE.
At Assembly level, the output of RDVL is 2^4 * VSCALE.
Hence, to compute %shl we need to multiply RDVL output by 2^2 -> #4 is correct.
Does that make sense? I have actually added it as a comment, but I have modified the code so that it produces rdvl #1.
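
The arithmetic in the last comment can be checked with a small standalone sketch. It assumes that `RDVL Xd, #imm` produces `imm * 16 * vscale` and that the immediate is a signed 6-bit value in [-32, 31]. `rdvlImmFor` is a hypothetical helper written for this example, not the `SelectRDVLImm` code from the patch, and it needs C++17 for `std::optional`.

```cpp
#include <cstdint>
#include <optional>

// Returns the RDVL immediate that materialises `vscale * Multiple`, if one
// exists. Assumptions (not taken from the patch): RDVL #imm yields
// imm * 16 * vscale, and imm must lie in [-32, 31].
std::optional<int64_t> rdvlImmFor(int64_t Multiple) {
  constexpr int64_t BytesPerVScale = 16; // 2^4: one 128-bit granule in bytes
  if (Multiple % BytesPerVScale != 0)
    return std::nullopt; // not a whole number of RDVL units
  const int64_t Imm = Multiple / BytesPerVScale;
  if (Imm < -32 || Imm > 31)
    return std::nullopt; // does not fit the immediate field
  return Imm;
}
```

For `%shl = vscale << 6` the multiple is 64, so the helper returns 4, matching the `#4` discussed above; a multiple of 16 gives `#1`, and a multiple of 8 cannot be produced by a single RDVL under these assumptions.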

fpetrogalli marked 6 inline comments as done. Feb 13 2020, 11:23 AM

@andwar, thank you for your review.

I have updated the code accordingly.

Thank you!

fpetrogalli edited the summary of this revision. Feb 13 2020, 11:28 AM

fpetrogalli edited the summary of this revision.

Harbormaster completed remote builds in B46436: Diff 244486. Feb 13 2020, 11:29 AM

I have added a CHECK-NOT: mul that was missing from the DAG combiner tests.

Harbormaster completed remote builds in B46438: Diff 244499. Feb 13 2020, 12:06 PM

fpetrogalli added a child revision: D74581: [llvm][CodeGen][aarch64] Add contiguous prefetch intrinsics for SVE. Feb 13 2020, 7:17 PM

andwar added inline comments. Feb 17 2020, 11:13 AM

llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
700	DELETEME
4435	[nit] As per https://llvm.org/docs/CodingStandards.html#use-early-exits-and-continue-to-simplify-code, this could be simplified:
    if ((MulImm % MemWidthBytes) != 0)
      return false;

    signed Offset = MulImm / MemWidthBytes;
    if ((Offset < Min) || (Offset > Max))
      return false;

    Base = N.getOperand(0);
    OffImm = CurDAG->getTargetConstant(Offset, SDLoc(N), MVT::i64);
    return true;
This way we have fewer levels of indentation.
4436	This operation mixes sizes: `unsigned` vs `int64_t`. It would be safer to be either: consistently explicit about the size of integers (`int64_t` and `int32_t`), or consistently implicit about the size of integers (`unsigned`, `unsigned long`)
4468	[nit] As per https://llvm.org/docs/CodingStandards.html#use-early-exits-and-continue-to-simplify-code, this could be simplified as:
    if (RHS.getOpcode() != ISD::SHL)
      return false;

    const SDValue SRHS = RHS.getOperand(1);
    auto *C = dyn_cast<ConstantSDNode>(SRHS);
    if (nullptr == C)
      return false;

    const uint64_t Shift = C->getZExtValue();
    if (Shift == Scale) {
      Base = LHS;
      Offset = RHS.getOperand(0);
      return true;
    }
This way we have fewer levels of indentation.
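
To make the reg+imm check in the comment on line 4435 concrete outside of LLVM, here is a minimal sketch of the same early-exit logic on plain integers. `selectIndexedOffset` is a hypothetical name; `MulImm` stands for the constant of the `vscale * C` operand and `MemWidthBytes` for the known minimum size of the memory type in bytes. The `<-8, 7>` range used in the usage note is only an example of a signed 4-bit immediate, not a value taken from this patch.

```cpp
#include <cstdint>
#include <optional>

// Computes the scaled immediate for Base + OffImm * sizeof(MemVT), or
// std::nullopt when the byte offset is not representable.
// Hypothetical helper for illustration; mirrors the early-exit structure
// suggested in the review, using int64_t throughout to avoid mixing sizes.
template <int64_t Min, int64_t Max>
std::optional<int64_t> selectIndexedOffset(int64_t MulImm,
                                           int64_t MemWidthBytes) {
  // The byte offset must be a whole multiple of the memory width.
  if (MemWidthBytes <= 0 || MulImm % MemWidthBytes != 0)
    return std::nullopt;

  const int64_t Offset = MulImm / MemWidthBytes;
  // The scaled offset must fit the instruction's immediate range.
  if (Offset < Min || Offset > Max)
    return std::nullopt;

  return Offset;
}
```

With a 16-byte memory type and a range of `<-8, 7>`, `vscale * 32` selects an immediate of 2, `vscale * -128` selects -8, while `vscale * 128` (immediate 8) and `vscale * 24` (not a multiple of 16) are rejected and would fall back to another addressing mode.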
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
1211	[nit] Indent consistently
llvm/test/CodeGen/AArch64/sve-pred-contiguous-ldst-addressing-mode-reg-imm.ll
16	I think that your comparison misses the context, and that's entirely my fault because I didn't make it clear. If you compare long lines only, the split is roughly 50/50. I kindly asked for this update because these lines are wrapped by Phabricator (they don't fit on my screen and I don't see how to make Phabricator stop doing that). This makes reviewing on Phab a bit frustrating. Btw, I think that `awk` would be more fitting here ;-)
    #!/usr/bin/env bash
    TEST_FILES=(
      "llvm/test/CodeGen/AArch64/sve-pred-contiguous-ldst-addressing-mode-reg-imm.ll"
      "llvm/test/CodeGen/AArch64/sve-pred-contiguous-ldst-addressing-mode-reg-reg.ll"
      "llvm/test/CodeGen/AArch64/sve-pred-non-temporal-ldst-addressing-mode-reg-imm.ll"
      "llvm/test/CodeGen/AArch64/sve-pred-non-temporal-ldst-addressing-mode-reg-reg.ll")
    for test_file in "${TEST_FILES[@]}"; do
      awk -F',' '{
        if ($1 ~ /call/) {
          # Count the number of characters for indentation
          for (i = 1; i <= length($0); i++) {
            if (substr($0, i, 1) == "(") {
              lenght_col = i
              break
            }
          }
          # Split function call - 2 args
          if (NF == 2) {
            printf("%s,\n %*s %s\n", $1, lenght_col - 3, "", $2)
          }
          # Split function call - 3 args
          if (NF == 3) {
            printf("%s,\n %*s %s,\n %*s %s\n", $1, lenght_col - 3, "", $2, lenght_col - 3, "", $3)
          }
          # Split function call - 4 args
          if (NF == 4) {
            printf("%s,\n %*s %s,\n %*s %s,\n %*s %s\n", $1, lenght_col - 3, "", $2, lenght_col - 3, "", $3, lenght_col - 3, "", $4)
          }
        } else {
          # Not a call, not reformatting
          print $0
        }
      }' ${test_file} > temp.ll
      mv temp.ll ${test_file}
    done
llvm/test/CodeGen/AArch64/sve-pred-contiguous-ldst-addressing-mode-reg-reg.ll
392	FIXME
llvm/test/CodeGen/AArch64/sve-vscale-combine.ll
6	What about `vscale * C1`?
14	Could `C0` and `C1` be different than 1?
29	Did you mean `C0 = 2`? And multiplication by `C0` seems to be missing - there's only multiplication by `32`. Shouldn't the IR look like this:
    %vscale = call i64 @llvm.vscale.i64()
    %mul_by_16 = mul i64 %vscale, 16
    %mul_by_2 = mul i64 %mul_by_16, 32
    ret i64 %mul_by_2

Cheers for updating this @fpetrogalli ! My most recent comments are mostly nits.

Thank you @andwar. I have addressed your comments.

Your awk script was very useful.

llvm/test/CodeGen/AArch64/sve-pred-contiguous-ldst-addressing-mode-reg-imm.ll
16	You win!
llvm/test/CodeGen/AArch64/sve-vscale-combine.ll
14	Yes, but to get C0 and C1 different from 1 I'd have to use a `mul` instruction, which is already tested in the `combine_mul_vscale_*` tests. I think it is enough to test with C0=C1=1.
29	When targeting SVE, `@llvm.vscale.*()` returns the number of 16-byte chunks of an SVE register. If I multiply it by 32, it returns the number of 4-bit (half-byte) elements in the SVE register. `RDVL` returns the number of 8-bit lanes in the register, hence the number of 4-bit lanes is given by `RDVL Xn, #2`. I have updated the comment setting C1 = 32.

Harbormaster completed remote builds in B46718: Diff 245180. Feb 18 2020, 9:03 AM

Thanks for creating this patch @fpetrogalli!

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2331	Does it make sense to submit these DAGCombine changes + all related tests in a separate patch? This patch is a bit of three patches in one at the moment: DAGCombines, reg+reg addressing modes, reg+imm addressing modes.
llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
224	nit: This one is missing a comment.
4438	Are you purposely using `signed` instead of `int64_t` here?
4439	nit: unnecessary brackets, can be `if (Offset < Min || Offset > Max)`
4471	nit: Why `const`?

fpetrogalli added a parent revision: D74782: [llvm][CodeGen] DAG Combiner folds for vscale. Feb 18 2020, 11:34 AM

I have extracted the DAGCombiner changes in a separate patch as
requested by @sdesmalen: https://reviews.llvm.org/D74782

fpetrogalli edited the summary of this revision. Feb 18 2020, 11:38 AM

fpetrogalli added inline comments.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2331	I have extracted the DAGCombiner changes here: https://reviews.llvm.org/D74782 I think that the code for the addressing mode is simple enough that it can stay in a single patch.
llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
4438	Should be int64_t for consistency.
4471	I try to use `const` indiscriminately when I know that the variable is not supposed to change inside the scope. The same is done in other places in this file (not everywhere). I'm not married to this, so if you want I can remove it. :)

Harbormaster completed remote builds in B46738: Diff 245224. Feb 18 2020, 11:39 AM

Thank you @sdesmalen for your review!

Francesco

Thank you for all the updates @fpetrogalli ! LGTM

LGTM!

llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
4472	Suggested simplification:
    if (auto *C = dyn_cast<ConstantSDNode>(ShiftRHS)) {
      if (ShiftAmount == C->getZExtValue()) {
        ...
        return true;
      }
    }
    return false;
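
For readers without the LLVM headers at hand, the reg+reg match being discussed can be sketched on a toy expression tree. This assumes the pattern is `(add Base, (shl Index, Scale))` with a constant shift amount, as the suggestions in this thread imply. `Node`, `Op`, `RegReg` and `matchRegReg` are hypothetical names for this example and do not correspond to `SDValue` or `SelectSVERegRegAddrMode`.

```cpp
#include <cstdint>
#include <optional>

// Toy expression node; hypothetical stand-in for an SDValue.
enum class Op { Add, Shl, Const, Reg };

struct Node {
  Op Opcode;
  const Node *LHS = nullptr;
  const Node *RHS = nullptr;
  uint64_t Value = 0; // constant payload for Op::Const, register id for Op::Reg
};

struct RegReg {
  const Node *Base;
  const Node *Index;
};

// Matches (add Base, (shl Index, Scale)) and returns the two registers,
// using early exits so that the success path is not nested.
std::optional<RegReg> matchRegReg(const Node &N, unsigned Scale) {
  if (N.Opcode != Op::Add)
    return std::nullopt;

  const Node *Shift = N.RHS;
  if (!Shift || Shift->Opcode != Op::Shl)
    return std::nullopt;

  const Node *Amount = Shift->RHS;
  if (!Amount || Amount->Opcode != Op::Const || Amount->Value != Scale)
    return std::nullopt;

  return RegReg{N.LHS, Shift->LHS};
}
```

The shift amount has to equal the scale implied by the element size (for example 2 for 32-bit elements); any other shift leaves the address to be computed separately.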
llvm/test/CodeGen/AArch64/sve-pred-contiguous-ldst-addressing-mode-reg-imm.ll
251	nit: did you choose a different alignment here on purpose?

This revision is now accepted and ready to land. Feb 21 2020, 9:58 AM

fpetrogalli marked 3 inline comments as done. Feb 21 2020, 11:34 AM

fpetrogalli added inline comments.

llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
4472	@sdesmalen, your version saves some lines, I'll apply it and then submit the patch. Thank you!
llvm/test/CodeGen/AArch64/sve-pred-contiguous-ldst-addressing-mode-reg-imm.ll
251	No - it is not on purpose. Do you want me to set everything to 1? I wouldn't bother, none of the code added in this patch cares about the value of the alignment...

fpetrogalli marked 2 inline comments as done. Feb 21 2020, 11:59 AM

fpetrogalli added inline comments.

llvm/test/CodeGen/AArch64/sve-pred-contiguous-ldst-addressing-mode-reg-imm.ll
251	Well, it was easier to change everything to `i32 1` instead of doing another round of questions! :)

Update code as requested by @sdesmalen.

Thank you!

Francesco

fpetrogalli marked an inline comment as done. Feb 21 2020, 12:01 PM

Harbormaster completed remote builds in B47040: Diff 245950. Feb 21 2020, 12:04 PM

Closed by commit rGe2ed1d14d6c2: [llvm][aarch64] SVE addressing modes. (authored by fpetrogalli). Feb 21 2020, 12:04 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                                            Size
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp                                   30 lines
llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp                                 78 lines
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td                                  118 lines
llvm/lib/Target/AArch64/SVEInstrFormats.td                                      8 lines
llvm/test/CodeGen/AArch64/sve-gep.ll                                            4 lines
llvm/test/CodeGen/AArch64/sve-pred-contiguous-ldst-addressing-mode-reg-imm.ll   462 lines
llvm/test/CodeGen/AArch64/sve-pred-contiguous-ldst-addressing-mode-reg-reg.ll   460 lines
llvm/test/CodeGen/AArch64/sve-pred-non-temporal-ldst-addressing-mode-reg-imm.ll 143 lines
llvm/test/CodeGen/AArch64/sve-pred-non-temporal-ldst-addressing-mode-reg-reg.ll 124 lines
llvm/test/CodeGen/AArch64/sve-vscale-combine.ll                                 91 lines

Diff 244273

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp


[... 2,321 lines not shown ...] SDValue DAGCombiner::visitADD(SDNode *N) {
   if (SDValue V = foldAddSubOfSignBit(N, DAG))
     return V;

   // fold (a+b) -> (a|b) iff a and b share no bits.
   if ((!LegalOperations || TLI.isOperationLegal(ISD::OR, VT)) &&
       DAG.haveNoCommonBitsSet(N0, N1))
     return DAG.getNode(ISD::OR, DL, VT, N0, N1);

+  // Fold (add (vscale * C0), (vscale C1)) to (vscale C0 + C1))
andwar: `vscale * C1` and `vscale * (C0 + C1)`?
+  if (N0.getOpcode() == ISD::VSCALE && N1.getOpcode() == ISD::VSCALE) {
sdesmalen: Does it make sense to submit these DAGCombine changes + all related tests in a separate patch? This patch is a bit of three patches in one at the moment: DAGCombines, reg+reg addressing modes, reg+imm addressing modes.
fpetrogalli: I have extracted the DAGCombiner changes here: https://reviews.llvm.org/D74782 I think that the code for the addressing mode is simple enough that it can stay in a single patch.
+    APInt C0 = N0->getConstantOperandAPInt(0);
+    APInt C1 = N1->getConstantOperandAPInt(0);
+    return DAG.getVScale(DL, VT, C0 + C1);
+  }

   return SDValue();
 }

 SDValue DAGCombiner::visitADDSAT(SDNode *N) {
   unsigned Opcode = N->getOpcode();
   SDValue N0 = N->getOperand(0);
   SDValue N1 = N->getOperand(1);
   EVT VT = N0.getValueType();
[... 907 lines not shown ...] if (N1.getOpcode() == ISD::SIGN_EXTEND_INREG) {
     VTSDNode *TN = cast<VTSDNode>(N1.getOperand(1));
     if (TN->getVT() == MVT::i1) {
       SDValue ZExt = DAG.getNode(ISD::AND, DL, VT, N1.getOperand(0),
                                  DAG.getConstant(1, DL, VT));
       return DAG.getNode(ISD::ADD, DL, VT, N0, ZExt);
     }
   }

+  // canonicalize (sub X, (vscale * C)) to (add X, (vscale * -C))
andwar: I don't quite understand the benefit of this transformation. The selection dag before and after are almost identical.
fpetrogalli: It is needed to avoid the having to deal with SUB when computing the addressing mode.
+  if (N1.getOpcode() == ISD::VSCALE) {
+    APInt IntVal = N1.getConstantOperandAPInt(0);
+    return DAG.getNode(ISD::ADD, DL, VT, N0, DAG.getVScale(DL, VT, -IntVal));
+  }

   // Prefer an add for more folding potential and possibly better codegen:
   // sub N0, (lshr N10, width-1) --> add N0, (ashr N10, width-1)
   if (!LegalOperations && N1.getOpcode() == ISD::SRL && N1.hasOneUse()) {
     SDValue ShAmt = N1.getOperand(1);
     ConstantSDNode *ShAmtC = isConstOrConstSplat(ShAmt);
     if (ShAmtC &&
         ShAmtC->getAPIntValue() == (N1.getScalarValueSizeInBits() - 1)) {
       SDValue SRA = DAG.getNode(ISD::SRA, DL, VT, N1.getOperand(0), ShAmt);
[... 319 lines not shown ...] if (DAG.isConstantIntBuildVectorOrConstantInt(N1) &&
       DAG.isConstantIntBuildVectorOrConstantInt(N0.getOperand(1)) &&
       isMulAddWithConstProfitable(N, N0, N1))
     return DAG.getNode(ISD::ADD, SDLoc(N), VT,
                        DAG.getNode(ISD::MUL, SDLoc(N0), VT,
                                    N0.getOperand(0), N1),
                        DAG.getNode(ISD::MUL, SDLoc(N1), VT,
                                    N0.getOperand(1), N1));

+  // Fold (mul (vscale * C0), C1) to (vscale C0 * C1))
andwar: `vscale * (C0 + C1)`?
fpetrogalli: Almost: `vscale * (C0 * C1)`.
+  if (N0.getOpcode() == ISD::VSCALE)
+    if (ConstantSDNode *NC1 = isConstOrConstSplat(N1)) {
+      APInt C0 = N0.getConstantOperandAPInt(0);
+      APInt C1 = NC1->getAPIntValue();
+      return DAG.getVScale(SDLoc(N), VT, C0 * C1);
+    }

   // reassociate mul
   if (SDValue RMUL = reassociateOps(ISD::MUL, SDLoc(N), N0, N1, N->getFlags()))
     return RMUL;

   return SDValue();
 }

 /// Return true if divmod libcall is available.
[... 4,105 lines not shown ...] if (N0.getOpcode() == ISD::MUL && N0.getNode()->hasOneUse() &&
     if (isConstantOrConstantVector(Shl))
       return DAG.getNode(ISD::MUL, SDLoc(N), VT, N0.getOperand(0), Shl);
   }

   if (N1C && !N1C->isOpaque())
     if (SDValue NewSHL = visitShiftByConstant(N))
       return NewSHL;

+  // Fold (shl (vscale * C0), C1) to (vscale C0 << C1))
andwar: `vscale * (C0 * C1)`?
fpetrogalli: Almost: `(vscale * (C0 << C1))`.
+  if (N0.getOpcode() == ISD::VSCALE)
+    if (ConstantSDNode *NC1 = isConstOrConstSplat(N->getOperand(1))) {
+      auto DL = SDLoc(N);
+      APInt C0 = N0.getConstantOperandAPInt(0);
+      APInt C1 = NC1->getAPIntValue();
+      return DAG.getVScale(DL, VT, C0 << C1);
+    }

   return SDValue();
 }

 SDValue DAGCombiner::visitSRA(SDNode *N) {
   SDValue N0 = N->getOperand(0);
   SDValue N1 = N->getOperand(1);
   if (SDValue V = DAG.simplifyShift(N0, N1))
     return V;
[... 13,658 lines not shown ...]

llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp

[... 215 lines not shown ...] public:
   void SelectLoad(SDNode *N, unsigned NumVecs, unsigned Opc,
                   unsigned SubRegIdx);
   void SelectPostLoad(SDNode *N, unsigned NumVecs, unsigned Opc,
                       unsigned SubRegIdx);
   void SelectLoadLane(SDNode *N, unsigned NumVecs, unsigned Opc);
   void SelectPostLoadLane(SDNode *N, unsigned NumVecs, unsigned Opc);

   bool SelectAddrModeFrameIndexSVE(SDValue N, SDValue &Base, SDValue &OffImm);
+  template <int64_t Min, int64_t Max>
sdesmalen: nit: This one is missing a comment.
+  bool SelectAddrModeIndexedSVE(SDNode *Root, SDValue N, SDValue &Base,
+                                SDValue &OffImm);
+  /// SVE Reg+Reg address mode
+  template <unsigned Scale>
+  bool SelectSVERegRegAddrMode(SDValue N, SDValue &Base, SDValue &Offset) {
+    return SelectSVERegRegAddrMode(N, Scale, Base, Offset);
+  }

   void SelectStore(SDNode *N, unsigned NumVecs, unsigned Opc);
   void SelectPostStore(SDNode *N, unsigned NumVecs, unsigned Opc);
   void SelectStoreLane(SDNode *N, unsigned NumVecs, unsigned Opc);
   void SelectPostStoreLane(SDNode *N, unsigned NumVecs, unsigned Opc);

   bool tryBitfieldExtractOp(SDNode *N);
   bool tryBitfieldExtractOpFromSExt(SDNode *N);
[... 43 lines not shown ...] private:

   bool SelectSVEAddSubImm(SDValue N, MVT VT, SDValue &Imm, SDValue &Shift);

   bool SelectSVELogicalImm(SDValue N, MVT VT, SDValue &Imm);

   bool SelectSVESignedArithImm(SDValue N, SDValue &Imm);

   bool SelectSVEArithImm(SDValue N, SDValue &Imm);
+  bool SelectSVERegRegAddrMode(SDValue N, unsigned Scale, SDValue &Base,
+                               SDValue &Offset);
 };
 } // end anonymous namespace

 /// isIntImmediate - This method tests to see if the node is a constant
 /// operand. If so Imm will receive the 32-bit value.
 static bool isIntImmediate(const SDNode *N, uint64_t &Imm) {
   if (const ConstantSDNode *C = dyn_cast<const ConstantSDNode>(N)) {
     Imm = C->getZExtValue();
[... 391 lines not shown ...] static SDValue narrowIfNeeded(SelectionDAG *CurDAG, SDValue N) {
   MachineSDNode *Node = CurDAG->getMachineNode(TargetOpcode::EXTRACT_SUBREG,
                                                dl, MVT::i32, N, SubReg);
   return SDValue(Node, 0);
 }

 // Returns a suitable CNT/INC/DEC/RDVL multiplier to calculate VSCALE*N.
 template<signed Low, signed High, signed Scale>
 bool AArch64DAGToDAGISel::SelectRDVLImm(SDValue N, SDValue &Imm) {
   if (!isa<ConstantSDNode>(N))
andwar: DELETEME
     return false;

   int64_t MulImm = cast<ConstantSDNode>(N)->getSExtValue();
   if ((MulImm % std::abs(Scale)) == 0) {
     int64_t RDVLImm = MulImm / Scale;
     if ((RDVLImm >= Low) && (RDVLImm <= High)) {
       Imm = CurDAG->getTargetConstant(RDVLImm, SDLoc(N), MVT::i32);
       return true;
▲ Show 20 Lines • Show All 3,695 Lines • ▼ Show 20 Lines
}		}

/// createAArch64ISelDag - This pass converts a legalized DAG into a		/// createAArch64ISelDag - This pass converts a legalized DAG into a
/// AArch64-specific DAG, ready for instruction scheduling.		/// AArch64-specific DAG, ready for instruction scheduling.
FunctionPass *llvm::createAArch64ISelDag(AArch64TargetMachine &TM,		FunctionPass *llvm::createAArch64ISelDag(AArch64TargetMachine &TM,
CodeGenOpt::Level OptLevel) {		CodeGenOpt::Level OptLevel) {
return new AArch64DAGToDAGISel(TM, OptLevel);		return new AArch64DAGToDAGISel(TM, OptLevel);
}		}

		/// SelectAddrModeIndexedSVE - Attempt selection of the addressing mode:
		/// Base + OffImm * sizeof(MemVT) for Min <= OffImm <= Max
		/// where Root is the memory access using N for its address.
		template <int64_t Min, int64_t Max>
		bool AArch64DAGToDAGISel::SelectAddrModeIndexedSVE(SDNode *Root, SDValue N,
		SDValue &Base,
		SDValue &OffImm) {
		assert(isa<MemSDNode>(Root) && "Invalid node.");
		andwar: [nit] Wouldn't assert be more suitable?

		EVT MemVT = cast<MemSDNode>(Root)->getMemoryVT();

		if (N.getOpcode() != ISD::ADD)
		return false;

		SDValue VScale = N.getOperand(1);
		if (VScale.getOpcode() != ISD::VSCALE)
		return false;

		TypeSize TS = MemVT.getSizeInBits();
		unsigned MemWidthBytes = TS.getKnownMinSize() / 8;
		int64_t MulImm = cast<ConstantSDNode>(VScale.getOperand(0))->getSExtValue();

		if ((MulImm % MemWidthBytes) == 0) {
		andwar: [nit] As per https://llvm.org/docs/CodingStandards.html#use-early-exits-and-continue-to-simplify-code, this could be simplified: `if ((MulImm % MemWidthBytes) != 0) return false; signed Offset = MulImm / MemWidthBytes; if ((Offset < Min) || (Offset > Max)) return false; Base = N.getOperand(0); OffImm = CurDAG->getTargetConstant(Offset, SDLoc(N), MVT::i64); return true;` This way we have fewer levels of indentation.
		signed Offset = MulImm / MemWidthBytes;
		andwar: This operation mixes sizes: `unsigned` vs `int64_t`. It would be safer to be either consistently explicit about the size of integers (`int64_t` and `int32_t`), or consistently implicit about the size of integers (`unsigned`, `unsigned long`).
		if ((Offset >= Min) && (Offset <= Max)) {
		Base = N.getOperand(0);
		sdesmalen: Are you purposely using `signed` instead of `int64_t` here?
		fpetrogalli: Should be int64_t for consistency.
		OffImm = CurDAG->getTargetConstant(Offset, SDLoc(N), MVT::i64);
		andwar: `Base` and `OffImm` are only needed when building in Debug mode and running e.g. `llc` with `-debug`. Maybe move that inside `LLVM_DEBUG`?
		fpetrogalli: No, they are output parameters, passed by reference. They are not just for debug purposes.
		sdesmalen: nit: unnecessary brackets, can be `if (Offset < Min || Offset > Max)`
		return true;
		}
		andwar: [nit] `Match found` is very generic (and you use it twice). Maybe something more specific?
		fpetrogalli: I will remove these debug messages, they are not very helpful.
		}

		return false;
		}
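Illustrative note (not part of the patch): the reg+imm check above can be modelled outside of SelectionDAG. The sketch below assumes a hypothetical helper `indexedSVEOffset` that takes the `vscale` multiplier in bytes (`MulImm`) and the known minimum memory width in bytes (`MemWidthBytes`). It uses the early-exit shape and the `int64_t` types that the reviewers asked for. It is a standalone model of the arithmetic, not the code that was committed.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical standalone model of the reg+imm selection arithmetic.
// Returns the immediate (in multiples of the memory width) when the byte
// multiplier of vscale is an exact multiple of the memory width and the
// quotient lies in [Min, Max]; returns std::nullopt otherwise.
template <int64_t Min, int64_t Max>
std::optional<int64_t> indexedSVEOffset(int64_t MulImm, int64_t MemWidthBytes) {
  assert(MemWidthBytes > 0 && "memory width must be positive");

  // Early exit: the byte offset is not a whole number of memory widths.
  if (MulImm % MemWidthBytes != 0)
    return std::nullopt;

  const int64_t Offset = MulImm / MemWidthBytes;

  // Early exit: the scaled offset does not fit the immediate field.
  if (Offset < Min || Offset > Max)
    return std::nullopt;

  return Offset;
}
```

With `MemWidthBytes = 16` (the known minimum size of `<vscale x 2 x i64>`), a multiplier of -128 gives the offset -8, which fits `<-8, 7>`, while 128 gives 8 and is rejected.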

		bool AArch64DAGToDAGISel::SelectSVERegRegAddrMode(SDValue N, unsigned Scale,
		andwar: Missing doxstring.
		SDValue &Base,
		SDValue &Offset) {
		if (N.getOpcode() != ISD::ADD)
		return false;

		// Process an ADD node.
		const SDValue LHS = N.getOperand(0);
		andwar: Used only once (so no need for a dedicated variable)
		const SDValue RHS = N.getOperand(1);
		andwar: Not used.

		// 8 bit data does not come with the SHL node, so it is treated
		// separately.
		if (Scale == 0) {
		Base = LHS;
		Offset = RHS;
		return true;
		}

		andwar: Hm, why? Maybe document this method like you did for `SelectAddrModeIndexedSVE`?
		fpetrogalli: Good catch, it is not needed.
		// Check if the RHS is a shift node with a constant.
		if (RHS.getOpcode() == ISD::SHL) {
		const SDValue SRHS = RHS.getOperand(1);
		if (auto C = dyn_cast<ConstantSDNode>(SRHS)) {
		andwar: [nit] As per https://llvm.org/docs/CodingStandards.html#use-early-exits-and-continue-to-simplify-code, this could be simplified as: `if (RHS.getOpcode() != ISD::SHL) return false; const SDValue SRHS = RHS.getOperand(1); auto C = dyn_cast<ConstantSDNode>(SRHS); if (nullptr == C) return false; const uint64_t Shift = C->getZExtValue(); if (Shift == Scale) { Base = LHS; Offset = RHS.getOperand(0); return true; }` This way we have fewer levels of indentation.
		const uint64_t Shift = C->getZExtValue();
		if (Shift == Scale) {
		Base = LHS;
		sdesmalen: nit: Why `const`?
		fpetrogalli: I try to use `const` indiscriminately when I know that the variable is not supposed to change inside the scope. The same is done in other places in this file (not everywhere). I didn't marry this, so if you want I can remove it. :)
		Offset = RHS.getOperand(0);
		sdesmalen: `if (auto *C = dyn_cast<ConstantSDNode>(ShiftRHS)) { if (ShiftAmount == C->getZExtValue()) { ... return true; } } return false;`
		fpetrogalli: @sdesmalen, your version saves some lines, I'll apply it and then submit the patch. Thank you!
		return true;
		}
		}
		}

		return false;
		}
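Illustrative note (not part of the patch): the reg+reg match can be pictured as pattern matching on a small expression tree of the form `add(base, shl(index, Scale))`. The sketch below uses hypothetical `Node`, `Op` and `matchRegReg` names in place of `SDValue` and the SelectionDAG API, and folds in the early exits suggested in the review. The `Scale == 0` branch mirrors the special case for byte-sized data, where no `SHL` node is present.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-ins for SelectionDAG nodes; not LLVM types.
enum class Op { Reg, Const, Add, Shl };

struct Node {
  Op Opcode;
  uint64_t Value = 0;        // register number or constant value
  const Node *LHS = nullptr; // first operand, if any
  const Node *RHS = nullptr; // second operand, if any
};

struct RegRegAddr {
  const Node *Base;
  const Node *Offset;
};

// Matches add(Base, shl(Offset, Scale)), or add(Base, Offset) when Scale == 0.
std::optional<RegRegAddr> matchRegReg(const Node &N, unsigned Scale) {
  if (N.Opcode != Op::Add || !N.LHS || !N.RHS)
    return std::nullopt;

  // Byte-sized data carries no shift, so any add is accepted.
  if (Scale == 0)
    return RegRegAddr{N.LHS, N.RHS};

  const Node *Shift = N.RHS;
  if (Shift->Opcode != Op::Shl || !Shift->LHS || !Shift->RHS)
    return std::nullopt;

  // The shift amount must be a constant equal to the requested scale.
  const Node *Amount = Shift->RHS;
  if (Amount->Opcode != Op::Const || Amount->Value != Scale)
    return std::nullopt;

  return RegRegAddr{N.LHS, Shift->LHS};
}
```

For `add(x0, shl(x1, 3))`, a call with `Scale = 3` yields base `x0` and offset `x1`, while `Scale = 2` does not match.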

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td

	[1,137 lines not shown]
	// General case that we ideally never want to match.			// General case that we ideally never want to match.
	def : Pat<(vscale GPR64:$scale), (MADDXrrr (UBFMXri (RDVLI_XI 1), 4, 63), $scale, XZR)>;			def : Pat<(vscale GPR64:$scale), (MADDXrrr (UBFMXri (RDVLI_XI 1), 4, 63), $scale, XZR)>;

	let AddedComplexity = 5 in {			let AddedComplexity = 5 in {
	def : Pat<(vscale (i64 1)), (UBFMXri (RDVLI_XI 1), 4, 63)>;			def : Pat<(vscale (i64 1)), (UBFMXri (RDVLI_XI 1), 4, 63)>;
	def : Pat<(vscale (i64 -1)), (SBFMXri (RDVLI_XI -1), 4, 63)>;			def : Pat<(vscale (i64 -1)), (SBFMXri (RDVLI_XI -1), 4, 63)>;

	def : Pat<(vscale (sve_rdvl_imm i32:$imm)), (RDVLI_XI $imm)>;			def : Pat<(vscale (sve_rdvl_imm i32:$imm)), (RDVLI_XI $imm)>;
				def : Pat<(vscale (sve_rdvl_imm i32:$imm)), (RDVLI_XI $imm)>;
	def : Pat<(vscale (sve_cnth_imm i32:$imm)), (CNTH_XPiI 31, $imm)>;			def : Pat<(vscale (sve_cnth_imm i32:$imm)), (CNTH_XPiI 31, $imm)>;
	def : Pat<(vscale (sve_cntw_imm i32:$imm)), (CNTW_XPiI 31, $imm)>;			def : Pat<(vscale (sve_cntw_imm i32:$imm)), (CNTW_XPiI 31, $imm)>;
	def : Pat<(vscale (sve_cntd_imm i32:$imm)), (CNTD_XPiI 31, $imm)>;			def : Pat<(vscale (sve_cntd_imm i32:$imm)), (CNTD_XPiI 31, $imm)>;

	def : Pat<(vscale (sve_cnth_imm_neg i32:$imm)), (SUBXrs XZR, (CNTH_XPiI 31, $imm), 0)>;			def : Pat<(vscale (sve_cnth_imm_neg i32:$imm)), (SUBXrs XZR, (CNTH_XPiI 31, $imm), 0)>;
	def : Pat<(vscale (sve_cntw_imm_neg i32:$imm)), (SUBXrs XZR, (CNTW_XPiI 31, $imm), 0)>;			def : Pat<(vscale (sve_cntw_imm_neg i32:$imm)), (SUBXrs XZR, (CNTW_XPiI 31, $imm), 0)>;
	def : Pat<(vscale (sve_cntd_imm_neg i32:$imm)), (SUBXrs XZR, (CNTD_XPiI 31, $imm), 0)>;			def : Pat<(vscale (sve_cntd_imm_neg i32:$imm)), (SUBXrs XZR, (CNTD_XPiI 31, $imm), 0)>;
	}			}
	[44 lines not shown]
	def : Pat<(nxv2f64 (bitconvert (nxv8i16 ZPR:$src))), (nxv2f64 ZPR:$src)>;			def : Pat<(nxv2f64 (bitconvert (nxv8i16 ZPR:$src))), (nxv2f64 ZPR:$src)>;
	def : Pat<(nxv2f64 (bitconvert (nxv4i32 ZPR:$src))), (nxv2f64 ZPR:$src)>;			def : Pat<(nxv2f64 (bitconvert (nxv4i32 ZPR:$src))), (nxv2f64 ZPR:$src)>;
	def : Pat<(nxv2f64 (bitconvert (nxv2i64 ZPR:$src))), (nxv2f64 ZPR:$src)>;			def : Pat<(nxv2f64 (bitconvert (nxv2i64 ZPR:$src))), (nxv2f64 ZPR:$src)>;
	def : Pat<(nxv2f64 (bitconvert (nxv8f16 ZPR:$src))), (nxv2f64 ZPR:$src)>;			def : Pat<(nxv2f64 (bitconvert (nxv8f16 ZPR:$src))), (nxv2f64 ZPR:$src)>;
	def : Pat<(nxv2f64 (bitconvert (nxv4f32 ZPR:$src))), (nxv2f64 ZPR:$src)>;			def : Pat<(nxv2f64 (bitconvert (nxv4f32 ZPR:$src))), (nxv2f64 ZPR:$src)>;

	// Add more complex addressing modes here as required			// Add more complex addressing modes here as required
	multiclass pred_load<ValueType Ty, ValueType PredTy, SDPatternOperator Load,			multiclass pred_load<ValueType Ty, ValueType PredTy, SDPatternOperator Load,
	Instruction RegImmInst> {			Instruction RegRegInst, Instruction RegImmInst, ComplexPattern AddrCP> {
				// reg + reg
				let AddedComplexity = 1 in {
				def _reg_reg_z : Pat<(Ty (Load (AddrCP GPR64:$base, GPR64:$offset), (PredTy PPR:$gp), (SVEDup0Undef))),
				(RegRegInst PPR:$gp, GPR64:$base, GPR64:$offset)>;
				andwar: [nit] Indent consistently
				}
				// reg + imm
				let AddedComplexity = 2 in {
				def _reg_imm_z : Pat<(Ty (Load (am_sve_indexed_s4 GPR64sp:$base, simm4s1:$offset), (PredTy PPR:$gp), (SVEDup0Undef))),
				(RegImmInst PPR:$gp, GPR64:$base, simm4s1:$offset)>;
				}
	def _default_z : Pat<(Ty (Load GPR64:$base, (PredTy PPR:$gp), (SVEDup0Undef))),			def _default_z : Pat<(Ty (Load GPR64:$base, (PredTy PPR:$gp), (SVEDup0Undef))),
	(RegImmInst PPR:$gp, GPR64:$base, (i64 0))>;			(RegImmInst PPR:$gp, GPR64:$base, (i64 0))>;
	}			}

	// 2-element contiguous loads			// 2-element contiguous loads
	defm : pred_load<nxv2i64, nxv2i1, zext_masked_load_i8, LD1B_D_IMM>;			defm : pred_load<nxv2i64, nxv2i1, zext_masked_load_i8, LD1B_D, LD1B_D_IMM, am_sve_regreg_lsl0>;
	defm : pred_load<nxv2i64, nxv2i1, asext_masked_load_i8, LD1SB_D_IMM>;			defm : pred_load<nxv2i64, nxv2i1, asext_masked_load_i8, LD1SB_D, LD1SB_D_IMM, am_sve_regreg_lsl0>;
	defm : pred_load<nxv2i64, nxv2i1, zext_masked_load_i16, LD1H_D_IMM>;			defm : pred_load<nxv2i64, nxv2i1, zext_masked_load_i16, LD1H_D, LD1H_D_IMM, am_sve_regreg_lsl1>;
	defm : pred_load<nxv2i64, nxv2i1, asext_masked_load_i16, LD1SH_D_IMM>;			defm : pred_load<nxv2i64, nxv2i1, asext_masked_load_i16, LD1SH_D, LD1SH_D_IMM, am_sve_regreg_lsl1>;
	defm : pred_load<nxv2i64, nxv2i1, zext_masked_load_i32, LD1W_D_IMM>;			defm : pred_load<nxv2i64, nxv2i1, zext_masked_load_i32, LD1W_D, LD1W_D_IMM, am_sve_regreg_lsl2>;
	defm : pred_load<nxv2i64, nxv2i1, asext_masked_load_i32, LD1SW_D_IMM>;			defm : pred_load<nxv2i64, nxv2i1, asext_masked_load_i32, LD1SW_D, LD1SW_D_IMM, am_sve_regreg_lsl2>;
	defm : pred_load<nxv2i64, nxv2i1, nonext_masked_load, LD1D_IMM>;			defm : pred_load<nxv2i64, nxv2i1, nonext_masked_load, LD1D, LD1D_IMM, am_sve_regreg_lsl3>;
	defm : pred_load<nxv2f16, nxv2i1, nonext_masked_load, LD1H_D_IMM>;			defm : pred_load<nxv2f16, nxv2i1, nonext_masked_load, LD1H_D, LD1H_D_IMM, am_sve_regreg_lsl1>;
	defm : pred_load<nxv2f32, nxv2i1, nonext_masked_load, LD1W_D_IMM>;			defm : pred_load<nxv2f32, nxv2i1, nonext_masked_load, LD1W_D, LD1W_D_IMM, am_sve_regreg_lsl2>;
	defm : pred_load<nxv2f64, nxv2i1, nonext_masked_load, LD1D_IMM>;			defm : pred_load<nxv2f64, nxv2i1, nonext_masked_load, LD1D, LD1D_IMM, am_sve_regreg_lsl3>;

	// 4-element contiguous loads			// 4-element contiguous loads
	defm : pred_load<nxv4i32, nxv4i1, zext_masked_load_i8, LD1B_S_IMM>;			defm : pred_load<nxv4i32, nxv4i1, zext_masked_load_i8, LD1B_S, LD1B_S_IMM, am_sve_regreg_lsl0>;
	defm : pred_load<nxv4i32, nxv4i1, asext_masked_load_i8, LD1SB_S_IMM>;			defm : pred_load<nxv4i32, nxv4i1, asext_masked_load_i8, LD1SB_S, LD1SB_S_IMM, am_sve_regreg_lsl0>;
	defm : pred_load<nxv4i32, nxv4i1, zext_masked_load_i16, LD1H_S_IMM>;			defm : pred_load<nxv4i32, nxv4i1, zext_masked_load_i16, LD1H_S, LD1H_S_IMM, am_sve_regreg_lsl1>;
	defm : pred_load<nxv4i32, nxv4i1, asext_masked_load_i16, LD1SH_S_IMM>;			defm : pred_load<nxv4i32, nxv4i1, asext_masked_load_i16, LD1SH_S, LD1SH_S_IMM, am_sve_regreg_lsl1>;
	defm : pred_load<nxv4i32, nxv4i1, nonext_masked_load, LD1W_IMM>;			defm : pred_load<nxv4i32, nxv4i1, nonext_masked_load, LD1W, LD1W_IMM, am_sve_regreg_lsl2>;
	defm : pred_load<nxv4f16, nxv4i1, nonext_masked_load, LD1H_S_IMM>;			defm : pred_load<nxv4f16, nxv4i1, nonext_masked_load, LD1H_S, LD1H_S_IMM, am_sve_regreg_lsl1>;
	defm : pred_load<nxv4f32, nxv4i1, nonext_masked_load, LD1W_IMM>;			defm : pred_load<nxv4f32, nxv4i1, nonext_masked_load, LD1W, LD1W_IMM, am_sve_regreg_lsl2>;

	// 8-element contiguous loads			// 8-element contiguous loads
	defm : pred_load<nxv8i16, nxv8i1, zext_masked_load_i8, LD1B_H_IMM>;			defm : pred_load<nxv8i16, nxv8i1, zext_masked_load_i8, LD1B_H, LD1B_H_IMM, am_sve_regreg_lsl0>;
	defm : pred_load<nxv8i16, nxv8i1, asext_masked_load_i8, LD1SB_H_IMM>;			defm : pred_load<nxv8i16, nxv8i1, asext_masked_load_i8, LD1SB_H, LD1SB_H_IMM, am_sve_regreg_lsl0>;
	defm : pred_load<nxv8i16, nxv8i1, nonext_masked_load, LD1H_IMM>;			defm : pred_load<nxv8i16, nxv8i1, nonext_masked_load, LD1H, LD1H_IMM, am_sve_regreg_lsl1>;
	defm : pred_load<nxv8f16, nxv8i1, nonext_masked_load, LD1H_IMM>;			defm : pred_load<nxv8f16, nxv8i1, nonext_masked_load, LD1H, LD1H_IMM, am_sve_regreg_lsl1>;

	// 16-element contiguous loads			// 16-element contiguous loads
	defm : pred_load<nxv16i8, nxv16i1, nonext_masked_load, LD1B_IMM>;			defm : pred_load<nxv16i8, nxv16i1, nonext_masked_load, LD1B, LD1B_IMM, am_sve_regreg_lsl0>;

	multiclass pred_store<ValueType Ty, ValueType PredTy, SDPatternOperator Store,			multiclass pred_store<ValueType Ty, ValueType PredTy, SDPatternOperator Store,
	Instruction RegImmInst> {			Instruction RegRegInst, Instruction RegImmInst, ComplexPattern AddrCP> {
				// reg + reg
				let AddedComplexity = 1 in {
				def _reg_reg : Pat<(Store (Ty ZPR:$vec), (AddrCP GPR64:$base, GPR64:$offset), (PredTy PPR:$gp)),
				(RegRegInst ZPR:$vec, PPR:$gp, GPR64:$base, GPR64:$offset)>;
				}
				// reg + imm
				let AddedComplexity = 2 in {
				def _reg_imm : Pat<(Store (Ty ZPR:$vec), (am_sve_indexed_s4 GPR64sp:$base, simm4s1:$offset), (PredTy PPR:$gp)),
				(RegImmInst ZPR:$vec, PPR:$gp, GPR64:$base, simm4s1:$offset)>;
				}
	def _default : Pat<(Store (Ty ZPR:$vec), GPR64:$base, (PredTy PPR:$gp)),			def _default : Pat<(Store (Ty ZPR:$vec), GPR64:$base, (PredTy PPR:$gp)),
	(RegImmInst ZPR:$vec, PPR:$gp, GPR64:$base, (i64 0))>;			(RegImmInst ZPR:$vec, PPR:$gp, GPR64:$base, (i64 0))>;
	}			}

	// 2-element contiguous stores			// 2-element contiguous stores
	defm : pred_store<nxv2i64, nxv2i1, trunc_masked_store_i8, ST1B_D_IMM>;			defm : pred_store<nxv2i64, nxv2i1, trunc_masked_store_i8, ST1B_D, ST1B_D_IMM, am_sve_regreg_lsl0>;
	defm : pred_store<nxv2i64, nxv2i1, trunc_masked_store_i16, ST1H_D_IMM>;			defm : pred_store<nxv2i64, nxv2i1, trunc_masked_store_i16, ST1H_D, ST1H_D_IMM, am_sve_regreg_lsl1>;
	defm : pred_store<nxv2i64, nxv2i1, trunc_masked_store_i32, ST1W_D_IMM>;			defm : pred_store<nxv2i64, nxv2i1, trunc_masked_store_i32, ST1W_D, ST1W_D_IMM, am_sve_regreg_lsl2>;
	defm : pred_store<nxv2i64, nxv2i1, nontrunc_masked_store, ST1D_IMM>;			defm : pred_store<nxv2i64, nxv2i1, nontrunc_masked_store, ST1D, ST1D_IMM, am_sve_regreg_lsl3>;
	defm : pred_store<nxv2f16, nxv2i1, nontrunc_masked_store, ST1H_D_IMM>;			defm : pred_store<nxv2f16, nxv2i1, nontrunc_masked_store, ST1H_D, ST1H_D_IMM, am_sve_regreg_lsl1>;
	defm : pred_store<nxv2f32, nxv2i1, nontrunc_masked_store, ST1W_D_IMM>;			defm : pred_store<nxv2f32, nxv2i1, nontrunc_masked_store, ST1W_D, ST1W_D_IMM, am_sve_regreg_lsl2>;
	defm : pred_store<nxv2f64, nxv2i1, nontrunc_masked_store, ST1D_IMM>;			defm : pred_store<nxv2f64, nxv2i1, nontrunc_masked_store, ST1D, ST1D_IMM, am_sve_regreg_lsl3>;

	// 4-element contiguous stores			// 4-element contiguous stores
	defm : pred_store<nxv4i32, nxv4i1, trunc_masked_store_i8, ST1B_S_IMM>;			defm : pred_store<nxv4i32, nxv4i1, trunc_masked_store_i8, ST1B_S, ST1B_S_IMM, am_sve_regreg_lsl0>;
	defm : pred_store<nxv4i32, nxv4i1, trunc_masked_store_i16, ST1H_S_IMM>;			defm : pred_store<nxv4i32, nxv4i1, trunc_masked_store_i16, ST1H_S, ST1H_S_IMM, am_sve_regreg_lsl1>;
	defm : pred_store<nxv4i32, nxv4i1, nontrunc_masked_store, ST1W_IMM>;			defm : pred_store<nxv4i32, nxv4i1, nontrunc_masked_store, ST1W, ST1W_IMM, am_sve_regreg_lsl2>;
	defm : pred_store<nxv4f16, nxv4i1, nontrunc_masked_store, ST1H_S_IMM>;			defm : pred_store<nxv4f16, nxv4i1, nontrunc_masked_store, ST1H_S, ST1H_S_IMM, am_sve_regreg_lsl1>;
	defm : pred_store<nxv4f32, nxv4i1, nontrunc_masked_store, ST1W_IMM>;			defm : pred_store<nxv4f32, nxv4i1, nontrunc_masked_store, ST1W, ST1W_IMM, am_sve_regreg_lsl2>;

	// 8-element contiguous stores			// 8-element contiguous stores
	defm : pred_store<nxv8i16, nxv8i1, trunc_masked_store_i8, ST1B_H_IMM>;			defm : pred_store<nxv8i16, nxv8i1, trunc_masked_store_i8, ST1B_H, ST1B_H_IMM, am_sve_regreg_lsl0>;
	defm : pred_store<nxv8i16, nxv8i1, nontrunc_masked_store, ST1H_IMM>;			defm : pred_store<nxv8i16, nxv8i1, nontrunc_masked_store, ST1H, ST1H_IMM, am_sve_regreg_lsl1>;
	defm : pred_store<nxv8f16, nxv8i1, nontrunc_masked_store, ST1H_IMM>;			defm : pred_store<nxv8f16, nxv8i1, nontrunc_masked_store, ST1H, ST1H_IMM, am_sve_regreg_lsl1>;

	// 16-element contiguous stores			// 16-element contiguous stores
	defm : pred_store<nxv16i8, nxv16i1, nontrunc_masked_store, ST1B_IMM>;			defm : pred_store<nxv16i8, nxv16i1, nontrunc_masked_store, ST1B, ST1B_IMM, am_sve_regreg_lsl0>;

	defm : pred_load<nxv16i8, nxv16i1, non_temporal_load, LDNT1B_ZRI>;			defm : pred_load<nxv16i8, nxv16i1, non_temporal_load, LDNT1B_ZRR, LDNT1B_ZRI, am_sve_regreg_lsl0>;
	defm : pred_load<nxv8i16, nxv8i1, non_temporal_load, LDNT1H_ZRI>;			defm : pred_load<nxv8i16, nxv8i1, non_temporal_load, LDNT1H_ZRR, LDNT1H_ZRI, am_sve_regreg_lsl1>;
	defm : pred_load<nxv4i32, nxv4i1, non_temporal_load, LDNT1W_ZRI>;			defm : pred_load<nxv4i32, nxv4i1, non_temporal_load, LDNT1W_ZRR, LDNT1W_ZRI, am_sve_regreg_lsl2>;
	defm : pred_load<nxv2i64, nxv2i1, non_temporal_load, LDNT1D_ZRI>;			defm : pred_load<nxv2i64, nxv2i1, non_temporal_load, LDNT1D_ZRR, LDNT1D_ZRI, am_sve_regreg_lsl3>;

	defm : pred_store<nxv16i8, nxv16i1, non_temporal_store, STNT1B_ZRI>;			defm : pred_store<nxv16i8, nxv16i1, non_temporal_store, STNT1B_ZRR, STNT1B_ZRI, am_sve_regreg_lsl0>;
	defm : pred_store<nxv8i16, nxv8i1, non_temporal_store, STNT1H_ZRI>;			defm : pred_store<nxv8i16, nxv8i1, non_temporal_store, STNT1H_ZRR, STNT1H_ZRI, am_sve_regreg_lsl1>;
	defm : pred_store<nxv4i32, nxv4i1, non_temporal_store, STNT1W_ZRI>;			defm : pred_store<nxv4i32, nxv4i1, non_temporal_store, STNT1W_ZRR, STNT1W_ZRI, am_sve_regreg_lsl2>;
	defm : pred_store<nxv2i64, nxv2i1, non_temporal_store, STNT1D_ZRI>;			defm : pred_store<nxv2i64, nxv2i1, non_temporal_store, STNT1D_ZRR, STNT1D_ZRI, am_sve_regreg_lsl3>;

	multiclass unpred_store<ValueType Ty, Instruction RegImmInst, Instruction PTrue> {			multiclass unpred_store<ValueType Ty, Instruction RegImmInst, Instruction PTrue> {
	def _fi : Pat<(store (Ty ZPR:$val), (am_sve_fi GPR64sp:$base, simm4s1:$offset)),			def _fi : Pat<(store (Ty ZPR:$val), (am_sve_fi GPR64sp:$base, simm4s1:$offset)),
	(RegImmInst ZPR:$val, (PTrue 31), GPR64sp:$base, simm4s1:$offset)>;			(RegImmInst ZPR:$val, (PTrue 31), GPR64sp:$base, simm4s1:$offset)>;
	}			}
				fpetrogalli: This is to be removed.

	defm Pat_ST1B : unpred_store<nxv16i8, ST1B_IMM, PTRUE_B>;			defm Pat_ST1B : unpred_store<nxv16i8, ST1B_IMM, PTRUE_B>;
	defm Pat_ST1H : unpred_store<nxv8i16, ST1H_IMM, PTRUE_H>;			defm Pat_ST1H : unpred_store<nxv8i16, ST1H_IMM, PTRUE_H>;
	defm Pat_ST1W : unpred_store<nxv4i32, ST1W_IMM, PTRUE_S>;			defm Pat_ST1W : unpred_store<nxv4i32, ST1W_IMM, PTRUE_S>;
	defm Pat_ST1D : unpred_store<nxv2i64, ST1D_IMM, PTRUE_D>;			defm Pat_ST1D : unpred_store<nxv2i64, ST1D_IMM, PTRUE_D>;
	defm Pat_ST1H_float16: unpred_store<nxv8f16, ST1H_IMM, PTRUE_H>;			defm Pat_ST1H_float16: unpred_store<nxv8f16, ST1H_IMM, PTRUE_H>;
	defm Pat_ST1W_float : unpred_store<nxv4f32, ST1W_IMM, PTRUE_S>;			defm Pat_ST1W_float : unpred_store<nxv4f32, ST1W_IMM, PTRUE_S>;
	defm Pat_ST1D_double : unpred_store<nxv2f64, ST1D_IMM, PTRUE_D>;			defm Pat_ST1D_double : unpred_store<nxv2f64, ST1D_IMM, PTRUE_D>;

	multiclass unpred_load<ValueType Ty, Instruction RegImmInst, Instruction PTrue> {			multiclass unpred_load<ValueType Ty, Instruction RegImmInst, Instruction PTrue> {
	def _fi : Pat<(Ty (load (am_sve_fi GPR64sp:$base, simm4s1:$offset))),			def _fi : Pat<(Ty (load (am_sve_fi GPR64sp:$base, simm4s1:$offset))),
	(RegImmInst (PTrue 31), GPR64sp:$base, simm4s1:$offset)>;			(RegImmInst (PTrue 31), GPR64sp:$base, simm4s1:$offset)>;
	}			}
				fpetrogalli: This will be removed too.

	defm Pat_LD1B : unpred_load<nxv16i8, LD1B_IMM, PTRUE_B>;			defm Pat_LD1B : unpred_load<nxv16i8, LD1B_IMM, PTRUE_B>;
	defm Pat_LD1H : unpred_load<nxv8i16, LD1H_IMM, PTRUE_H>;			defm Pat_LD1H : unpred_load<nxv8i16, LD1H_IMM, PTRUE_H>;
	defm Pat_LD1W : unpred_load<nxv4i32, LD1W_IMM, PTRUE_S>;			defm Pat_LD1W : unpred_load<nxv4i32, LD1W_IMM, PTRUE_S>;
	defm Pat_LD1D : unpred_load<nxv2i64, LD1D_IMM, PTRUE_D>;			defm Pat_LD1D : unpred_load<nxv2i64, LD1D_IMM, PTRUE_D>;
	defm Pat_LD1H_float16: unpred_load<nxv8f16, LD1H_IMM, PTRUE_H>;			defm Pat_LD1H_float16: unpred_load<nxv8f16, LD1H_IMM, PTRUE_H>;
	defm Pat_LD1W_float : unpred_load<nxv4f32, LD1W_IMM, PTRUE_S>;			defm Pat_LD1W_float : unpred_load<nxv4f32, LD1W_IMM, PTRUE_S>;
	defm Pat_LD1D_double : unpred_load<nxv2f64, LD1D_IMM, PTRUE_D>;			defm Pat_LD1D_double : unpred_load<nxv2f64, LD1D_IMM, PTRUE_D>;
	[513 lines not shown]

llvm/lib/Target/AArch64/SVEInstrFormats.td


[6,971 lines not shown]	: I<(outs ZPR8:$Zdn), (ins ZPR8:$_Zdn),
bits<5> Zdn;		bits<5> Zdn;
let Inst{31-11} = 0b010001010010000011100;		let Inst{31-11} = 0b010001010010000011100;
let Inst{10} = opc;		let Inst{10} = opc;
let Inst{9-5} = 0b00000;		let Inst{9-5} = 0b00000;
let Inst{4-0} = Zdn;		let Inst{4-0} = Zdn;

let Constraints = "$Zdn = $_Zdn";		let Constraints = "$Zdn = $_Zdn";
}		}

		/// Addressing modes
		def am_sve_indexed_s4 :ComplexPattern<i64, 2, "SelectAddrModeIndexedSVE<-8,7>", [], [SDNPWantRoot]>;
		andwar: Inconsistent naming with the records that follow this one.
		fpetrogalli: Used the file convention (lower case, separated with `_`, starts with `am`).

		def am_sve_regreg_lsl0 : ComplexPattern<i64, 2, "SelectSVERegRegAddrMode<0>", []>;
		def am_sve_regreg_lsl1 : ComplexPattern<i64, 2, "SelectSVERegRegAddrMode<1>", []>;
		def am_sve_regreg_lsl2 : ComplexPattern<i64, 2, "SelectSVERegRegAddrMode<2>", []>;
		def am_sve_regreg_lsl3 : ComplexPattern<i64, 2, "SelectSVERegRegAddrMode<3>", []>;
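A note on the `lslN` suffix, offered as a reading of the diff rather than something the patch states: judging from the `pred_load` and `pred_store` instantiations, the shift amount appears to be log2 of the memory element size in bytes (i8 uses `lsl0`, i16 `lsl1`, i32 `lsl2`, i64 `lsl3`). The hypothetical helpers below encode that assumption and show the resulting address computation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: shift amount for a power-of-two element size in bytes.
// Assumes the lslN suffix equals log2(element size), as the patterns suggest.
unsigned regRegShiftForElementBytes(uint64_t ElementBytes) {
  assert(ElementBytes != 0 && (ElementBytes & (ElementBytes - 1)) == 0 &&
         "expected a power-of-two element size");
  unsigned Shift = 0;
  while ((uint64_t{1} << Shift) < ElementBytes)
    ++Shift;
  return Shift;
}

// Effective address of a reg+reg access: Base + (Index << Shift).
uint64_t regRegAddress(uint64_t Base, uint64_t Index, uint64_t ElementBytes) {
  return Base + (Index << regRegShiftForElementBytes(ElementBytes));
}
```

Under this assumption an index of 5 into 8-byte elements at base `0x1000` addresses `0x1000 + (5 << 3) = 0x1028`.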

llvm/test/CodeGen/AArch64/sve-gep.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s			; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s

	define <vscale x 2 x i64>* @scalar_of_scalable_1(<vscale x 2 x i64>* %base) {			define <vscale x 2 x i64>* @scalar_of_scalable_1(<vscale x 2 x i64>* %base) {
	; CHECK-LABEL: scalar_of_scalable_1:			; CHECK-LABEL: scalar_of_scalable_1:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: rdvl x8, #1			; CHECK-NEXT: rdvl x8, #4
	; CHECK-NEXT: add x0, x0, x8, lsl #2			; CHECK-NEXT: add x0, x0, x8
				andwar: Unrelated changes.
				fpetrogalli: No, I had to fix this because of the DAG combine changes.
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%d = getelementptr <vscale x 2 x i64>, <vscale x 2 x i64>* %base, i64 4			%d = getelementptr <vscale x 2 x i64>, <vscale x 2 x i64>* %base, i64 4
	ret <vscale x 2 x i64>* %d			ret <vscale x 2 x i64>* %d
	}			}
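Illustrative note (not part of the patch): the old and new CHECK lines for `scalar_of_scalable_1` compute the same address. `rdvl x8, #N` yields N times the vector length in bytes, so `rdvl #1` followed by `lsl #2` and a single `rdvl #4` both add four vector lengths to the base. A minimal sketch of that arithmetic, with hypothetical function names and the vector length passed in as a plain parameter:

```cpp
#include <cassert>
#include <cstdint>

// Models `rdvl xd, #Imm`: Imm times the vector length in bytes.
uint64_t rdvl(int64_t Imm, uint64_t VLBytes) {
  assert(Imm >= -32 && Imm <= 31 && "RDVL takes a 6-bit signed immediate");
  return static_cast<uint64_t>(Imm) * VLBytes;
}

// Old sequence: rdvl x8, #1 ; add x0, x0, x8, lsl #2
uint64_t addrBefore(uint64_t Base, uint64_t VLBytes) {
  return Base + (rdvl(1, VLBytes) << 2);
}

// New sequence: rdvl x8, #4 ; add x0, x0, x8
uint64_t addrAfter(uint64_t Base, uint64_t VLBytes) {
  return Base + rdvl(4, VLBytes);
}
```

For a 128-bit implementation (`VLBytes = 16`) both forms add 64 bytes to the base, matching a GEP index of 4 over `<vscale x 2 x i64>`.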

	define <vscale x 2 x i64>* @scalar_of_scalable_2(<vscale x 2 x i64>* %base, i64 %offset) {			define <vscale x 2 x i64>* @scalar_of_scalable_2(<vscale x 2 x i64>* %base, i64 %offset) {
	; CHECK-LABEL: scalar_of_scalable_2:			; CHECK-LABEL: scalar_of_scalable_2:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	[124 lines not shown]

llvm/test/CodeGen/AArch64/sve-pred-contiguous-ldst-addressing-mode-reg-imm.ll

This file was added.

				; RUN: llc -mtriple=aarch64--linux-gnu -mattr=+sve < %s | FileCheck %s

				; Range checks: for all the instruction tested in this file, the
				; immediate must be within the range [-8, 7] (4-bit immediate). Out of
				; range values are tested only in one case (following). Valid values
				; are tested all through the rest of the file.

				define void @imm_out_of_range(<vscale x 2 x i64> * %base, <vscale x 2 x i1> %mask) {
				; CHECK-LABEL: imm_out_of_range:
				; CHECK: ld1d { z[[DATA:[0-9]+]].d }, p0/z, [x{{[0-9]+}}]
				; CHECK: st1d { z[[DATA]].d }, p0, [x{{[0-9]+}}]
				andwar: `; CHECK-NEXT`? Here and in the following examples.
				; CHECK: ret
				%base_load = getelementptr <vscale x 2 x i64>, <vscale x 2 x i64>* %base, i64 8
				%data = call <vscale x 2 x i64> @llvm.masked.load.nxv2i64(<vscale x 2 x i64>* %base_load, i32 1, <vscale x 2 x i1> %mask, <vscale x 2 x i64> undef)
				%base_store = getelementptr <vscale x 2 x i64>, <vscale x 2 x i64> * %base, i64 -9
				call void @llvm.masked.store.nxv2i64(<vscale x 2 x i64> %data, <vscale x 2 x i64>* %base_store, i32 1, <vscale x 2 x i1> %mask)
				andwar: Could you format this line (and similar lines elsewhere)? E.g.: https://github.com/llvm/llvm-project/blob/318d0ede572080f18d0106dbc354e11c88329a84/llvm/test/CodeGen/AArch64/sve-intrinsics-stores.ll#L11 Makes it easier to parse for humans :) And will be consistent with other files too!
				fpetrogalli: I prefer keeping all parameters in one line. The "consistency with other file" argument doesn't work, most of the tests for SVE use the convention in this patch:
				    frapet01@man-08:~/projects/upstream-clang/llvm-project/llvm/test/CodeGen/AArch64 (master)$ grep "@llvm.*)$" sve*.ll | grep -v "()" | wc -l
				    2076
				    frapet01@man-08:~/projects/upstream-clang/llvm-project/llvm/test/CodeGen/AArch64 (master)$ grep "@llvm.*,$" sve*.ll | grep -v "()" | wc -l
				    1433
				The result becomes even more unbalanced if you look at the totality of tests in the folder:
				    frapet01@man-08:~/projects/upstream-clang/llvm-project/llvm/test/CodeGen/AArch64 (master)$ grep "@llvm.*)$" *.ll | grep -v "()" | wc -l
				    7260
				    frapet01@man-08:~/projects/upstream-clang/llvm-project/llvm/test/CodeGen/AArch64 (master)$ grep "@llvm.*,$" *.ll | grep -v "()" | wc -l
				    1435
				I ran grep on master, @ `a062a3ed7fd82c277812d80fb83dc6f05b939a84`, _not_ on my dev branch :).
				andwar: I think that your comparison misses the context, and that's entirely my fault because I didn't make it clear. If you compare long lines only, the split is roughly 50/50. I kindly asked for this update because these lines are wrapped by Phabricator (they don't fit on my screen and I don't see how to make Phabricator stop doing that). This makes reviewing on Phab a bit frustrating. Btw, I think that `awk` would be more fitting here ;-)
				    #!/bin/env bash
				    TEST_FILES=(
				      "llvm/test/CodeGen/AArch64/sve-pred-contiguous-ldst-addressing-mode-reg-imm.ll"
				      "llvm/test/CodeGen/AArch64/sve-pred-contiguous-ldst-addressing-mode-reg-reg.ll"
				      "llvm/test/CodeGen/AArch64/sve-pred-non-temporal-ldst-addressing-mode-reg-imm.ll"
				      "llvm/test/CodeGen/AArch64/sve-pred-non-temporal-ldst-addressing-mode-reg-reg.ll")
				    for test_file in "${TEST_FILES[@]}"; do
				      awk -F',' '{
				        if ($1 ~ /call/) {
				          # Count the number of characters for indentation
				          for (i = 1; i <= length($0); i++) {
				            if (substr($0, i, 1) == "(") {
				              lenght_col = i
				              break
				            }
				          }
				          # Split function call - 2 args
				          if (NF == 2) {
				            printf("%s,\n %*s %s\n", $1, lenght_col - 3, "", $2)
				          }
				          # Split function call - 3 args
				          if (NF == 3) {
				            printf("%s,\n %*s %s,\n %*s %s\n", $1, lenght_col - 3, "", $2, lenght_col - 3, "", $3)
				          }
				          # Split function call - 4 args
				          if (NF == 4) {
				            printf("%s,\n %*s %s,\n %*s %s,\n %*s %s\n", $1, lenght_col - 3, "", $2, lenght_col - 3, "", $3, lenght_col - 3, "", $4)
				          }
				        } else {
				          # Not a call, not reformatting
				          print $0
				        }
				      }' ${test_file} > temp.ll
				      mv temp.ll ${test_file}
				    done
				fpetrogalli: You win!
				ret void
				}
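Illustrative note (not part of the patch): the `[-8, 7]` range quoted in the comment at the top of this test file is the range of a signed 4-bit field. A small sketch, using a hypothetical `fitsSignedImm` helper that is not an LLVM function, shows why the indices 8 and -9 in `imm_out_of_range` fall back to the plain `[xN]` form, while -8 and -7 in the tests that follow are encoded as `mul vl` immediates.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true when Value is representable as a signed
// two's-complement integer of the given bit width.
bool fitsSignedImm(int64_t Value, unsigned Bits) {
  assert(Bits >= 1 && Bits <= 63 && "unsupported field width");
  const int64_t Min = -(int64_t{1} << (Bits - 1)); // -8 for 4 bits
  const int64_t Max = (int64_t{1} << (Bits - 1)) - 1; // 7 for 4 bits
  return Value >= Min && Value <= Max;
}
```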

				; 2-lane contiguous load/stores

				define void @test_masked_ldst_sv2i8(<vscale x 2 x i8> * %base, <vscale x 2 x i1> %mask) {
				; CHECK-LABEL: test_masked_ldst_sv2i8:
				; CHECK: ld1sb { z[[DATA:[0-9]+]].d }, p0/z, [x0, #-8, mul vl]
				; CHECK: st1b { z[[DATA]].d }, p0, [x0, #-7, mul vl]
				; CHECK: ret
				%base_load = getelementptr <vscale x 2 x i8>, <vscale x 2 x i8>* %base, i64 -8
				%data = call <vscale x 2 x i8> @llvm.masked.load.nxv2i8(<vscale x 2 x i8>* %base_load, i32 1, <vscale x 2 x i1> %mask, <vscale x 2 x i8> undef)
				%base_store = getelementptr <vscale x 2 x i8>, <vscale x 2 x i8> * %base, i64 -7
				call void @llvm.masked.store.nxv2i8(<vscale x 2 x i8> %data, <vscale x 2 x i8>* %base_store, i32 1, <vscale x 2 x i1> %mask)
				ret void
				}

				define void @test_masked_ldst_sv2i16(<vscale x 2 x i16> * %base, <vscale x 2 x i1> %mask) {
				; CHECK-LABEL: test_masked_ldst_sv2i16:
				; CHECK: ld1sh { z[[DATA:[0-9]+]].d }, p0/z, [x0, #-8, mul vl]
				; CHECK: st1h { z[[DATA]].d }, p0, [x0, #-7, mul vl]
				; CHECK: ret
				%base_load = getelementptr <vscale x 2 x i16>, <vscale x 2 x i16>* %base, i64 -8
				%data = call <vscale x 2 x i16> @llvm.masked.load.nxv2i16(<vscale x 2 x i16>* %base_load, i32 1, <vscale x 2 x i1> %mask, <vscale x 2 x i16> undef)
				%base_store = getelementptr <vscale x 2 x i16>, <vscale x 2 x i16> * %base, i64 -7
				call void @llvm.masked.store.nxv2i16(<vscale x 2 x i16> %data, <vscale x 2 x i16>* %base_store, i32 1, <vscale x 2 x i1> %mask)
				ret void
				}


				define void @test_masked_ldst_sv2i32(<vscale x 2 x i32> * %base, <vscale x 2 x i1> %mask) {
				; CHECK-LABEL: test_masked_ldst_sv2i32:
				; CHECK: ld1sw { z[[DATA:[0-9]+]].d }, p0/z, [x0, #-8, mul vl]
				; CHECK: st1w { z[[DATA]].d }, p0, [x0, #-7, mul vl]
				; CHECK: ret
				%base_load = getelementptr <vscale x 2 x i32>, <vscale x 2 x i32>* %base, i64 -8
				%data = call <vscale x 2 x i32> @llvm.masked.load.nxv2i32(<vscale x 2 x i32>* %base_load, i32 1, <vscale x 2 x i1> %mask, <vscale x 2 x i32> undef)
				%base_store = getelementptr <vscale x 2 x i32>, <vscale x 2 x i32> * %base, i64 -7
				call void @llvm.masked.store.nxv2i32(<vscale x 2 x i32> %data, <vscale x 2 x i32>* %base_store, i32 1, <vscale x 2 x i1> %mask)
				ret void
				}

				define void @test_masked_ldst_sv2i64(<vscale x 2 x i64> * %base, <vscale x 2 x i1> %mask) {
				; CHECK-LABEL: test_masked_ldst_sv2i64:
				; CHECK: ld1d { z[[DATA:[0-9]+]].d }, p0/z, [x0, #-8, mul vl]
				; CHECK: st1d { z[[DATA]].d }, p0, [x0, #-7, mul vl]
				; CHECK: ret
				%base_load = getelementptr <vscale x 2 x i64>, <vscale x 2 x i64>* %base, i64 -8
				%data = call <vscale x 2 x i64> @llvm.masked.load.nxv2i64(<vscale x 2 x i64>* %base_load, i32 1, <vscale x 2 x i1> %mask, <vscale x 2 x i64> undef)
				%base_store = getelementptr <vscale x 2 x i64>, <vscale x 2 x i64> * %base, i64 -7
				call void @llvm.masked.store.nxv2i64(<vscale x 2 x i64> %data, <vscale x 2 x i64>* %base_store, i32 1, <vscale x 2 x i1> %mask)
				ret void
				}

				define void @test_masked_ldst_sv2f16(<vscale x 2 x half> * %base, <vscale x 2 x i1> %mask) {
				; CHECK-LABEL: test_masked_ldst_sv2f16:
				; CHECK: ld1h { z[[DATA:[0-9]+]].d }, p0/z, [x0, #-8, mul vl]
				; CHECK: st1h { z[[DATA]].d }, p0, [x0, #-7, mul vl]
				; CHECK: ret
				%base_load = getelementptr <vscale x 2 x half>, <vscale x 2 x half>* %base, i64 -8
				%data = call <vscale x 2 x half> @llvm.masked.load.nxv2f16(<vscale x 2 x half>* %base_load, i32 1, <vscale x 2 x i1> %mask, <vscale x 2 x half> undef)
				%base_store = getelementptr <vscale x 2 x half>, <vscale x 2 x half> * %base, i64 -7
				call void @llvm.masked.store.nxv2f16(<vscale x 2 x half> %data, <vscale x 2 x half>* %base_store, i32 1, <vscale x 2 x i1> %mask)
				ret void
				}


				define void @test_masked_ldst_sv2f32(<vscale x 2 x float> * %base, <vscale x 2 x i1> %mask) {
				; CHECK-LABEL: test_masked_ldst_sv2f32:
				; CHECK: ld1w { z[[DATA:[0-9]+]].d }, p0/z, [x0, #-8, mul vl]
				; CHECK: st1w { z[[DATA]].d }, p0, [x0, #-7, mul vl]
				; CHECK: ret
				%base_load = getelementptr <vscale x 2 x float>, <vscale x 2 x float>* %base, i64 -8
				%data = call <vscale x 2 x float> @llvm.masked.load.nxv2f32(<vscale x 2 x float>* %base_load, i32 1, <vscale x 2 x i1> %mask, <vscale x 2 x float> undef)
				%base_store = getelementptr <vscale x 2 x float>, <vscale x 2 x float> * %base, i64 -7
				call void @llvm.masked.store.nxv2f32(<vscale x 2 x float> %data, <vscale x 2 x float>* %base_store, i32 1, <vscale x 2 x i1> %mask)
				ret void
				}

				define void @test_masked_ldst_sv2f64(<vscale x 2 x double> * %base, <vscale x 2 x i1> %mask) {
				; CHECK-LABEL: test_masked_ldst_sv2f64:
				; CHECK: ld1d { z[[DATA:[0-9]+]].d }, p0/z, [x0, #-6, mul vl]
				; CHECK: st1d { z[[DATA]].d }, p0, [x0, #-5, mul vl]
				; CHECK: ret
				%base_load = getelementptr <vscale x 2 x double>, <vscale x 2 x double>* %base, i64 -6
				%data = call <vscale x 2 x double> @llvm.masked.load.nxv2f64(<vscale x 2 x double>* %base_load, i32 1, <vscale x 2 x i1> %mask, <vscale x 2 x double> undef)
				%base_store = getelementptr <vscale x 2 x double>, <vscale x 2 x double> * %base, i64 -5
				call void @llvm.masked.store.nxv2f64(<vscale x 2 x double> %data, <vscale x 2 x double>* %base_store, i32 1, <vscale x 2 x i1> %mask)
				ret void
				}

				; 2-lane zero/sign extended contiguous loads.

				define <vscale x 2 x i64> @masked_zload_sv2i8_to_sv2i64(<vscale x 2 x i8>* %base, <vscale x 2 x i1> %mask) {
				; CHECK-LABEL: masked_zload_sv2i8_to_sv2i64:
				; CHECK: ld1b { z0.d }, p0/z, [x0, #-4, mul vl]
				; CHECK-NEXT: ret
				%base_load = getelementptr <vscale x 2 x i8>, <vscale x 2 x i8>* %base, i64 -4
				%load = call <vscale x 2 x i8> @llvm.masked.load.nxv2i8(<vscale x 2 x i8>* %base_load, i32 1, <vscale x 2 x i1> %mask, <vscale x 2 x i8> undef)
				%ext = zext <vscale x 2 x i8> %load to <vscale x 2 x i64>
				ret <vscale x 2 x i64> %ext
				}

				define <vscale x 2 x i64> @masked_sload_sv2i8_to_sv2i64(<vscale x 2 x i8>* %base, <vscale x 2 x i1> %mask) {
				; CHECK-LABEL: masked_sload_sv2i8_to_sv2i64:
				; CHECK: ld1sb { z0.d }, p0/z, [x0, #-3, mul vl]
				; CHECK-NEXT: ret
				%base_load = getelementptr <vscale x 2 x i8>, <vscale x 2 x i8>* %base, i64 -3
				%load = call <vscale x 2 x i8> @llvm.masked.load.nxv2i8(<vscale x 2 x i8>* %base_load, i32 1, <vscale x 2 x i1> %mask, <vscale x 2 x i8> undef)
				%ext = sext <vscale x 2 x i8> %load to <vscale x 2 x i64>
				ret <vscale x 2 x i64> %ext
				}

				define <vscale x 2 x i64> @masked_zload_sv2i16_to_sv2i64(<vscale x 2 x i16>* %base, <vscale x 2 x i1> %mask) {
				; CHECK-LABEL: masked_zload_sv2i16_to_sv2i64:
				; CHECK: ld1h { z0.d }, p0/z, [x0, #1, mul vl]
				; CHECK-NEXT: ret
				%base_load = getelementptr <vscale x 2 x i16>, <vscale x 2 x i16>* %base, i64 1
				%load = call <vscale x 2 x i16> @llvm.masked.load.nxv2i16(<vscale x 2 x i16>* %base_load, i32 1, <vscale x 2 x i1> %mask, <vscale x 2 x i16> undef)
				%ext = zext <vscale x 2 x i16> %load to <vscale x 2 x i64>
				ret <vscale x 2 x i64> %ext
				}

				define <vscale x 2 x i64> @masked_sload_sv2i16_to_sv2i64(<vscale x 2 x i16>* %base, <vscale x 2 x i1> %mask) {
				; CHECK-LABEL: masked_sload_sv2i16_to_sv2i64:
				; CHECK: ld1sh { z0.d }, p0/z, [x0, #2, mul vl]
				; CHECK-NEXT: ret
				%base_load = getelementptr <vscale x 2 x i16>, <vscale x 2 x i16>* %base, i64 2
				%load = call <vscale x 2 x i16> @llvm.masked.load.nxv2i16(<vscale x 2 x i16>* %base_load, i32 1, <vscale x 2 x i1> %mask, <vscale x 2 x i16> undef)
				%ext = sext <vscale x 2 x i16> %load to <vscale x 2 x i64>
				ret <vscale x 2 x i64> %ext
				}

				define <vscale x 2 x i64> @masked_zload_sv2i32_to_sv2i64(<vscale x 2 x i32>* %base, <vscale x 2 x i1> %mask) {
				; CHECK-LABEL: masked_zload_sv2i32_to_sv2i64:
				; CHECK: ld1w { z0.d }, p0/z, [x0, #-2, mul vl]
				; CHECK-NEXT: ret
				%base_load = getelementptr <vscale x 2 x i32>, <vscale x 2 x i32>* %base, i64 -2
				%load = call <vscale x 2 x i32> @llvm.masked.load.nxv2i32(<vscale x 2 x i32>* %base_load, i32 1, <vscale x 2 x i1> %mask, <vscale x 2 x i32> undef)
				%ext = zext <vscale x 2 x i32> %load to <vscale x 2 x i64>
				ret <vscale x 2 x i64> %ext
				}

				define <vscale x 2 x i64> @masked_sload_sv2i32_to_sv2i64(<vscale x 2 x i32>* %base, <vscale x 2 x i1> %mask) {
				; CHECK-LABEL: masked_sload_sv2i32_to_sv2i64:
				; CHECK: ld1sw { z0.d }, p0/z, [x0, #-1, mul vl]
				; CHECK-NEXT: ret
				%base_load = getelementptr <vscale x 2 x i32>, <vscale x 2 x i32>* %base, i64 -1
				%load = call <vscale x 2 x i32> @llvm.masked.load.nxv2i32(<vscale x 2 x i32>* %base_load, i32 1, <vscale x 2 x i1> %mask, <vscale x 2 x i32> undef)
				%ext = sext <vscale x 2 x i32> %load to <vscale x 2 x i64>
				ret <vscale x 2 x i64> %ext
				}

				; 2-lane truncating contiguous stores.

				define void @masked_trunc_store_sv2i64_to_sv2i8(<vscale x 2 x i64> %val, <vscale x 2 x i8> *%base, <vscale x 2 x i1> %mask) {
				; CHECK-LABEL: masked_trunc_store_sv2i64_to_sv2i8:
				; CHECK: st1b { z0.d }, p0, [x0, #3, mul vl]
				; CHECK-NEXT: ret
				%base_load = getelementptr <vscale x 2 x i8>, <vscale x 2 x i8>* %base, i64 3
				%trunc = trunc <vscale x 2 x i64> %val to <vscale x 2 x i8>
				call void @llvm.masked.store.nxv2i8(<vscale x 2 x i8> %trunc, <vscale x 2 x i8> *%base_load, i32 8, <vscale x 2 x i1> %mask)
				ret void
				}


				define void @masked_trunc_store_sv2i64_to_sv2i16(<vscale x 2 x i64> %val, <vscale x 2 x i16> *%base, <vscale x 2 x i1> %mask) {
				; CHECK-LABEL: masked_trunc_store_sv2i64_to_sv2i16:
				; CHECK: st1h { z0.d }, p0, [x0, #4, mul vl]
				; CHECK-NEXT: ret
				%base_load = getelementptr <vscale x 2 x i16>, <vscale x 2 x i16>* %base, i64 4
				%trunc = trunc <vscale x 2 x i64> %val to <vscale x 2 x i16>
				call void @llvm.masked.store.nxv2i16(<vscale x 2 x i16> %trunc, <vscale x 2 x i16> *%base_load, i32 8, <vscale x 2 x i1> %mask)
				ret void
				}

				define void @masked_trunc_store_sv2i64_to_sv2i32(<vscale x 2 x i64> %val, <vscale x 2 x i32> *%base, <vscale x 2 x i1> %mask) {
				; CHECK-LABEL: masked_trunc_store_sv2i64_to_sv2i32:
				; CHECK: st1w { z0.d }, p0, [x0, #5, mul vl]
				; CHECK-NEXT: ret
				%base_load = getelementptr <vscale x 2 x i32>, <vscale x 2 x i32>* %base, i64 5
				%trunc = trunc <vscale x 2 x i64> %val to <vscale x 2 x i32>
				call void @llvm.masked.store.nxv2i32(<vscale x 2 x i32> %trunc, <vscale x 2 x i32> *%base_load, i32 8, <vscale x 2 x i1> %mask)
				ret void
				}

				; 4-lane contiguous load/stores.

				define void @test_masked_ldst_sv4i8(<vscale x 4 x i8> * %base, <vscale x 4 x i1> %mask) {
				; CHECK-LABEL: test_masked_ldst_sv4i8:
				; CHECK: ld1sb { z[[DATA:[0-9]+]].s }, p0/z, [x0, #-1, mul vl]
				; CHECK: st1b { z[[DATA]].s }, p0, [x0, #2, mul vl]
				; CHECK: ret
				%base_load = getelementptr <vscale x 4 x i8>, <vscale x 4 x i8>* %base, i64 -1
				%data = call <vscale x 4 x i8> @llvm.masked.load.nxv4i8(<vscale x 4 x i8>* %base_load, i32 1, <vscale x 4 x i1> %mask, <vscale x 4 x i8> undef)
				%base_store = getelementptr <vscale x 4 x i8>, <vscale x 4 x i8> * %base, i64 2
				call void @llvm.masked.store.nxv4i8(<vscale x 4 x i8> %data, <vscale x 4 x i8>* %base_store, i32 1, <vscale x 4 x i1> %mask)
				ret void
				}

				define void @test_masked_ldst_sv4i16(<vscale x 4 x i16> * %base, <vscale x 4 x i1> %mask) {
				; CHECK-LABEL: test_masked_ldst_sv4i16:
				; CHECK: ld1sh { z[[DATA:[0-9]+]].s }, p0/z, [x0, #-1, mul vl]
				; CHECK: st1h { z[[DATA]].s }, p0, [x0, #2, mul vl]
				; CHECK: ret
				%base_load = getelementptr <vscale x 4 x i16>, <vscale x 4 x i16>* %base, i64 -1
				%data = call <vscale x 4 x i16> @llvm.masked.load.nxv4i16(<vscale x 4 x i16>* %base_load, i32 1, <vscale x 4 x i1> %mask, <vscale x 4 x i16> undef)
				%base_store = getelementptr <vscale x 4 x i16>, <vscale x 4 x i16> * %base, i64 2
				call void @llvm.masked.store.nxv4i16(<vscale x 4 x i16> %data, <vscale x 4 x i16>* %base_store, i32 1, <vscale x 4 x i1> %mask)
				ret void
				}

				define void @test_masked_ldst_sv4i32(<vscale x 4 x i32> * %base, <vscale x 4 x i1> %mask) {
				; CHECK-LABEL: test_masked_ldst_sv4i32:
				; CHECK: ld1w { z[[DATA:[0-9]+]].s }, p0/z, [x0, #6, mul vl]
				; CHECK: st1w { z[[DATA]].s }, p0, [x0, #7, mul vl]
				; CHECK: ret
				%base_load = getelementptr <vscale x 4 x i32>, <vscale x 4 x i32>* %base, i64 6
				%data = call <vscale x 4 x i32> @llvm.masked.load.nxv4i32(<vscale x 4 x i32>* %base_load, i32 1, <vscale x 4 x i1> %mask, <vscale x 4 x i32> undef)
				%base_store = getelementptr <vscale x 4 x i32>, <vscale x 4 x i32> * %base, i64 7
				call void @llvm.masked.store.nxv4i32(<vscale x 4 x i32> %data, <vscale x 4 x i32>* %base_store, i32 1, <vscale x 4 x i1> %mask)
				ret void
				}

				define void @test_masked_ldst_sv4f16(<vscale x 4 x half> * %base, <vscale x 4 x i1> %mask) {
				; CHECK-LABEL: test_masked_ldst_sv4f16:
				; CHECK: ld1h { z[[DATA:[0-9]+]].s }, p0/z, [x0, #-1, mul vl]
				; CHECK: st1h { z[[DATA]].s }, p0, [x0, #2, mul vl]
				; CHECK: ret
				%base_load = getelementptr <vscale x 4 x half>, <vscale x 4 x half>* %base, i64 -1
				%data = call <vscale x 4 x half> @llvm.masked.load.nxv4f16(<vscale x 4 x half>* %base_load, i32 1, <vscale x 4 x i1> %mask, <vscale x 4 x half> undef)
				%base_store = getelementptr <vscale x 4 x half>, <vscale x 4 x half> * %base, i64 2
				call void @llvm.masked.store.nxv4f16(<vscale x 4 x half> %data, <vscale x 4 x half>* %base_store, i32 1, <vscale x 4 x i1> %mask)
				ret void
				}
				sdesmalen: nit: did you choose a different alignment here on purpose?
				fpetrogalli: No, it is not on purpose. Do you want me to set everything to 1? I wouldn't bother; none of the code added in this patch cares about the value of the alignment...
				fpetrogalli: Well, it was easier to change everything to `i32 1` instead of doing another round of questions! :)

				define void @test_masked_ldst_sv4f32(<vscale x 4 x float> * %base, <vscale x 4 x i1> %mask) {
				; CHECK-LABEL: test_masked_ldst_sv4f32:
				; CHECK: ld1w { z[[DATA:[0-9]+]].s }, p0/z, [x0, #-1, mul vl]
				; CHECK: st1w { z[[DATA]].s }, p0, [x0, #2, mul vl]
				; CHECK: ret
				%base_load = getelementptr <vscale x 4 x float>, <vscale x 4 x float>* %base, i64 -1
				%data = call <vscale x 4 x float> @llvm.masked.load.nxv4f32(<vscale x 4 x float>* %base_load, i32 1, <vscale x 4 x i1> %mask, <vscale x 4 x float> undef)
				%base_store = getelementptr <vscale x 4 x float>, <vscale x 4 x float> * %base, i64 2
				call void @llvm.masked.store.nxv4f32(<vscale x 4 x float> %data, <vscale x 4 x float>* %base_store, i32 1, <vscale x 4 x i1> %mask)
				ret void
				}

				; 4-lane zero/sign extended contiguous loads.

				define <vscale x 4 x i32> @masked_zload_sv4i8_to_sv4i32(<vscale x 4 x i8>* %base, <vscale x 4 x i1> %mask) {
				; CHECK-LABEL: masked_zload_sv4i8_to_sv4i32:
				; CHECK: ld1b { z0.s }, p0/z, [x0, #-4, mul vl]
				; CHECK-NEXT: ret
				%base_load = getelementptr <vscale x 4 x i8>, <vscale x 4 x i8>* %base, i64 -4
				%load = call <vscale x 4 x i8> @llvm.masked.load.nxv4i8(<vscale x 4 x i8>* %base_load, i32 1, <vscale x 4 x i1> %mask, <vscale x 4 x i8> undef)
				%ext = zext <vscale x 4 x i8> %load to <vscale x 4 x i32>
				ret <vscale x 4 x i32> %ext
				}

				define <vscale x 4 x i32> @masked_sload_sv4i8_to_sv4i32(<vscale x 4 x i8>* %base, <vscale x 4 x i1> %mask) {
				; CHECK-LABEL: masked_sload_sv4i8_to_sv4i32:
				; CHECK: ld1sb { z0.s }, p0/z, [x0, #-3, mul vl]
				; CHECK-NEXT: ret
				%base_load = getelementptr <vscale x 4 x i8>, <vscale x 4 x i8>* %base, i64 -3
				%load = call <vscale x 4 x i8> @llvm.masked.load.nxv4i8(<vscale x 4 x i8>* %base_load, i32 1, <vscale x 4 x i1> %mask, <vscale x 4 x i8> undef)
				%ext = sext <vscale x 4 x i8> %load to <vscale x 4 x i32>
				ret <vscale x 4 x i32> %ext
				}

				define <vscale x 4 x i32> @masked_zload_sv4i16_to_sv4i32(<vscale x 4 x i16>* %base, <vscale x 4 x i1> %mask) {
				; CHECK-LABEL: masked_zload_sv4i16_to_sv4i32:
				; CHECK: ld1h { z0.s }, p0/z, [x0, #1, mul vl]
				; CHECK-NEXT: ret
				%base_load = getelementptr <vscale x 4 x i16>, <vscale x 4 x i16>* %base, i64 1
				%load = call <vscale x 4 x i16> @llvm.masked.load.nxv4i16(<vscale x 4 x i16>* %base_load, i32 1, <vscale x 4 x i1> %mask, <vscale x 4 x i16> undef)
				%ext = zext <vscale x 4 x i16> %load to <vscale x 4 x i32>
				ret <vscale x 4 x i32> %ext
				}

				define <vscale x 4 x i32> @masked_sload_sv4i16_to_sv4i32(<vscale x 4 x i16>* %base, <vscale x 4 x i1> %mask) {
				; CHECK-LABEL: masked_sload_sv4i16_to_sv4i32:
				; CHECK: ld1sh { z0.s }, p0/z, [x0, #2, mul vl]
				; CHECK-NEXT: ret
				%base_load = getelementptr <vscale x 4 x i16>, <vscale x 4 x i16>* %base, i64 2
				%load = call <vscale x 4 x i16> @llvm.masked.load.nxv4i16(<vscale x 4 x i16>* %base_load, i32 1, <vscale x 4 x i1> %mask, <vscale x 4 x i16> undef)
				%ext = sext <vscale x 4 x i16> %load to <vscale x 4 x i32>
				ret <vscale x 4 x i32> %ext
				}

				; 4-lane truncating contiguous stores.

				define void @masked_trunc_store_sv4i32_to_sv4i8(<vscale x 4 x i32> %val, <vscale x 4 x i8> *%base, <vscale x 4 x i1> %mask) {
				; CHECK-LABEL: masked_trunc_store_sv4i32_to_sv4i8:
				; CHECK: st1b { z0.s }, p0, [x0, #3, mul vl]
				; CHECK-NEXT: ret
				%base_load = getelementptr <vscale x 4 x i8>, <vscale x 4 x i8>* %base, i64 3
				%trunc = trunc <vscale x 4 x i32> %val to <vscale x 4 x i8>
				call void @llvm.masked.store.nxv4i8(<vscale x 4 x i8> %trunc, <vscale x 4 x i8> *%base_load, i32 8, <vscale x 4 x i1> %mask)
				ret void
				}


				define void @masked_trunc_store_sv4i32_to_sv4i16(<vscale x 4 x i32> %val, <vscale x 4 x i16> *%base, <vscale x 4 x i1> %mask) {
				; CHECK-LABEL: masked_trunc_store_sv4i32_to_sv4i16:
				; CHECK: st1h { z0.s }, p0, [x0, #4, mul vl]
				; CHECK-NEXT: ret
				%base_load = getelementptr <vscale x 4 x i16>, <vscale x 4 x i16>* %base, i64 4
				%trunc = trunc <vscale x 4 x i32> %val to <vscale x 4 x i16>
				call void @llvm.masked.store.nxv4i16(<vscale x 4 x i16> %trunc, <vscale x 4 x i16> *%base_load, i32 8, <vscale x 4 x i1> %mask)
				ret void
				}

				; 8-lane contiguous load/stores.

				define void @test_masked_ldst_sv8i8(<vscale x 8 x i8> * %base, <vscale x 8 x i1> %mask) {
				; CHECK-LABEL: test_masked_ldst_sv8i8:
				; CHECK: ld1sb { z[[DATA:[0-9]+]].h }, p0/z, [x0, #6, mul vl]
				; CHECK: st1b { z[[DATA]].h }, p0, [x0, #7, mul vl]
				; CHECK: ret
				%base_load = getelementptr <vscale x 8 x i8>, <vscale x 8 x i8>* %base, i64 6
				%data = call <vscale x 8 x i8> @llvm.masked.load.nxv8i8(<vscale x 8 x i8>* %base_load, i32 1, <vscale x 8 x i1> %mask, <vscale x 8 x i8> undef)
				%base_store = getelementptr <vscale x 8 x i8>, <vscale x 8 x i8> * %base, i64 7
				call void @llvm.masked.store.nxv8i8(<vscale x 8 x i8> %data, <vscale x 8 x i8>* %base_store, i32 1, <vscale x 8 x i1> %mask)
				ret void
				}

				define void @test_masked_ldst_sv8i16(<vscale x 8 x i16> * %base, <vscale x 8 x i1> %mask) {
				; CHECK-LABEL: test_masked_ldst_sv8i16:
				; CHECK: ld1h { z[[DATA:[0-9]+]].h }, p0/z, [x0, #6, mul vl]
				; CHECK: st1h { z[[DATA]].h }, p0, [x0, #7, mul vl]
				; CHECK: ret
				%base_load = getelementptr <vscale x 8 x i16>, <vscale x 8 x i16>* %base, i64 6
				%data = call <vscale x 8 x i16> @llvm.masked.load.nxv8i16(<vscale x 8 x i16>* %base_load, i32 1, <vscale x 8 x i1> %mask, <vscale x 8 x i16> undef)
				%base_store = getelementptr <vscale x 8 x i16>, <vscale x 8 x i16> * %base, i64 7
				call void @llvm.masked.store.nxv8i16(<vscale x 8 x i16> %data, <vscale x 8 x i16>* %base_store, i32 1, <vscale x 8 x i1> %mask)
				ret void
				}

				define void @test_masked_ldst_sv8f16(<vscale x 8 x half> * %base, <vscale x 8 x i1> %mask) {
				; CHECK-LABEL: test_masked_ldst_sv8f16:
				; CHECK: ld1h { z[[DATA:[0-9]+]].h }, p0/z, [x0, #-1, mul vl]
				; CHECK: st1h { z[[DATA]].h }, p0, [x0, #2, mul vl]
				; CHECK: ret
				%base_load = getelementptr <vscale x 8 x half>, <vscale x 8 x half>* %base, i64 -1
				%data = call <vscale x 8 x half> @llvm.masked.load.nxv8f16(<vscale x 8 x half>* %base_load, i32 1, <vscale x 8 x i1> %mask, <vscale x 8 x half> undef)
				%base_store = getelementptr <vscale x 8 x half>, <vscale x 8 x half> * %base, i64 2
				call void @llvm.masked.store.nxv8f16(<vscale x 8 x half> %data, <vscale x 8 x half>* %base_store, i32 1, <vscale x 8 x i1> %mask)
				ret void
				}

				; 8-lane zero/sign extended contiguous loads.

				define <vscale x 8 x i16> @masked_zload_sv8i8_to_sv8i16(<vscale x 8 x i8>* %base, <vscale x 8 x i1> %mask) {
				; CHECK-LABEL: masked_zload_sv8i8_to_sv8i16:
				; CHECK: ld1b { z0.h }, p0/z, [x0, #-4, mul vl]
				; CHECK-NEXT: ret
				%base_load = getelementptr <vscale x 8 x i8>, <vscale x 8 x i8>* %base, i64 -4
				%load = call <vscale x 8 x i8> @llvm.masked.load.nxv8i8(<vscale x 8 x i8>* %base_load, i32 1, <vscale x 8 x i1> %mask, <vscale x 8 x i8> undef)
				%ext = zext <vscale x 8 x i8> %load to <vscale x 8 x i16>
				ret <vscale x 8 x i16> %ext
				}

				define <vscale x 8 x i16> @masked_sload_sv8i8_to_sv8i16(<vscale x 8 x i8>* %base, <vscale x 8 x i1> %mask) {
				; CHECK-LABEL: masked_sload_sv8i8_to_sv8i16:
				; CHECK: ld1sb { z0.h }, p0/z, [x0, #-3, mul vl]
				; CHECK-NEXT: ret
				%base_load = getelementptr <vscale x 8 x i8>, <vscale x 8 x i8>* %base, i64 -3
				%load = call <vscale x 8 x i8> @llvm.masked.load.nxv8i8(<vscale x 8 x i8>* %base_load, i32 1, <vscale x 8 x i1> %mask, <vscale x 8 x i8> undef)
				%ext = sext <vscale x 8 x i8> %load to <vscale x 8 x i16>
				ret <vscale x 8 x i16> %ext
				}

				; 8-lane truncating contiguous stores.

				define void @masked_trunc_store_sv8i16_to_sv8i8(<vscale x 8 x i16> %val, <vscale x 8 x i8> *%base, <vscale x 8 x i1> %mask) {
				; CHECK-LABEL: masked_trunc_store_sv8i16_to_sv8i8:
				; CHECK: st1b { z0.h }, p0, [x0, #3, mul vl]
				; CHECK-NEXT: ret
				%base_load = getelementptr <vscale x 8 x i8>, <vscale x 8 x i8>* %base, i64 3
				%trunc = trunc <vscale x 8 x i16> %val to <vscale x 8 x i8>
				call void @llvm.masked.store.nxv8i8(<vscale x 8 x i8> %trunc, <vscale x 8 x i8> *%base_load, i32 8, <vscale x 8 x i1> %mask)
				ret void
				}

				; 16-lane contiguous load/stores.

				define void @test_masked_ldst_sv16i8(<vscale x 16 x i8> * %base, <vscale x 16 x i1> %mask) {
				; CHECK-LABEL: test_masked_ldst_sv16i8:
				; CHECK: ld1b { z[[DATA:[0-9]+]].b }, p0/z, [x0, #6, mul vl]
				; CHECK: st1b { z[[DATA]].b }, p0, [x0, #7, mul vl]
				; CHECK: ret
				%base_load = getelementptr <vscale x 16 x i8>, <vscale x 16 x i8>* %base, i64 6
				%data = call <vscale x 16 x i8> @llvm.masked.load.nxv16i8(<vscale x 16 x i8>* %base_load, i32 1, <vscale x 16 x i1> %mask, <vscale x 16 x i8> undef)
				%base_store = getelementptr <vscale x 16 x i8>, <vscale x 16 x i8> * %base, i64 7
				call void @llvm.masked.store.nxv16i8(<vscale x 16 x i8> %data, <vscale x 16 x i8>* %base_store, i32 1, <vscale x 16 x i1> %mask)
				ret void
				}

				; 2-element contiguous loads.
				declare <vscale x 2 x i8> @llvm.masked.load.nxv2i8(<vscale x 2 x i8>*, i32, <vscale x 2 x i1>, <vscale x 2 x i8>)
				declare <vscale x 2 x i16> @llvm.masked.load.nxv2i16(<vscale x 2 x i16>*, i32, <vscale x 2 x i1>, <vscale x 2 x i16>)
				declare <vscale x 2 x i32> @llvm.masked.load.nxv2i32(<vscale x 2 x i32>*, i32, <vscale x 2 x i1>, <vscale x 2 x i32>)
				declare <vscale x 2 x i64> @llvm.masked.load.nxv2i64(<vscale x 2 x i64>*, i32, <vscale x 2 x i1>, <vscale x 2 x i64>)
				declare <vscale x 2 x half> @llvm.masked.load.nxv2f16(<vscale x 2 x half>*, i32, <vscale x 2 x i1>, <vscale x 2 x half>)
				declare <vscale x 2 x float> @llvm.masked.load.nxv2f32(<vscale x 2 x float>*, i32, <vscale x 2 x i1>, <vscale x 2 x float>)
				declare <vscale x 2 x double> @llvm.masked.load.nxv2f64(<vscale x 2 x double>*, i32, <vscale x 2 x i1>, <vscale x 2 x double>)

				; 4-element contiguous loads.
				declare <vscale x 4 x i8> @llvm.masked.load.nxv4i8(<vscale x 4 x i8>*, i32, <vscale x 4 x i1>, <vscale x 4 x i8>)
				declare <vscale x 4 x i16> @llvm.masked.load.nxv4i16(<vscale x 4 x i16>*, i32, <vscale x 4 x i1>, <vscale x 4 x i16>)
				declare <vscale x 4 x i32> @llvm.masked.load.nxv4i32(<vscale x 4 x i32>*, i32, <vscale x 4 x i1>, <vscale x 4 x i32>)
				declare <vscale x 4 x half> @llvm.masked.load.nxv4f16(<vscale x 4 x half>*, i32, <vscale x 4 x i1>, <vscale x 4 x half>)
				declare <vscale x 4 x float> @llvm.masked.load.nxv4f32(<vscale x 4 x float>*, i32, <vscale x 4 x i1>, <vscale x 4 x float>)

				; 8-element contiguous loads.
				declare <vscale x 8 x i8> @llvm.masked.load.nxv8i8(<vscale x 8 x i8>*, i32, <vscale x 8 x i1>, <vscale x 8 x i8>)
				declare <vscale x 8 x i16> @llvm.masked.load.nxv8i16(<vscale x 8 x i16>*, i32, <vscale x 8 x i1>, <vscale x 8 x i16>)
				declare <vscale x 8 x half> @llvm.masked.load.nxv8f16(<vscale x 8 x half>*, i32, <vscale x 8 x i1>, <vscale x 8 x half>)

				; 16-element contiguous loads.
				declare <vscale x 16 x i8> @llvm.masked.load.nxv16i8(<vscale x 16 x i8>*, i32, <vscale x 16 x i1>, <vscale x 16 x i8>)

				; 2-element contiguous stores.
				declare void @llvm.masked.store.nxv2i8(<vscale x 2 x i8>, <vscale x 2 x i8>*, i32, <vscale x 2 x i1>)
				declare void @llvm.masked.store.nxv2i16(<vscale x 2 x i16>, <vscale x 2 x i16>*, i32, <vscale x 2 x i1>)
				declare void @llvm.masked.store.nxv2i32(<vscale x 2 x i32>, <vscale x 2 x i32>*, i32, <vscale x 2 x i1>)
				declare void @llvm.masked.store.nxv2i64(<vscale x 2 x i64>, <vscale x 2 x i64>*, i32, <vscale x 2 x i1>)
				declare void @llvm.masked.store.nxv2f16(<vscale x 2 x half>, <vscale x 2 x half>*, i32, <vscale x 2 x i1>)
				declare void @llvm.masked.store.nxv2f32(<vscale x 2 x float>, <vscale x 2 x float>*, i32, <vscale x 2 x i1>)
				declare void @llvm.masked.store.nxv2f64(<vscale x 2 x double>, <vscale x 2 x double>*, i32, <vscale x 2 x i1>)

				; 4-element contiguous stores.
				declare void @llvm.masked.store.nxv4i8(<vscale x 4 x i8>, <vscale x 4 x i8>*, i32, <vscale x 4 x i1>)
				declare void @llvm.masked.store.nxv4i16(<vscale x 4 x i16>, <vscale x 4 x i16>*, i32, <vscale x 4 x i1>)
				declare void @llvm.masked.store.nxv4i32(<vscale x 4 x i32>, <vscale x 4 x i32>*, i32, <vscale x 4 x i1>)
				declare void @llvm.masked.store.nxv4f16(<vscale x 4 x half>, <vscale x 4 x half>*, i32, <vscale x 4 x i1>)
				declare void @llvm.masked.store.nxv4f32(<vscale x 4 x float>, <vscale x 4 x float>*, i32, <vscale x 4 x i1>)

				; 8-element contiguous stores.
				declare void @llvm.masked.store.nxv8i8(<vscale x 8 x i8>, <vscale x 8 x i8>*, i32, <vscale x 8 x i1>)
				declare void @llvm.masked.store.nxv8i16(<vscale x 8 x i16>, <vscale x 8 x i16>*, i32, <vscale x 8 x i1>)
				declare void @llvm.masked.store.nxv8f16(<vscale x 8 x half>, <vscale x 8 x half>*, i32, <vscale x 8 x i1>)

				; 16-element contiguous stores.
				declare void @llvm.masked.store.nxv16i8(<vscale x 16 x i8>, <vscale x 16 x i8>*, i32, <vscale x 16 x i1>)

llvm/test/CodeGen/AArch64/sve-pred-contiguous-ldst-addressing-mode-reg-reg.ll

This file was added.

				; RUN: llc -mtriple=aarch64--linux-gnu -mattr=+sve < %s | FileCheck %s

				; 2-lane contiguous load/stores

				define void @test_masked_ldst_sv2i8(i8 * %base, <vscale x 2 x i1> %mask, i64 %offset) {
				; CHECK-LABEL: test_masked_ldst_sv2i8:
				; CHECK: ld1sb { z[[DATA:[0-9]+]].d }, p0/z, [x0, x1]
				; CHECK: st1b { z[[DATA]].d }, p0, [x0, x1]
				; CHECK: ret
				%base_i8 = getelementptr i8, i8* %base, i64 %offset
				%base_addr = bitcast i8* %base_i8 to <vscale x 2 x i8>*
				%data = call <vscale x 2 x i8> @llvm.masked.load.nxv2i8(<vscale x 2 x i8>* %base_addr, i32 1, <vscale x 2 x i1> %mask, <vscale x 2 x i8> undef)
				call void @llvm.masked.store.nxv2i8(<vscale x 2 x i8> %data, <vscale x 2 x i8>* %base_addr, i32 1, <vscale x 2 x i1> %mask)
				ret void
				}

				define void @test_masked_ldst_sv2i16(i16 * %base, <vscale x 2 x i1> %mask, i64 %offset) {
				; CHECK-LABEL: test_masked_ldst_sv2i16:
				; CHECK: ld1sh { z[[DATA:[0-9]+]].d }, p0/z, [x0, x1, lsl #1]
				; CHECK: st1h { z[[DATA]].d }, p0, [x0, x1, lsl #1]
				; CHECK: ret
				%base_i16 = getelementptr i16, i16* %base, i64 %offset
				%base_addr = bitcast i16* %base_i16 to <vscale x 2 x i16>*
				%data = call <vscale x 2 x i16> @llvm.masked.load.nxv2i16(<vscale x 2 x i16>* %base_addr, i32 1, <vscale x 2 x i1> %mask, <vscale x 2 x i16> undef)
				call void @llvm.masked.store.nxv2i16(<vscale x 2 x i16> %data, <vscale x 2 x i16>* %base_addr, i32 1, <vscale x 2 x i1> %mask)
				ret void
				}

				define void @test_masked_ldst_sv2i32(i32 * %base, <vscale x 2 x i1> %mask, i64 %offset) {
				; CHECK-LABEL: test_masked_ldst_sv2i32:
				; CHECK: ld1sw { z0.d }, p0/z, [x0, x1, lsl #2]
				; CHECK: st1w { z0.d }, p0, [x0, x1, lsl #2]
				; CHECK: ret
				%base_i32 = getelementptr i32, i32* %base, i64 %offset
				%base_addr = bitcast i32* %base_i32 to <vscale x 2 x i32>*
				%data = call <vscale x 2 x i32> @llvm.masked.load.nxv2i32(<vscale x 2 x i32>* %base_addr, i32 1, <vscale x 2 x i1> %mask, <vscale x 2 x i32> undef)
				call void @llvm.masked.store.nxv2i32(<vscale x 2 x i32> %data, <vscale x 2 x i32>* %base_addr, i32 1, <vscale x 2 x i1> %mask)
				ret void
				}

				define void @test_masked_ldst_sv2i64(i64 * %base, <vscale x 2 x i1> %mask, i64 %offset) {
				; CHECK-LABEL: test_masked_ldst_sv2i64:
				; CHECK: ld1d { z0.d }, p0/z, [x0, x1, lsl #3]
				; CHECK: st1d { z0.d }, p0, [x0, x1, lsl #3]
				; CHECK: ret
				%base_i64 = getelementptr i64, i64* %base, i64 %offset
				%base_addr = bitcast i64* %base_i64 to <vscale x 2 x i64>*
				%data = call <vscale x 2 x i64> @llvm.masked.load.nxv2i64(<vscale x 2 x i64>* %base_addr, i32 1, <vscale x 2 x i1> %mask, <vscale x 2 x i64> undef)
				call void @llvm.masked.store.nxv2i64(<vscale x 2 x i64> %data, <vscale x 2 x i64>* %base_addr, i32 1, <vscale x 2 x i1> %mask)
				ret void
				}

				define void @test_masked_ldst_sv2f16(half * %base, <vscale x 2 x i1> %mask, i64 %offset) {
				; CHECK-LABEL: test_masked_ldst_sv2f16:
				; CHECK: ld1h { z[[DATA:[0-9]+]].d }, p0/z, [x0, x1, lsl #1]
				; CHECK: st1h { z[[DATA]].d }, p0, [x0, x1, lsl #1]
				; CHECK: ret
				%base_half = getelementptr half, half* %base, i64 %offset
				%base_addr = bitcast half* %base_half to <vscale x 2 x half>*
				%data = call <vscale x 2 x half> @llvm.masked.load.nxv2f16(<vscale x 2 x half>* %base_addr, i32 1, <vscale x 2 x i1> %mask, <vscale x 2 x half> undef)
				call void @llvm.masked.store.nxv2f16(<vscale x 2 x half> %data, <vscale x 2 x half>* %base_addr, i32 1, <vscale x 2 x i1> %mask)
				ret void
				}

				define void @test_masked_ldst_sv2f32(float * %base, <vscale x 2 x i1> %mask, i64 %offset) {
				; CHECK-LABEL: test_masked_ldst_sv2f32:
				; CHECK: ld1w { z[[DATA:[0-9]+]].d }, p0/z, [x0, x1, lsl #2]
				; CHECK: st1w { z[[DATA]].d }, p0, [x0, x1, lsl #2]
				; CHECK: ret
				%base_float = getelementptr float, float* %base, i64 %offset
				%base_addr = bitcast float* %base_float to <vscale x 2 x float>*
				%data = call <vscale x 2 x float> @llvm.masked.load.nxv2f32(<vscale x 2 x float>* %base_addr, i32 1, <vscale x 2 x i1> %mask, <vscale x 2 x float> undef)
				call void @llvm.masked.store.nxv2f32(<vscale x 2 x float> %data, <vscale x 2 x float>* %base_addr, i32 1, <vscale x 2 x i1> %mask)
				ret void
				}

				define void @test_masked_ldst_sv2f64(double * %base, <vscale x 2 x i1> %mask, i64 %offset) {
				; CHECK-LABEL: test_masked_ldst_sv2f64:
				; CHECK: ld1d { z[[DATA:[0-9]+]].d }, p0/z, [x0, x1, lsl #3]
				; CHECK: st1d { z[[DATA]].d }, p0, [x0, x1, lsl #3]
				; CHECK: ret
				%base_double = getelementptr double, double* %base, i64 %offset
				%base_addr = bitcast double* %base_double to <vscale x 2 x double>*
				%data = call <vscale x 2 x double> @llvm.masked.load.nxv2f64(<vscale x 2 x double>* %base_addr, i32 1, <vscale x 2 x i1> %mask, <vscale x 2 x double> undef)
				call void @llvm.masked.store.nxv2f64(<vscale x 2 x double> %data, <vscale x 2 x double>* %base_addr, i32 1, <vscale x 2 x i1> %mask)
				ret void
				}

				; 2-lane zero/sign extended contiguous loads.

				define <vscale x 2 x i64> @masked_zload_sv2i8_to_sv2i64(i8* %base, <vscale x 2 x i1> %mask, i64 %offset) {
				; CHECK-LABEL: masked_zload_sv2i8_to_sv2i64:
				; CHECK: ld1b { z0.d }, p0/z, [x0, x1]
				; CHECK-NEXT: ret
				%base_i8 = getelementptr i8, i8* %base, i64 %offset
				%base_addr = bitcast i8* %base_i8 to <vscale x 2 x i8>*
				%load = call <vscale x 2 x i8> @llvm.masked.load.nxv2i8(<vscale x 2 x i8>* %base_addr, i32 1, <vscale x 2 x i1> %mask, <vscale x 2 x i8> undef)
				%ext = zext <vscale x 2 x i8> %load to <vscale x 2 x i64>
				ret <vscale x 2 x i64> %ext
				}

				define <vscale x 2 x i64> @masked_sload_sv2i8_to_sv2i64(i8* %base, <vscale x 2 x i1> %mask, i64 %offset) {
				; CHECK-LABEL: masked_sload_sv2i8_to_sv2i64:
				; CHECK: ld1sb { z0.d }, p0/z, [x0, x1]
				; CHECK-NEXT: ret
				%base_i8 = getelementptr i8, i8* %base, i64 %offset
				%base_addr = bitcast i8* %base_i8 to <vscale x 2 x i8>*
				%load = call <vscale x 2 x i8> @llvm.masked.load.nxv2i8(<vscale x 2 x i8>* %base_addr, i32 1, <vscale x 2 x i1> %mask, <vscale x 2 x i8> undef)
				%ext = sext <vscale x 2 x i8> %load to <vscale x 2 x i64>
				ret <vscale x 2 x i64> %ext
				}

				define <vscale x 2 x i64> @masked_zload_sv2i16_to_sv2i64(i16* %base, <vscale x 2 x i1> %mask, i64 %offset) {
				; CHECK-LABEL: masked_zload_sv2i16_to_sv2i64:
				; CHECK: ld1h { z0.d }, p0/z, [x0, x1, lsl #1]
				; CHECK-NEXT: ret
				%base_i16 = getelementptr i16, i16* %base, i64 %offset
				%base_addr = bitcast i16* %base_i16 to <vscale x 2 x i16>*
				%load = call <vscale x 2 x i16> @llvm.masked.load.nxv2i16(<vscale x 2 x i16>* %base_addr, i32 1, <vscale x 2 x i1> %mask, <vscale x 2 x i16> undef)
				%ext = zext <vscale x 2 x i16> %load to <vscale x 2 x i64>
				ret <vscale x 2 x i64> %ext
				}

				define <vscale x 2 x i64> @masked_sload_sv2i16_to_sv2i64(i16* %base, <vscale x 2 x i1> %mask, i64 %offset) {
				; CHECK-LABEL: masked_sload_sv2i16_to_sv2i64:
				; CHECK: ld1sh { z0.d }, p0/z, [x0, x1, lsl #1]
				; CHECK-NEXT: ret
				%base_i16 = getelementptr i16, i16* %base, i64 %offset
				%base_addr = bitcast i16* %base_i16 to <vscale x 2 x i16>*
				%load = call <vscale x 2 x i16> @llvm.masked.load.nxv2i16(<vscale x 2 x i16>* %base_addr, i32 1, <vscale x 2 x i1> %mask, <vscale x 2 x i16> undef)
				%ext = sext <vscale x 2 x i16> %load to <vscale x 2 x i64>
				ret <vscale x 2 x i64> %ext
				}


				define <vscale x 2 x i64> @masked_zload_sv2i32_to_sv2i64(i32* %base, <vscale x 2 x i1> %mask, i64 %offset) {
				; CHECK-LABEL: masked_zload_sv2i32_to_sv2i64:
				; CHECK: ld1w { z0.d }, p0/z, [x0, x1, lsl #2]
				; CHECK-NEXT: ret
				%base_i32 = getelementptr i32, i32* %base, i64 %offset
				%base_addr = bitcast i32* %base_i32 to <vscale x 2 x i32>*
				%load = call <vscale x 2 x i32> @llvm.masked.load.nxv2i32(<vscale x 2 x i32>* %base_addr, i32 1, <vscale x 2 x i1> %mask, <vscale x 2 x i32> undef)
				%ext = zext <vscale x 2 x i32> %load to <vscale x 2 x i64>
				ret <vscale x 2 x i64> %ext
				}

				define <vscale x 2 x i64> @masked_sload_sv2i32_to_sv2i64(i32* %base, <vscale x 2 x i1> %mask, i64 %offset) {
				; CHECK-LABEL: masked_sload_sv2i32_to_sv2i64:
				; CHECK: ld1sw { z0.d }, p0/z, [x0, x1, lsl #2]
				; CHECK-NEXT: ret
				%base_i32 = getelementptr i32, i32* %base, i64 %offset
				%base_addr = bitcast i32* %base_i32 to <vscale x 2 x i32>*
				%load = call <vscale x 2 x i32> @llvm.masked.load.nxv2i32(<vscale x 2 x i32>* %base_addr, i32 1, <vscale x 2 x i1> %mask, <vscale x 2 x i32> undef)
				%ext = sext <vscale x 2 x i32> %load to <vscale x 2 x i64>
				ret <vscale x 2 x i64> %ext
				}

				; 2-lane truncating contiguous stores.

				define void @masked_trunc_store_sv2i64_to_sv2i8(<vscale x 2 x i64> %val, i8 *%base, <vscale x 2 x i1> %mask, i64 %offset) {
				; CHECK-LABEL: masked_trunc_store_sv2i64_to_sv2i8:
				; CHECK: st1b { z0.d }, p0, [x0, x1]
				; CHECK-NEXT: ret
				%base_i8 = getelementptr i8, i8* %base, i64 %offset
				%base_addr = bitcast i8* %base_i8 to <vscale x 2 x i8>*
				%trunc = trunc <vscale x 2 x i64> %val to <vscale x 2 x i8>
				call void @llvm.masked.store.nxv2i8(<vscale x 2 x i8> %trunc, <vscale x 2 x i8> *%base_addr, i32 8, <vscale x 2 x i1> %mask)
				ret void
				}

				define void @masked_trunc_store_sv2i64_to_sv2i16(<vscale x 2 x i64> %val, i16 *%base, <vscale x 2 x i1> %mask, i64 %offset) {
				; CHECK-LABEL: masked_trunc_store_sv2i64_to_sv2i16:
				; CHECK: st1h { z0.d }, p0, [x0, x1, lsl #1]
				; CHECK-NEXT: ret
				%base_i16 = getelementptr i16, i16* %base, i64 %offset
				%base_addr = bitcast i16* %base_i16 to <vscale x 2 x i16>*
				%trunc = trunc <vscale x 2 x i64> %val to <vscale x 2 x i16>
				call void @llvm.masked.store.nxv2i16(<vscale x 2 x i16> %trunc, <vscale x 2 x i16> *%base_addr, i32 8, <vscale x 2 x i1> %mask)
				ret void
				}

				define void @masked_trunc_store_sv2i64_to_sv2i32(<vscale x 2 x i64> %val, i32 *%base, <vscale x 2 x i1> %mask, i64 %offset) {
				; CHECK-LABEL: masked_trunc_store_sv2i64_to_sv2i32:
				; CHECK: st1w { z0.d }, p0, [x0, x1, lsl #2]
				; CHECK-NEXT: ret
				%base_i32 = getelementptr i32, i32* %base, i64 %offset
				%base_addr = bitcast i32* %base_i32 to <vscale x 2 x i32>*
				%trunc = trunc <vscale x 2 x i64> %val to <vscale x 2 x i32>
				call void @llvm.masked.store.nxv2i32(<vscale x 2 x i32> %trunc, <vscale x 2 x i32> *%base_addr, i32 8, <vscale x 2 x i1> %mask)
				ret void
				}

				; 4-lane contiguous load/stores.

				define void @test_masked_ldst_sv4i8(i8 * %base, <vscale x 4 x i1> %mask, i64 %offset) {
				; CHECK-LABEL: test_masked_ldst_sv4i8:
				; CHECK: ld1sb { z[[DATA:[0-9]+]].s }, p0/z, [x0, x1]
				; CHECK: st1b { z[[DATA]].s }, p0, [x0, x1]
				; CHECK: ret
				%base_i8 = getelementptr i8, i8* %base, i64 %offset
				%base_addr = bitcast i8* %base_i8 to <vscale x 4 x i8>*
				%data = call <vscale x 4 x i8> @llvm.masked.load.nxv4i8(<vscale x 4 x i8>* %base_addr, i32 1, <vscale x 4 x i1> %mask, <vscale x 4 x i8> undef)
				call void @llvm.masked.store.nxv4i8(<vscale x 4 x i8> %data, <vscale x 4 x i8>* %base_addr, i32 1, <vscale x 4 x i1> %mask)
				ret void
				}

				define void @test_masked_ldst_sv4i16(i16 * %base, <vscale x 4 x i1> %mask, i64 %offset) {
				; CHECK-LABEL: test_masked_ldst_sv4i16:
				; CHECK: ld1sh { z[[DATA:[0-9]+]].s }, p0/z, [x0, x1, lsl #1]
				; CHECK: st1h { z[[DATA]].s }, p0, [x0, x1, lsl #1]
				; CHECK: ret
				%base_i16 = getelementptr i16, i16* %base, i64 %offset
				%base_addr = bitcast i16* %base_i16 to <vscale x 4 x i16>*
				%data = call <vscale x 4 x i16> @llvm.masked.load.nxv4i16(<vscale x 4 x i16>* %base_addr, i32 1, <vscale x 4 x i1> %mask, <vscale x 4 x i16> undef)
				call void @llvm.masked.store.nxv4i16(<vscale x 4 x i16> %data, <vscale x 4 x i16>* %base_addr, i32 1, <vscale x 4 x i1> %mask)
				ret void
				}

				define void @test_masked_ldst_sv4i32(i32 * %base, <vscale x 4 x i1> %mask, i64 %offset) {
				; CHECK-LABEL: test_masked_ldst_sv4i32:
				; CHECK: ld1w { z[[DATA:[0-9]+]].s }, p0/z, [x0, x1, lsl #2]
				; CHECK: st1w { z[[DATA]].s }, p0, [x0, x1, lsl #2]
				; CHECK: ret
				%base_i32 = getelementptr i32, i32* %base, i64 %offset
				%base_addr = bitcast i32* %base_i32 to <vscale x 4 x i32>*
				%data = call <vscale x 4 x i32> @llvm.masked.load.nxv4i32(<vscale x 4 x i32>* %base_addr, i32 1, <vscale x 4 x i1> %mask, <vscale x 4 x i32> undef)
				call void @llvm.masked.store.nxv4i32(<vscale x 4 x i32> %data, <vscale x 4 x i32>* %base_addr, i32 1, <vscale x 4 x i1> %mask)
				ret void
				}

				define void @test_masked_ldst_sv4f16(half * %base, <vscale x 4 x i1> %mask, i64 %offset) {
				; CHECK-LABEL: test_masked_ldst_sv4f16:
				; CHECK: ld1h { z[[DATA:[0-9]+]].s }, p0/z, [x0, x1, lsl #1]
				; CHECK: st1h { z[[DATA]].s }, p0, [x0, x1, lsl #1]
				; CHECK: ret
				%base_f16 = getelementptr half, half* %base, i64 %offset
				%base_addr = bitcast half* %base_f16 to <vscale x 4 x half>*
				%data = call <vscale x 4 x half> @llvm.masked.load.nxv4f16(<vscale x 4 x half>* %base_addr, i32 1, <vscale x 4 x i1> %mask, <vscale x 4 x half> undef)
				call void @llvm.masked.store.nxv4f16(<vscale x 4 x half> %data, <vscale x 4 x half>* %base_addr, i32 1, <vscale x 4 x i1> %mask)
				ret void
				}

				define void @test_masked_ldst_sv4f32(float * %base, <vscale x 4 x i1> %mask, i64 %offset) {
				; CHECK-LABEL: test_masked_ldst_sv4f32:
				; CHECK: ld1w { z[[DATA:[0-9]+]].s }, p0/z, [x0, x1, lsl #2]
				; CHECK: st1w { z[[DATA]].s }, p0, [x0, x1, lsl #2]
				; CHECK: ret
				%base_f32 = getelementptr float, float* %base, i64 %offset
				%base_addr = bitcast float* %base_f32 to <vscale x 4 x float>*
				%data = call <vscale x 4 x float> @llvm.masked.load.nxv4f32(<vscale x 4 x float>* %base_addr, i32 1, <vscale x 4 x i1> %mask, <vscale x 4 x float> undef)
				call void @llvm.masked.store.nxv4f32(<vscale x 4 x float> %data, <vscale x 4 x float>* %base_addr, i32 1, <vscale x 4 x i1> %mask)
				ret void
				}

				; 4-lane zero/sign extended contiguous loads.

				define <vscale x 4 x i32> @masked_zload_sv4i8_to_sv4i32(i8* %base, <vscale x 4 x i1> %mask, i64 %offset) {
				; CHECK-LABEL: masked_zload_sv4i8_to_sv4i32:
				; CHECK: ld1b { z0.s }, p0/z, [x0, x1]
				; CHECK-NEXT: ret
				%base_i8 = getelementptr i8, i8* %base, i64 %offset
				%base_addr = bitcast i8* %base_i8 to <vscale x 4 x i8>*
				%load = call <vscale x 4 x i8> @llvm.masked.load.nxv4i8(<vscale x 4 x i8>* %base_addr, i32 1, <vscale x 4 x i1> %mask, <vscale x 4 x i8> undef)
				%ext = zext <vscale x 4 x i8> %load to <vscale x 4 x i32>
				ret <vscale x 4 x i32> %ext
				}

				define <vscale x 4 x i32> @masked_sload_sv4i8_to_sv4i32(i8* %base, <vscale x 4 x i1> %mask, i64 %offset) {
				; CHECK-LABEL: masked_sload_sv4i8_to_sv4i32:
				; CHECK: ld1sb { z0.s }, p0/z, [x0, x1]
				; CHECK-NEXT: ret
				%base_i8 = getelementptr i8, i8* %base, i64 %offset
				%base_addr = bitcast i8* %base_i8 to <vscale x 4 x i8>*
				%load = call <vscale x 4 x i8> @llvm.masked.load.nxv4i8(<vscale x 4 x i8>* %base_addr, i32 1, <vscale x 4 x i1> %mask, <vscale x 4 x i8> undef)
				%ext = sext <vscale x 4 x i8> %load to <vscale x 4 x i32>
				ret <vscale x 4 x i32> %ext
				}

				define <vscale x 4 x i32> @masked_zload_sv4i16_to_sv4i32(i16* %base, <vscale x 4 x i1> %mask, i64 %offset) {
				; CHECK-LABEL: masked_zload_sv4i16_to_sv4i32:
				; CHECK: ld1h { z0.s }, p0/z, [x0, x1, lsl #1]
				; CHECK-NEXT: ret
				%base_i16 = getelementptr i16, i16* %base, i64 %offset
				%base_addr = bitcast i16* %base_i16 to <vscale x 4 x i16>*
				%load = call <vscale x 4 x i16> @llvm.masked.load.nxv4i16(<vscale x 4 x i16>* %base_addr, i32 1, <vscale x 4 x i1> %mask, <vscale x 4 x i16> undef)
				%ext = zext <vscale x 4 x i16> %load to <vscale x 4 x i32>
				ret <vscale x 4 x i32> %ext
				}

				define <vscale x 4 x i32> @masked_sload_sv4i16_to_sv4i32(i16* %base, <vscale x 4 x i1> %mask, i64 %offset) {
				; CHECK-LABEL: masked_sload_sv4i16_to_sv4i32:
				; CHECK: ld1sh { z0.s }, p0/z, [x0, x1, lsl #1]
				; CHECK-NEXT: ret
				%base_i16 = getelementptr i16, i16* %base, i64 %offset
				%base_addr = bitcast i16* %base_i16 to <vscale x 4 x i16>*
				%load = call <vscale x 4 x i16> @llvm.masked.load.nxv4i16(<vscale x 4 x i16>* %base_addr, i32 1, <vscale x 4 x i1> %mask, <vscale x 4 x i16> undef)
				%ext = sext <vscale x 4 x i16> %load to <vscale x 4 x i32>
				ret <vscale x 4 x i32> %ext
				}

				; 4-lane truncating contiguous stores.

				define void @masked_trunc_store_sv4i32_to_sv4i8(<vscale x 4 x i32> %val, i8 *%base, <vscale x 4 x i1> %mask, i64 %offset) {
				; CHECK-LABEL: masked_trunc_store_sv4i32_to_sv4i8:
				; CHECK: st1b { z0.s }, p0, [x0, x1]
				; CHECK-NEXT: ret
				%base_i8 = getelementptr i8, i8* %base, i64 %offset
				%base_addr = bitcast i8* %base_i8 to <vscale x 4 x i8>*
				%trunc = trunc <vscale x 4 x i32> %val to <vscale x 4 x i8>
				call void @llvm.masked.store.nxv4i8(<vscale x 4 x i8> %trunc, <vscale x 4 x i8> *%base_addr, i32 8, <vscale x 4 x i1> %mask)
				ret void
				}

				define void @masked_trunc_store_sv4i32_to_sv4i16(<vscale x 4 x i32> %val, i16 *%base, <vscale x 4 x i1> %mask, i64 %offset) {
				; CHECK-LABEL: masked_trunc_store_sv4i32_to_sv4i16:
				; CHECK: st1h { z0.s }, p0, [x0, x1, lsl #1]
				; CHECK-NEXT: ret
				%base_i16 = getelementptr i16, i16* %base, i64 %offset
				%base_addr = bitcast i16* %base_i16 to <vscale x 4 x i16>*
				%trunc = trunc <vscale x 4 x i32> %val to <vscale x 4 x i16>
				call void @llvm.masked.store.nxv4i16(<vscale x 4 x i16> %trunc, <vscale x 4 x i16> *%base_addr, i32 8, <vscale x 4 x i1> %mask)
				ret void
				}

				; 8-lane contiguous load/stores.

				define void @test_masked_ldst_sv8i8(i8 * %base, <vscale x 8 x i1> %mask, i64 %offset) {
				; CHECK-LABEL: test_masked_ldst_sv8i8:
				; CHECK: ld1sb { z[[DATA:[0-9]+]].h }, p0/z, [x0, x1]
				; CHECK: st1b { z[[DATA]].h }, p0, [x0, x1]
				; CHECK: ret
				%base_i8 = getelementptr i8, i8* %base, i64 %offset
				%base_addr = bitcast i8* %base_i8 to <vscale x 8 x i8>*
				%data = call <vscale x 8 x i8> @llvm.masked.load.nxv8i8(<vscale x 8 x i8>* %base_addr, i32 1, <vscale x 8 x i1> %mask, <vscale x 8 x i8> undef)
				call void @llvm.masked.store.nxv8i8(<vscale x 8 x i8> %data, <vscale x 8 x i8>* %base_addr, i32 1, <vscale x 8 x i1> %mask)
				ret void
				}

				define void @test_masked_ldst_sv8i16(i16 * %base, <vscale x 8 x i1> %mask, i64 %offset) {
				; CHECK-LABEL: test_masked_ldst_sv8i16:
				; CHECK: ld1h { z[[DATA:[0-9]+]].h }, p0/z, [x0, x1, lsl #1]
				; CHECK: st1h { z[[DATA]].h }, p0, [x0, x1, lsl #1]
				; CHECK: ret
				%base_i16 = getelementptr i16, i16* %base, i64 %offset
				%base_addr = bitcast i16* %base_i16 to <vscale x 8 x i16>*
				%data = call <vscale x 8 x i16> @llvm.masked.load.nxv8i16(<vscale x 8 x i16>* %base_addr, i32 1, <vscale x 8 x i1> %mask, <vscale x 8 x i16> undef)
				call void @llvm.masked.store.nxv8i16(<vscale x 8 x i16> %data, <vscale x 8 x i16>* %base_addr, i32 1, <vscale x 8 x i1> %mask)
				ret void
				}

				define void @test_masked_ldst_sv8f16(half * %base, <vscale x 8 x i1> %mask, i64 %offset) {
				; CHECK-LABEL: test_masked_ldst_sv8f16:
				; CHECK: ld1h { z[[DATA:[0-9]+]].h }, p0/z, [x0, x1, lsl #1]
				; CHECK: st1h { z[[DATA]].h }, p0, [x0, x1, lsl #1]
				; CHECK: ret
				%base_f16 = getelementptr half, half* %base, i64 %offset
				%base_addr = bitcast half* %base_f16 to <vscale x 8 x half>*
				%data = call <vscale x 8 x half> @llvm.masked.load.nxv8f16(<vscale x 8 x half>* %base_addr, i32 1, <vscale x 8 x i1> %mask, <vscale x 8 x half> undef)
				call void @llvm.masked.store.nxv8f16(<vscale x 8 x half> %data, <vscale x 8 x half>* %base_addr, i32 1, <vscale x 8 x i1> %mask)
				ret void
				}

				; 8-lane zero/sign extended contiguous loads.

				define <vscale x 8 x i16> @masked_zload_sv8i8_to_sv8i16(i8* %base, <vscale x 8 x i1> %mask, i64 %offset) {
				; CHECK-LABEL: masked_zload_sv8i8_to_sv8i16:
				; CHECK: ld1b { z0.h }, p0/z, [x0, x1]
				; CHECK-NEXT: ret
				%base_i8 = getelementptr i8, i8* %base, i64 %offset
				%base_addr = bitcast i8* %base_i8 to <vscale x 8 x i8>*
				%load = call <vscale x 8 x i8> @llvm.masked.load.nxv8i8(<vscale x 8 x i8>* %base_addr, i32 1, <vscale x 8 x i1> %mask, <vscale x 8 x i8> undef)
				%ext = zext <vscale x 8 x i8> %load to <vscale x 8 x i16>
				ret <vscale x 8 x i16> %ext
				}

				define <vscale x 8 x i16> @masked_sload_sv8i8_to_sv8i16(i8* %base, <vscale x 8 x i1> %mask, i64 %offset) {
				; CHECK-LABEL: masked_sload_sv8i8_to_sv8i16:
				; CHECK: ld1sb { z0.h }, p0/z, [x0, x1]
				; CHECK-NEXT: ret
				%base_i8 = getelementptr i8, i8* %base, i64 %offset
				%base_addr = bitcast i8* %base_i8 to <vscale x 8 x i8>*
				%load = call <vscale x 8 x i8> @llvm.masked.load.nxv8i8(<vscale x 8 x i8>* %base_addr, i32 1, <vscale x 8 x i1> %mask, <vscale x 8 x i8> undef)
				%ext = sext <vscale x 8 x i8> %load to <vscale x 8 x i16>
				ret <vscale x 8 x i16> %ext
				}

				; 8-lane truncating contiguous stores.

				define void @masked_trunc_store_sv8i16_to_sv8i8(<vscale x 8 x i16> %val, i8 *%base, <vscale x 8 x i1> %mask, i64 %offset) {
				; CHECK-LABEL: masked_trunc_store_sv8i16_to_sv8i8:
				; C HECK: st1b { z0.h }, p0, [x0, x1]
				; C HECK-NEXT: ret
				andwar: FIXME
				%base_i8 = getelementptr i8, i8* %base, i64 %offset
				%base_addr = bitcast i8* %base_i8 to <vscale x 8 x i8>*
				%trunc = trunc <vscale x 8 x i16> %val to <vscale x 8 x i8>
				call void @llvm.masked.store.nxv8i8(<vscale x 8 x i8> %trunc, <vscale x 8 x i8> *%base_addr, i32 8, <vscale x 8 x i1> %mask)
				ret void
				}

				; 16-lane contiguous load/stores.

				define void @test_masked_ldst_sv16i8(i8 * %base, <vscale x 16 x i1> %mask, i64 %offset) {
				; CHECK-LABEL: test_masked_ldst_sv16i8:
				; CHECK: ld1b { z[[DATA:[0-9]+]].b }, p0/z, [x0, x1]
				; CHECK: st1b { z[[DATA]].b }, p0, [x0, x1]
				; CHECK: ret
				%base_i8 = getelementptr i8, i8* %base, i64 %offset
				%base_addr = bitcast i8* %base_i8 to <vscale x 16 x i8>*
				%data = call <vscale x 16 x i8> @llvm.masked.load.nxv16i8(<vscale x 16 x i8>* %base_addr, i32 1, <vscale x 16 x i1> %mask, <vscale x 16 x i8> undef)
				call void @llvm.masked.store.nxv16i8(<vscale x 16 x i8> %data, <vscale x 16 x i8>* %base_addr, i32 1, <vscale x 16 x i1> %mask)
				ret void
				}

				; 2-element contiguous loads.
				declare <vscale x 2 x i8> @llvm.masked.load.nxv2i8 (<vscale x 2 x i8>* , i32, <vscale x 2 x i1>, <vscale x 2 x i8> )
				declare <vscale x 2 x i16> @llvm.masked.load.nxv2i16(<vscale x 2 x i16>*, i32, <vscale x 2 x i1>, <vscale x 2 x i16>)
				declare <vscale x 2 x i32> @llvm.masked.load.nxv2i32(<vscale x 2 x i32>*, i32, <vscale x 2 x i1>, <vscale x 2 x i32>)
				declare <vscale x 2 x i64> @llvm.masked.load.nxv2i64(<vscale x 2 x i64>*, i32, <vscale x 2 x i1>, <vscale x 2 x i64>)
				declare <vscale x 2 x half> @llvm.masked.load.nxv2f16(<vscale x 2 x half>*, i32, <vscale x 2 x i1>, <vscale x 2 x half>)
				declare <vscale x 2 x float> @llvm.masked.load.nxv2f32(<vscale x 2 x float>*, i32, <vscale x 2 x i1>, <vscale x 2 x float>)
				declare <vscale x 2 x double> @llvm.masked.load.nxv2f64(<vscale x 2 x double>*, i32, <vscale x 2 x i1>, <vscale x 2 x double>)

				; 4-element contiguous loads.
				declare <vscale x 4 x i8> @llvm.masked.load.nxv4i8 (<vscale x 4 x i8>* , i32, <vscale x 4 x i1>, <vscale x 4 x i8> )
				declare <vscale x 4 x i16> @llvm.masked.load.nxv4i16(<vscale x 4 x i16>*, i32, <vscale x 4 x i1>, <vscale x 4 x i16>)
				declare <vscale x 4 x i32> @llvm.masked.load.nxv4i32(<vscale x 4 x i32>*, i32, <vscale x 4 x i1>, <vscale x 4 x i32>)
				declare <vscale x 4 x half> @llvm.masked.load.nxv4f16(<vscale x 4 x half>*, i32, <vscale x 4 x i1>, <vscale x 4 x half>)
				declare <vscale x 4 x float> @llvm.masked.load.nxv4f32(<vscale x 4 x float>*, i32, <vscale x 4 x i1>, <vscale x 4 x float>)

				; 8-element contiguous loads.
				declare <vscale x 8 x i8> @llvm.masked.load.nxv8i8 (<vscale x 8 x i8>* , i32, <vscale x 8 x i1>, <vscale x 8 x i8> )
				declare <vscale x 8 x i16> @llvm.masked.load.nxv8i16(<vscale x 8 x i16>*, i32, <vscale x 8 x i1>, <vscale x 8 x i16>)
				declare <vscale x 8 x half> @llvm.masked.load.nxv8f16(<vscale x 8 x half>*, i32, <vscale x 8 x i1>, <vscale x 8 x half>)

				; 16-element contiguous loads.
				declare <vscale x 16 x i8> @llvm.masked.load.nxv16i8(<vscale x 16 x i8>*, i32, <vscale x 16 x i1>, <vscale x 16 x i8>)

				; 2-element contiguous stores.
				declare void @llvm.masked.store.nxv2i8 (<vscale x 2 x i8> , <vscale x 2 x i8>* , i32, <vscale x 2 x i1>)
				declare void @llvm.masked.store.nxv2i16(<vscale x 2 x i16>, <vscale x 2 x i16>*, i32, <vscale x 2 x i1>)
				declare void @llvm.masked.store.nxv2i32(<vscale x 2 x i32>, <vscale x 2 x i32>*, i32, <vscale x 2 x i1>)
				declare void @llvm.masked.store.nxv2i64(<vscale x 2 x i64>, <vscale x 2 x i64>*, i32, <vscale x 2 x i1>)
				declare void @llvm.masked.store.nxv2f16(<vscale x 2 x half>, <vscale x 2 x half>*, i32, <vscale x 2 x i1>)
				declare void @llvm.masked.store.nxv2f32(<vscale x 2 x float>, <vscale x 2 x float>*, i32, <vscale x 2 x i1>)
				declare void @llvm.masked.store.nxv2f64(<vscale x 2 x double>, <vscale x 2 x double>*, i32, <vscale x 2 x i1>)

				; 4-element contiguous stores.
				declare void @llvm.masked.store.nxv4i8 (<vscale x 4 x i8> , <vscale x 4 x i8>* , i32, <vscale x 4 x i1>)
				declare void @llvm.masked.store.nxv4i16(<vscale x 4 x i16>, <vscale x 4 x i16>*, i32, <vscale x 4 x i1>)
				declare void @llvm.masked.store.nxv4i32(<vscale x 4 x i32>, <vscale x 4 x i32>*, i32, <vscale x 4 x i1>)
				declare void @llvm.masked.store.nxv4f16(<vscale x 4 x half>, <vscale x 4 x half>*, i32, <vscale x 4 x i1>)
				declare void @llvm.masked.store.nxv4f32(<vscale x 4 x float>, <vscale x 4 x float>*, i32, <vscale x 4 x i1>)

				; 8-element contiguous stores.
				declare void @llvm.masked.store.nxv8i8 (<vscale x 8 x i8> , <vscale x 8 x i8>* , i32, <vscale x 8 x i1>)
				declare void @llvm.masked.store.nxv8i16(<vscale x 8 x i16>, <vscale x 8 x i16>*, i32, <vscale x 8 x i1>)
				declare void @llvm.masked.store.nxv8f16(<vscale x 8 x half>, <vscale x 8 x half>*, i32, <vscale x 8 x i1>)

				; 16-element contiguous stores.
				declare void @llvm.masked.store.nxv16i8(<vscale x 16 x i8>, <vscale x 16 x i8>*, i32, <vscale x 16 x i1>)
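
As a reading aid for the CHECK lines above: in the register + register form `[x0, x1, lsl #N]`, the index register is scaled by the size in bytes of the element as it is stored in memory. That is why the `ld1b`/`st1b` checks carry no shift, while the `ld1h`, `ld1w` and `ld1d` checks use `lsl #1`, `lsl #2` and `lsl #3`. The Python sketch below only models that arithmetic. The helper names are hypothetical and are not part of this patch or of LLVM.

```python
# Hypothetical helpers, not part of the patch: they model the address used
# by the SVE register + register form `[xN, xM, lsl #shift]`.

# Bytes per element as stored in memory, keyed by the mnemonic size suffix
# (the sign-extending forms ld1sb/ld1sh/ld1sw use the b/h/w sizes).
MEM_ELT_BYTES = {"b": 1, "h": 2, "w": 4, "d": 8}


def reg_reg_shift(size_suffix: str) -> int:
    """Return the `lsl` amount, i.e. log2 of the memory element size."""
    return MEM_ELT_BYTES[size_suffix].bit_length() - 1


def reg_reg_address(base: int, index: int, size_suffix: str) -> int:
    """Return base + (index << shift), the address of the first element."""
    return base + (index << reg_reg_shift(size_suffix))
```

For example, `reg_reg_address(0x1000, 3, "w")` models `[x0, x1, lsl #2]` with `x0 = 0x1000` and `x1 = 3`, which matches a `getelementptr i32, i32* %base, i64 3` in the IR.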

llvm/test/CodeGen/AArch64/sve-pred-non-temporal-ldst-addressing-mode-reg-imm.ll

This file was added.

				; RUN: llc -mtriple=aarch64--linux-gnu -mattr=+sve < %s | FileCheck %s

				; Range checks: for all the instructions tested in this file, the
				; immediate must be within the range [-8, 7] (4-bit immediate).
				; Out-of-range values are tested in one case only (the function that
				; follows). Valid values are tested throughout the rest of the file.

				define void @imm_out_of_range(<vscale x 2 x i64> * %base, <vscale x 2 x i1> %mask) {
				; CHECK-LABEL: imm_out_of_range:
				; CHECK: ldnt1d { z[[DATA:[0-9]+]].d }, p0/z, [x{{[0-9]+}}]
				; CHECK: stnt1d { z[[DATA]].d }, p0, [x{{[0-9]+}}]
				; CHECK: ret
				%base_load = getelementptr <vscale x 2 x i64>, <vscale x 2 x i64>* %base, i64 8
				%data = call <vscale x 2 x i64> @llvm.aarch64.sve.ldnt1.nxv2i64(<vscale x 2 x i1> %mask, <vscale x 2 x i64>* %base_load)
				%base_store = getelementptr <vscale x 2 x i64>, <vscale x 2 x i64> * %base, i64 -9
				call void @llvm.aarch64.sve.stnt1.nxv2i64(<vscale x 2 x i64> %data, <vscale x 2 x i1> %mask, <vscale x 2 x i64>* %base_store)
				ret void
				}

				; 2-lane non-temporal load/stores.

				define void @test_masked_ldst_sv2i64(<vscale x 2 x i64> * %base, <vscale x 2 x i1> %mask) {
				; CHECK-LABEL: test_masked_ldst_sv2i64:
				; CHECK: ldnt1d { z[[DATA:[0-9]+]].d }, p0/z, [x0, #-8, mul vl]
				; CHECK: stnt1d { z[[DATA]].d }, p0, [x0, #-7, mul vl]
				; CHECK: ret
				%base_load = getelementptr <vscale x 2 x i64>, <vscale x 2 x i64>* %base, i64 -8
				%data = call <vscale x 2 x i64> @llvm.aarch64.sve.ldnt1.nxv2i64(<vscale x 2 x i1> %mask, <vscale x 2 x i64>* %base_load)
				%base_store = getelementptr <vscale x 2 x i64>, <vscale x 2 x i64> * %base, i64 -7
				call void @llvm.aarch64.sve.stnt1.nxv2i64(<vscale x 2 x i64> %data, <vscale x 2 x i1> %mask, <vscale x 2 x i64>* %base_store)
				ret void
				}

				define void @test_masked_ldst_sv2f64(<vscale x 2 x double> * %base, <vscale x 2 x i1> %mask) {
				; CHECK-LABEL: test_masked_ldst_sv2f64:
				; CHECK: ldnt1d { z[[DATA:[0-9]+]].d }, p0/z, [x0, #-6, mul vl]
				; CHECK: stnt1d { z[[DATA]].d }, p0, [x0, #-5, mul vl]
				; CHECK: ret
				%base_load = getelementptr <vscale x 2 x double>, <vscale x 2 x double>* %base, i64 -6
				%data = call <vscale x 2 x double> @llvm.aarch64.sve.ldnt1.nxv2f64(<vscale x 2 x i1> %mask,<vscale x 2 x double>* %base_load)
				%base_store = getelementptr <vscale x 2 x double>, <vscale x 2 x double> * %base, i64 -5
				call void @llvm.aarch64.sve.stnt1.nxv2f64(<vscale x 2 x double> %data, <vscale x 2 x i1> %mask, <vscale x 2 x double>* %base_store)
				ret void
				}

				; 4-lane non-temporal load/stores.

				define void @test_masked_ldst_sv4i32(<vscale x 4 x i32> * %base, <vscale x 4 x i1> %mask) {
				; CHECK-LABEL: test_masked_ldst_sv4i32:
				; CHECK: ldnt1w { z[[DATA:[0-9]+]].s }, p0/z, [x0, #6, mul vl]
				; CHECK: stnt1w { z[[DATA]].s }, p0, [x0, #7, mul vl]
				; CHECK: ret
				%base_load = getelementptr <vscale x 4 x i32>, <vscale x 4 x i32>* %base, i64 6
				%data = call <vscale x 4 x i32> @llvm.aarch64.sve.ldnt1.nxv4i32(<vscale x 4 x i1> %mask, <vscale x 4 x i32>* %base_load)
				%base_store = getelementptr <vscale x 4 x i32>, <vscale x 4 x i32> * %base, i64 7
				call void @llvm.aarch64.sve.stnt1.nxv4i32(<vscale x 4 x i32> %data, <vscale x 4 x i1> %mask, <vscale x 4 x i32>* %base_store)
				ret void
				}

				define void @test_masked_ldst_sv4f32(<vscale x 4 x float> * %base, <vscale x 4 x i1> %mask) {
				; CHECK-LABEL: test_masked_ldst_sv4f32:
				; CHECK: ldnt1w { z[[DATA:[0-9]+]].s }, p0/z, [x0, #-1, mul vl]
				; CHECK: stnt1w { z[[DATA]].s }, p0, [x0, #2, mul vl]
				; CHECK: ret
				%base_load = getelementptr <vscale x 4 x float>, <vscale x 4 x float>* %base, i64 -1
				%data = call <vscale x 4 x float> @llvm.aarch64.sve.ldnt1.nxv4f32(<vscale x 4 x i1> %mask, <vscale x 4 x float>* %base_load)
				%base_store = getelementptr <vscale x 4 x float>, <vscale x 4 x float> * %base, i64 2
				call void @llvm.aarch64.sve.stnt1.nxv4f32(<vscale x 4 x float> %data, <vscale x 4 x i1> %mask, <vscale x 4 x float>* %base_store)
				ret void
				}


				; 8-lane non-temporal load/stores.

				define void @test_masked_ldst_sv8i16(<vscale x 8 x i16> * %base, <vscale x 8 x i1> %mask) {
				; CHECK-LABEL: test_masked_ldst_sv8i16:
				; CHECK: ldnt1h { z[[DATA:[0-9]+]].h }, p0/z, [x0, #6, mul vl]
				; CHECK: stnt1h { z[[DATA]].h }, p0, [x0, #7, mul vl]
				; CHECK: ret
				%base_load = getelementptr <vscale x 8 x i16>, <vscale x 8 x i16>* %base, i64 6
				%data = call <vscale x 8 x i16> @llvm.aarch64.sve.ldnt1.nxv8i16(<vscale x 8 x i1> %mask, <vscale x 8 x i16>* %base_load)
				%base_store = getelementptr <vscale x 8 x i16>, <vscale x 8 x i16> * %base, i64 7
				call void @llvm.aarch64.sve.stnt1.nxv8i16(<vscale x 8 x i16> %data, <vscale x 8 x i1> %mask, <vscale x 8 x i16>* %base_store)
				ret void
				}

				define void @test_masked_ldst_sv8f16(<vscale x 8 x half> * %base, <vscale x 8 x i1> %mask) {
				; CHECK-LABEL: test_masked_ldst_sv8f16:
				; CHECK: ldnt1h { z[[DATA:[0-9]+]].h }, p0/z, [x0, #-1, mul vl]
				; CHECK: stnt1h { z[[DATA]].h }, p0, [x0, #2, mul vl]
				; CHECK: ret
				%base_load = getelementptr <vscale x 8 x half>, <vscale x 8 x half>* %base, i64 -1
				%data = call <vscale x 8 x half> @llvm.aarch64.sve.ldnt1.nxv8f16(<vscale x 8 x i1> %mask, <vscale x 8 x half>* %base_load)
				%base_store = getelementptr <vscale x 8 x half>, <vscale x 8 x half> * %base, i64 2
				call void @llvm.aarch64.sve.stnt1.nxv8f16(<vscale x 8 x half> %data, <vscale x 8 x i1> %mask, <vscale x 8 x half>* %base_store)
				ret void
				}

				; 16-lane non-temporal load/stores.

				define void @test_masked_ldst_sv16i8(<vscale x 16 x i8> * %base, <vscale x 16 x i1> %mask) {
				; CHECK-LABEL: test_masked_ldst_sv16i8:
				; CHECK: ldnt1b { z[[DATA:[0-9]+]].b }, p0/z, [x0, #6, mul vl]
				; CHECK: stnt1b { z[[DATA]].b }, p0, [x0, #7, mul vl]
				; CHECK: ret
				%base_load = getelementptr <vscale x 16 x i8>, <vscale x 16 x i8>* %base, i64 6
				%data = call <vscale x 16 x i8> @llvm.aarch64.sve.ldnt1.nxv16i8(<vscale x 16 x i1> %mask, <vscale x 16 x i8>* %base_load)
				%base_store = getelementptr <vscale x 16 x i8>, <vscale x 16 x i8> * %base, i64 7
				call void @llvm.aarch64.sve.stnt1.nxv16i8(<vscale x 16 x i8> %data, <vscale x 16 x i1> %mask, <vscale x 16 x i8>* %base_store)
				ret void
				}

				; 2-element non-temporal loads.
				declare <vscale x 2 x i64> @llvm.aarch64.sve.ldnt1.nxv2i64(<vscale x 2 x i1>, <vscale x 2 x i64>*)
				declare <vscale x 2 x double> @llvm.aarch64.sve.ldnt1.nxv2f64(<vscale x 2 x i1>, <vscale x 2 x double>*)

				; 4-element non-temporal loads.
				declare <vscale x 4 x i32> @llvm.aarch64.sve.ldnt1.nxv4i32(<vscale x 4 x i1>, <vscale x 4 x i32>*)
				declare <vscale x 4 x float> @llvm.aarch64.sve.ldnt1.nxv4f32(<vscale x 4 x i1>, <vscale x 4 x float>*)

				; 8-element non-temporal loads.
				declare <vscale x 8 x i16> @llvm.aarch64.sve.ldnt1.nxv8i16(<vscale x 8 x i1>, <vscale x 8 x i16>*)
				declare <vscale x 8 x half> @llvm.aarch64.sve.ldnt1.nxv8f16(<vscale x 8 x i1>, <vscale x 8 x half>*)

				; 16-element non-temporal loads.
				declare <vscale x 16 x i8> @llvm.aarch64.sve.ldnt1.nxv16i8(<vscale x 16 x i1>, <vscale x 16 x i8>*)

				; 2-element non-temporal stores.
				declare void @llvm.aarch64.sve.stnt1.nxv2i64(<vscale x 2 x i64>, <vscale x 2 x i1>, <vscale x 2 x i64>*)
				declare void @llvm.aarch64.sve.stnt1.nxv2f64(<vscale x 2 x double>, <vscale x 2 x i1>, <vscale x 2 x double>*)

				; 4-element non-temporal stores.
				declare void @llvm.aarch64.sve.stnt1.nxv4i32(<vscale x 4 x i32>, <vscale x 4 x i1>, <vscale x 4 x i32>*)
				declare void @llvm.aarch64.sve.stnt1.nxv4f32(<vscale x 4 x float>, <vscale x 4 x i1>, <vscale x 4 x float>*)

				; 8-element non-temporal stores.
				declare void @llvm.aarch64.sve.stnt1.nxv8i16(<vscale x 8 x i16>, <vscale x 8 x i1>, <vscale x 8 x i16>*)
				declare void @llvm.aarch64.sve.stnt1.nxv8f16(<vscale x 8 x half>, <vscale x 8 x i1>, <vscale x 8 x half>*)

				; 16-element non-temporal stores.
				declare void @llvm.aarch64.sve.stnt1.nxv16i8(<vscale x 16 x i8>, <vscale x 16 x i1>, <vscale x 16 x i8>*)
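
The range comment at the top of this file can be read as follows: the `#imm, mul vl` operand is a signed 4-bit immediate, so it must lie in [-8, 7], and it is multiplied by the vector length in bytes to form the address. The Python sketch below models only that check and the resulting offset. The helper names and the `vl_bytes` parameter are hypothetical and are not part of this patch or of LLVM.

```python
# Hypothetical helpers, not part of the patch: they model the SVE
# register + immediate form `[xN, #imm, mul vl]`.

IMM_MIN, IMM_MAX = -8, 7  # range of a signed 4-bit immediate


def imm_in_range(imm: int) -> bool:
    """Return True if imm can be encoded in the `#imm, mul vl` operand."""
    return IMM_MIN <= imm <= IMM_MAX


def reg_imm_address(base: int, imm: int, vl_bytes: int) -> int:
    """Return base + imm * vl_bytes, rejecting immediates that do not fit."""
    if not imm_in_range(imm):
        raise ValueError(f"immediate {imm} is outside [{IMM_MIN}, {IMM_MAX}]")
    return base + imm * vl_bytes
```

The offsets 8 and -9 used in `imm_out_of_range` fail this check, which is consistent with that test expecting a plain `[xN]` base address rather than an immediate operand.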

llvm/test/CodeGen/AArch64/sve-pred-non-temporal-ldst-addressing-mode-reg-reg.ll

This file was added.

				; RUN: llc -mtriple=aarch64--linux-gnu -mattr=+sve < %s | FileCheck %s

				; 2-lane non-temporal load/stores.

				define void @test_masked_ldst_sv2i64(i64* %base, <vscale x 2 x i1> %mask, i64 %offset) {
				; CHECK-LABEL: test_masked_ldst_sv2i64:
				; CHECK: ldnt1d { z[[DATA:[0-9]+]].d }, p0/z, [x0, x1, lsl #3]
				; CHECK: stnt1d { z[[DATA]].d }, p0, [x0, x1, lsl #3]
				; CHECK: ret
				%base_i64 = getelementptr i64, i64* %base, i64 %offset
				%base_addr = bitcast i64* %base_i64 to <vscale x 2 x i64>*
				%data = call <vscale x 2 x i64> @llvm.aarch64.sve.ldnt1.nxv2i64(<vscale x 2 x i1> %mask, <vscale x 2 x i64>* %base_addr)
				call void @llvm.aarch64.sve.stnt1.nxv2i64(<vscale x 2 x i64> %data, <vscale x 2 x i1> %mask, <vscale x 2 x i64>* %base_addr)
				ret void
				}

				define void @test_masked_ldst_sv2f64(double* %base, <vscale x 2 x i1> %mask, i64 %offset) {
				; CHECK-LABEL: test_masked_ldst_sv2f64:
				; CHECK: ldnt1d { z[[DATA:[0-9]+]].d }, p0/z, [x0, x1, lsl #3]
				; CHECK: stnt1d { z[[DATA]].d }, p0, [x0, x1, lsl #3]
				; CHECK: ret
				%base_double = getelementptr double, double* %base, i64 %offset
				%base_addr = bitcast double* %base_double to <vscale x 2 x double>*
				%data = call <vscale x 2 x double> @llvm.aarch64.sve.ldnt1.nxv2f64(<vscale x 2 x i1> %mask,<vscale x 2 x double>* %base_addr)
				call void @llvm.aarch64.sve.stnt1.nxv2f64(<vscale x 2 x double> %data, <vscale x 2 x i1> %mask, <vscale x 2 x double>* %base_addr)
				ret void
				}

				; 4-lane non-temporal load/stores.

				define void @test_masked_ldst_sv4i32(i32* %base, <vscale x 4 x i1> %mask, i64 %offset) {
				; CHECK-LABEL: test_masked_ldst_sv4i32:
				; CHECK: ldnt1w { z[[DATA:[0-9]+]].s }, p0/z, [x0, x1, lsl #2]
				; CHECK: stnt1w { z[[DATA]].s }, p0, [x0, x1, lsl #2]
				; CHECK: ret
				%base_i32 = getelementptr i32, i32* %base, i64 %offset
				%base_addr = bitcast i32* %base_i32 to <vscale x 4 x i32>*
				%data = call <vscale x 4 x i32> @llvm.aarch64.sve.ldnt1.nxv4i32(<vscale x 4 x i1> %mask, <vscale x 4 x i32>* %base_addr)
				call void @llvm.aarch64.sve.stnt1.nxv4i32(<vscale x 4 x i32> %data, <vscale x 4 x i1> %mask, <vscale x 4 x i32>* %base_addr)
				ret void
				}

				define void @test_masked_ldst_sv4f32(float* %base, <vscale x 4 x i1> %mask, i64 %offset) {
				; CHECK-LABEL: test_masked_ldst_sv4f32:
				; CHECK: ldnt1w { z[[DATA:[0-9]+]].s }, p0/z, [x0, x1, lsl #2]
				; CHECK: stnt1w { z[[DATA]].s }, p0, [x0, x1, lsl #2]
				; CHECK: ret
				%base_float = getelementptr float, float* %base, i64 %offset
				%base_addr = bitcast float* %base_float to <vscale x 4 x float>*
				%data = call <vscale x 4 x float> @llvm.aarch64.sve.ldnt1.nxv4f32(<vscale x 4 x i1> %mask, <vscale x 4 x float>* %base_addr)
				call void @llvm.aarch64.sve.stnt1.nxv4f32(<vscale x 4 x float> %data, <vscale x 4 x i1> %mask, <vscale x 4 x float>* %base_addr)
				ret void
				}


				; 8-lane non-temporal load/stores.

				define void @test_masked_ldst_sv8i16(i16* %base, <vscale x 8 x i1> %mask, i64 %offset) {
				; CHECK-LABEL: test_masked_ldst_sv8i16:
				; CHECK: ldnt1h { z[[DATA:[0-9]+]].h }, p0/z, [x0, x1, lsl #1]
				; CHECK: stnt1h { z[[DATA]].h }, p0, [x0, x1, lsl #1]
				; CHECK: ret
				%base_i16 = getelementptr i16, i16* %base, i64 %offset
				%base_addr = bitcast i16* %base_i16 to <vscale x 8 x i16>*
				%data = call <vscale x 8 x i16> @llvm.aarch64.sve.ldnt1.nxv8i16(<vscale x 8 x i1> %mask, <vscale x 8 x i16>* %base_addr)
				call void @llvm.aarch64.sve.stnt1.nxv8i16(<vscale x 8 x i16> %data, <vscale x 8 x i1> %mask, <vscale x 8 x i16>* %base_addr)
				ret void
				}

				define void @test_masked_ldst_sv8f16(half* %base, <vscale x 8 x i1> %mask, i64 %offset) {
				; CHECK-LABEL: test_masked_ldst_sv8f16:
				; CHECK: ldnt1h { z[[DATA:[0-9]+]].h }, p0/z, [x0, x1, lsl #1]
				; CHECK: stnt1h { z[[DATA]].h }, p0, [x0, x1, lsl #1]
				; CHECK: ret
				%base_half = getelementptr half, half* %base, i64 %offset
				%base_addr = bitcast half* %base_half to <vscale x 8 x half>*
				%data = call <vscale x 8 x half> @llvm.aarch64.sve.ldnt1.nxv8f16(<vscale x 8 x i1> %mask, <vscale x 8 x half>* %base_addr)
				call void @llvm.aarch64.sve.stnt1.nxv8f16(<vscale x 8 x half> %data, <vscale x 8 x i1> %mask, <vscale x 8 x half>* %base_addr)
				ret void
				}

				; 16-lane non-temporal load/stores.

				define void @test_masked_ldst_sv16i8(i8* %base, <vscale x 16 x i1> %mask, i64 %offset) {
				; CHECK-LABEL: test_masked_ldst_sv16i8:
				; CHECK: ldnt1b { z[[DATA:[0-9]+]].b }, p0/z, [x0, x1]
				; CHECK: stnt1b { z[[DATA]].b }, p0, [x0, x1]
				; CHECK: ret
				%base_i8 = getelementptr i8, i8* %base, i64 %offset
				%base_addr = bitcast i8* %base_i8 to <vscale x 16 x i8>*
				%data = call <vscale x 16 x i8> @llvm.aarch64.sve.ldnt1.nxv16i8(<vscale x 16 x i1> %mask, <vscale x 16 x i8>* %base_addr)
				call void @llvm.aarch64.sve.stnt1.nxv16i8(<vscale x 16 x i8> %data, <vscale x 16 x i1> %mask, <vscale x 16 x i8>* %base_addr)
				ret void
				}

				; 2-element non-temporal loads.
				declare <vscale x 2 x i64> @llvm.aarch64.sve.ldnt1.nxv2i64(<vscale x 2 x i1>, <vscale x 2 x i64>*)
				declare <vscale x 2 x double> @llvm.aarch64.sve.ldnt1.nxv2f64(<vscale x 2 x i1>, <vscale x 2 x double>*)

				; 4-element non-temporal loads.
				declare <vscale x 4 x i32> @llvm.aarch64.sve.ldnt1.nxv4i32(<vscale x 4 x i1>, <vscale x 4 x i32>*)
				declare <vscale x 4 x float> @llvm.aarch64.sve.ldnt1.nxv4f32(<vscale x 4 x i1>, <vscale x 4 x float>*)

				; 8-element non-temporal loads.
				declare <vscale x 8 x i16> @llvm.aarch64.sve.ldnt1.nxv8i16(<vscale x 8 x i1>, <vscale x 8 x i16>*)
				declare <vscale x 8 x half> @llvm.aarch64.sve.ldnt1.nxv8f16(<vscale x 8 x i1>, <vscale x 8 x half>*)

				; 16-element non-temporal loads.
				declare <vscale x 16 x i8> @llvm.aarch64.sve.ldnt1.nxv16i8(<vscale x 16 x i1>, <vscale x 16 x i8>*)

				; 2-element non-temporal stores.
				declare void @llvm.aarch64.sve.stnt1.nxv2i64(<vscale x 2 x i64>, <vscale x 2 x i1>, <vscale x 2 x i64>*)
				declare void @llvm.aarch64.sve.stnt1.nxv2f64(<vscale x 2 x double>, <vscale x 2 x i1>, <vscale x 2 x double>*)

				; 4-element non-temporal stores.
				declare void @llvm.aarch64.sve.stnt1.nxv4i32(<vscale x 4 x i32>, <vscale x 4 x i1>, <vscale x 4 x i32>*)
				declare void @llvm.aarch64.sve.stnt1.nxv4f32(<vscale x 4 x float>, <vscale x 4 x i1>, <vscale x 4 x float>*)

				; 8-element non-temporal stores.
				declare void @llvm.aarch64.sve.stnt1.nxv8i16(<vscale x 8 x i16>, <vscale x 8 x i1>, <vscale x 8 x i16>*)
				declare void @llvm.aarch64.sve.stnt1.nxv8f16(<vscale x 8 x half>, <vscale x 8 x i1>, <vscale x 8 x half>*)

				; 16-element non-temporal stores.
				declare void @llvm.aarch64.sve.stnt1.nxv16i8(<vscale x 16 x i8>, <vscale x 16 x i1>, <vscale x 16 x i8>*)
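
The `lsl #N` amounts in the reg+reg checks above follow from the element type of the `getelementptr`: the index register is scaled by the element size in bytes, so the shift is log2 of that size (no shift for `i8`). A minimal Python sketch of that arithmetic follows; the table and function names are illustrative and not part of the patch.

```python
# Illustrative model of the scaled reg+reg address checked in the tests above.
# ELEMENT_BYTES, shift_for and effective_address are hypothetical names.
ELEMENT_BYTES = {
    "i8": 1,
    "i16": 2, "half": 2,
    "i32": 4, "float": 4,
    "i64": 8, "double": 8,
}

def shift_for(elem_type: str) -> int:
    """Return the `lsl` amount: log2 of the element size in bytes."""
    size = ELEMENT_BYTES[elem_type]
    return size.bit_length() - 1  # exact for powers of two

def effective_address(base: int, index: int, elem_type: str) -> int:
    """Byte address of element `index`: base + (index << shift)."""
    return base + (index << shift_for(elem_type))

if __name__ == "__main__":
    for ty in ("double", "float", "half", "i8"):
        print(ty, "-> lsl #%d" % shift_for(ty))
```

Under this model `double` maps to `lsl #3`, `i32`/`float` to `lsl #2`, `i16`/`half` to `lsl #1`, and `i8` to an unscaled `[x0, x1]`, which matches the CHECK lines.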

llvm/test/CodeGen/AArch64/sve-vscale-combine.ll

This file was added.

				; RUN: llc -mtriple=aarch64--linux-gnu -mattr=+sve --asm-verbose=false < %s | FileCheck %s

				declare i32 @llvm.vscale.i32()
				declare i64 @llvm.vscale.i64()

				; Fold (add (vscale * C0), (vscale C1)) to (vscale C0 + C1))
				andwar: `vscale * C1`? and `vscale * (C0 + C1)`?
				andwar: What about `vscale * C1`?
				define i64 @combine_add_vscale_i64() nounwind {
				; CHECK-LABEL: combine_add_vscale_i64:
				; CHECK-NEXT: cntd x0
				andwar: Perhaps `; CHECK-NOT: add`?
				; CHECK-NEXT: ret
				%vscale = call i64 @llvm.vscale.i64()
				%add = add i64 %vscale, %vscale
				ret i64 %add
				}
				andwar: Could `C0` and `C1` be different than 1?
				fpetrogalli (Author): Yes, but to get C0 and C1 different from 1 I'd have to use a `mul` instruction, which is already tested in the `combine_mul_vscale_` tests. I think it is enough to test with C0=C1=1.

				define i32 @combine_add_vscale_i32() nounwind {
				; CHECK-LABEL: combine_add_vscale_i32:
				; CHECK-NEXT: cntd x0
				; CHECK-NEXT: ret
				%vscale = call i32 @llvm.vscale.i32()
				%add = add i32 %vscale, %vscale
				ret i32 %add
				}

				; Fold (mul (vscale * C0), C1) to (vscale C0 * C1))
				andwar: `vscale * (C0 * C1)`? What's `C0` and what is `C1` in this example?
				define i64 @combine_mul_vscale_i64() nounwind {
				; CHECK-LABEL: combine_mul_vscale_i64:
				; CHECK-NEXT: rdvl x8, #1
				; CHECK-NEXT: lsr x8, x8, #4
				andwar: Did you mean `C0 = 2`? And multiplication by `C0` seems to be missing - there's only multiplication by `32`. Shouldn't the IR look like this:
					%vscale = call i64 @llvm.vscale.i64()
					%mul_by_16 = mul i64 %vscale, 16
					%mul_by_2 = mul i64 %mul_by_16, 32
					ret i64 %mul_by_2
				fpetrogalli (Author): When targeting SVE, `@llvm.vscale.*()` returns the number of 16-byte chunks of an SVE register. If I multiply it by 32, it returns the number of 4-bit (half-byte) elements in the SVE register. `RDVL` returns the number of 8-bit lanes in the register, hence the number of 4-bit lanes is given by `RDVL Xn, #2`. I have updated the comment setting C1 = 32.
				; CHECK-NEXT: mov w9, #3
				; CHECK-NEXT: mul x0, x8, x9
				; CHECK-NEXT: ret
				%vscale = call i64 @llvm.vscale.i64()
				andwar: [nit] Align with the following line (missing space)
				%mul = mul i64 %vscale, 3
				ret i64 %mul
				}

				define i32 @combine_mul_vscale_i32() nounwind {
				andwar: Is `nounwind` needed in these examples?
				fpetrogalli (Author): Yes, to be able to use CHECK-NEXT after CHECK-LABEL.
				; CHECK-LABEL: combine_mul_vscale_i32:
				; CHECK-NEXT: rdvl x8, #1
				; CHECK-NEXT: lsr x8, x8, #4
				; CHECK-NEXT: mov w9, #3
				; CHECK-NEXT: mul x0, x8, x9
				; CHECK-NEXT: ret
				%vscale = call i32 @llvm.vscale.i32()
				%mul = mul i32 %vscale, 3
				ret i32 %mul
				}
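
The reply above relates `@llvm.vscale.*()` to `RDVL`. A small Python model of that relation follows, assuming `RDVL Xd, #imm` yields `imm` times the vector length in bytes and that the vector length is `16 * vscale` bytes; the function names are illustrative only, not LLVM APIs.

```python
# Hypothetical model of the RDVL arithmetic discussed in the review.
def rdvl(imm: int, vscale: int) -> int:
    """Model `rdvl xN, #imm`: imm * (vector length in bytes)."""
    return imm * 16 * vscale

def mul_vscale_as_checked(c: int, vscale: int) -> int:
    """Mirror the checked sequence: rdvl #1; lsr #4; mov #c; mul."""
    v = rdvl(1, vscale) >> 4  # recover vscale from the byte length
    return v * c

if __name__ == "__main__":
    for vscale in (1, 2, 4, 16):
        # vscale * 32 is the number of 4-bit elements, i.e. `rdvl #2`.
        assert vscale * 32 == rdvl(2, vscale)
        # The tests multiply by 3, which is not a multiple of 16.
        assert mul_vscale_as_checked(3, vscale) == 3 * vscale
```

In this model a multiplier that is a multiple of 16 fits a single `RDVL`, while the factor 3 used in the tests needs the `lsr` plus `mul` sequence shown in the CHECK lines.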

				; Canonicalize (sub X, (vscale * C)) to (add X, (vscale * -C))
				define i64 @combine_sub_vscale_i64(i64 %in) nounwind {
				; CHECK-LABEL: combine_sub_vscale_i64:
				; CHECK-NEXT: rdvl x8, #-1
				; CHECK-NEXT: asr x8, x8, #4
				; CHECK-NEXT: add x0, x0, x8
				andwar: Perhaps `; CHECK-NOT: sub`?
				; CHECK-NEXT: ret
				%vscale = call i64 @llvm.vscale.i64()
				%sub = sub i64 %in, %vscale
				ret i64 %sub
				}

				define i32 @combine_sub_vscale_i32(i32 %in) nounwind {
				; CHECK-LABEL: combine_sub_vscale_i32:
				; CHECK-NEXT: rdvl x8, #-1
				; CHECK-NEXT: asr x8, x8, #4
				; CHECK-NEXT: add w0, w0, w8
				; CHECK-NEXT: ret
				%vscale = call i32 @llvm.vscale.i32()
				%sub = sub i32 %in, %vscale
				ret i32 %sub
				}
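
The `sub` tests above check `rdvl #-1` followed by `asr #4` rather than `lsr #4`. A short Python sketch of why the shift has to be arithmetic, under the same assumption that `RDVL` yields `imm * 16 * vscale`; the helper names and the 64-bit mask are illustrative.

```python
# Hypothetical model of the canonicalized sequence: rdvl #-1; asr #4; add.
MASK64 = (1 << 64) - 1

def rdvl(imm: int, vscale: int) -> int:
    """Model `rdvl xN, #imm` as a signed value."""
    return imm * 16 * vscale

def sub_vscale(x: int, vscale: int) -> int:
    """x - vscale computed as x + (rdvl(-1) asr 4)."""
    neg_vscale = rdvl(-1, vscale) >> 4  # Python's >> on ints is arithmetic
    return x + neg_vscale

def lsr4_of_rdvl_minus1(vscale: int) -> int:
    """What a logical shift would give on the 64-bit register pattern."""
    return (rdvl(-1, vscale) & MASK64) >> 4

if __name__ == "__main__":
    assert sub_vscale(100, 2) == 98
    # A logical shift loses the sign bits, so it does not produce -vscale.
    assert lsr4_of_rdvl_minus1(2) != (-2) & MASK64
```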


				; Fold (shl (vscale * C0), C1) to (vscale C0 << C1))
				define i64 @combine_shl_vscale_i64() nounwind {
				; CHECK-LABEL: combine_shl_vscale_i64:
				; CHECK-NEXT: rdvl x0, #4
				andwar: `; CHECK-NOT: shl`?
				; CHECK-NEXT: ret
				%vscale = call i64 @llvm.vscale.i64()
				%shl = shl i64 %vscale, 6
				andwar: Hm, since it's `6` here, shouldn't line 77 be `; CHECK-NEXT: rdvl x0, #6`?
				fpetrogalli (Author): At IR level, `%shl = 2^6 * VSCALE`. At assembly level, the output of `RDVL` is `2^4 * VSCALE`. Hence, to compute `%shl` we need to multiply the `RDVL` output by 2^2, so `#4` is correct. Does that make sense? I have actually added it as a comment, but I have modified the code so that it produces `rdvl #1`.
				ret i64 %shl
				}

				define i32 @combine_shl_vscale_i32() nounwind {
				; CHECK-LABEL: combine_shl_vscale_i32:
				; CHECK-NEXT: rdvl x0, #4
				; CHECK-NEXT: ret
				%vscale = call i32 @llvm.vscale.i32()
				%shl = shl i32 %vscale, 6
				ret i32 %shl
				}
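
The author's explanation of why `shl ..., 6` yields `rdvl x0, #4` can be checked with a few lines of Python. This is a sketch under the stated assumption that `RDVL` returns `2^4 * VSCALE` times its immediate; `rdvl_imm_for_shl` is a hypothetical helper, not something from the patch.

```python
# Hypothetical helper: which RDVL immediate implements `shl vscale, shift`?
def rdvl(imm: int, vscale: int) -> int:
    """Model `rdvl xN, #imm` as imm * 2^4 * vscale."""
    return imm * 16 * vscale

def rdvl_imm_for_shl(shift: int) -> int:
    """vscale << shift == (2^4 * vscale) * 2^(shift - 4), for shift >= 4."""
    if shift < 4:
        raise ValueError("a shift below 4 is not a whole multiple of RDVL #1")
    return 1 << (shift - 4)

if __name__ == "__main__":
    imm = rdvl_imm_for_shl(6)
    for vscale in (1, 2, 8):
        assert (vscale << 6) == rdvl(imm, vscale)
    print("shl by 6 -> rdvl #%d" % imm)
```

With a shift of 4 the same helper gives an immediate of 1, which corresponds to the `rdvl #1` form the author mentions switching to.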
