This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
llvm/
-
lib/Target/AArch64/
-
Target/
-
AArch64/
2/2
AArch64InstrInfo.cpp
-
test/CodeGen/AArch64/
-
CodeGen/
-
AArch64/
1/3
sve-fixed-ld2-alloca.ll
1/1
sve-ldN.mir
-
sve-stN.mir

Differential D119338

[AArch64][SVE] Add structured load/store opcodes to getMemOpInfo
ClosedPublic

Authored by kmclaughlin on Feb 9 2022, 7:01 AM.

Download Raw Diff

Details

Reviewers

sdesmalen
david-arm
peterwaller-arm
efriedma

Commits

rGfc1b21228e39: [AArch64][SVE] Add structured load/store opcodes to getMemOpInfo

Summary

Currently, loading from or storing to a stack location with a structured load
or store crashes in isAArch64FrameOffsetLegal as the opcodes are not handled by
getMemOpInfo. This patch adds the opcodes for structured load/store instructions
with an immediate index to getMemOpInfo & getLoadStoreImmIdx, setting appropriate
values for the scale, width & min/max offsets.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

kmclaughlin created this revision.Feb 9 2022, 7:01 AM

Herald added subscribers: psnobl, hiraditya, kristof.beyls, tschuett. · View Herald TranscriptFeb 9 2022, 7:01 AM

kmclaughlin requested review of this revision.Feb 9 2022, 7:01 AM

Herald added a project: Restricted Project. · View Herald TranscriptFeb 9 2022, 7:01 AM

Herald added a subscriber: llvm-commits. · View Herald Transcript

Hi @kmclaughlin, this looks like a nice fix! I've got a few comments so far about the load tests. Some of the comments about choosing the right size for the alloca probably apply to the store tests too.

llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
2935	nit: I think you can just write TypeSize::Scalable(32) and similarly for the others.
llvm/test/CodeGen/AArch64/ldN-reg-imm-alloca.ll
11 ↗	(On Diff #407150)	I'm not sure if this is really exercising the code path you've changed because the stack pointer looks wrong here, i.e. we've not actually allocated anything and are just loading random data. I think it's because nothing got stored into the alloca and so it was optimised away. I wonder if it's worth adding some kind of normal store before the load, i.e. something like store [16 x i8] zeroinitializer, [16 x i8]* %alloc, align 16
28 ↗	(On Diff #407150)	I think that all the allocas here have to at least match the size of the data you're storing out. The test above has the same problem too I think. I think for all these tests you can just do: %alloc = alloca [NumVecs * 32 x i8], i32 1, align 16 I choose '32' here because you've set vscale_range to (2,2). For ld2 NumVecs=2, ld3 NumVecs=3, etc.
72 ↗	(On Diff #407150)	Again, here you're indexing into some bigger than the alloca. You need to allocate at least 3 x the space required for <vscale x 32 x i8> here.
170 ↗	(On Diff #407150)	I think this is a valid test, but perhaps worth adding a comment why it's useful? It's because this exercises the path where isAArch64FrameOffsetLegal returns a non-zero stack offset due to extra alloca.

sdesmalen added inline comments.Feb 9 2022, 8:14 AM

llvm/test/CodeGen/AArch64/ldN-reg-imm-alloca.ll
6 ↗	(On Diff #407150)	Tests that rely on `vscale_range` become more readable if they'd have the `vscale_range` specified directly with the definition. If the test file is long (like this one), the #<some number> make it really obscure what vscale_range it refers to. I fixed something similar in 11cea7e5ce4d3f6a0d2fac016d503f99c52cdc96

Harbormaster completed remote builds in B148480: Diff 407150.Feb 9 2022, 8:42 AM

Ensure the correct amount of space is allocated in each test by increasing the size of the allocas.
Added a store to each of the tests to ensure the allocas aren't optimised away.
Moved vscale_range into the definitions of the tests.

This is looking good now @kmclaughlin! I just had a few more minor comments on the tests ...

llvm/test/CodeGen/AArch64/ldN-reg-imm-alloca.ll
79 ↗	(On Diff #407518)	Hi @kmclaughlin, I think this needs to be: %alloc = alloca [64 x i8], i32 3 because of the GEP.
183 ↗	(On Diff #407518)	I think this also needs to be: %alloc = alloca [48 x i16], i32 4
204 ↗	(On Diff #407518)	This needs to be: %alloca1 = alloca <vscale x 4 x float>, i32 12 due to the GEP index (9) + the fact you're actually loading <vscale x 12 x float> worth of data.
313 ↗	(On Diff #407518)	This needs to be %alloca1 = alloca <vscale x 2 x double>, i32 13 due to the GEP offset (9) + the data size loaded.
llvm/test/CodeGen/AArch64/stN-reg-imm-alloca.ll
18 ↗	(On Diff #407518)	Hi @kmclaughlin, I'm afraid I may have sent you down a pointless path here and that's my fault for not explaining this more clearly! For the structured stores you don't actually need the extra store here because the alloca cannot be folded away, since the structured store already prevented that. If it's not too much trouble, I think it's worth removing these extra stores to make the tests simpler. NOTE: The only exception to this is where you have two allocas in the test, in which case having a store for each alloca is right!
88 ↗	(On Diff #407518)	I think this needs to be: %alloc = alloca [8 x i64], i32 4 to account for the GEP offset + store data.
344 ↗	(On Diff #407518)	I think this needs to be: %alloc = alloca [32 x i32], i32 2
372 ↗	(On Diff #407518)	I think this needs to be: %alloca1 = alloca <vscale x 2 x double>, i32 8

Harbormaster completed remote builds in B148737: Diff 407518.Feb 10 2022, 7:52 AM

Removed unnecessary stores from tests in stN-reg-imm-alloca.ll which use only one alloca.
Increased the number of elements for allocas in a number of tests in which a gep was attempting to access data beyond the allocated space.

LGTM! Thanks for making the changes @kmclaughlin.

llvm/test/CodeGen/AArch64/sve-fixed-ld2-alloca.ll
2	nit: I think maybe we don't need this file anymore, since all the various loads/stores are now covered in the other tests?

This revision is now accepted and ready to land.Feb 10 2022, 8:34 AM

sdesmalen added inline comments.Feb 10 2022, 8:46 AM

llvm/test/CodeGen/AArch64/ldN-reg-imm-alloca.ll
14 ↗	(On Diff #407542)	Looking at the code-changes, I actually don't think that the fixed-width property (for the alloca and vscale_range) matters for these tests. I think you only really need the scalable tests.
96 ↗	(On Diff #407542)	Can you make sure that all these tests have a positive test for the maximum range, and a negative test for one-beyond the maximum range ?
200–201 ↗	(On Diff #407542)	nit: I'd recommend adding `nounwind` attribute to avoid the CFI info, as it's not relevant to the test.
317 ↗	(On Diff #407542)	alloca2 is unused? (also true for other cases)
llvm/test/CodeGen/AArch64/sve-fixed-ld2-alloca.ll
2	Perhaps we can keep this one test if the other fixed-width tests are removed, just to ensure we cover that case too (even if it doesn't necessarily exercise anything specific in this patch that isn't already tested otherwise). If so, then I'd recommend expressing it with an explicit st2 intrinsic instead of a shufflevector+store.
3	redundant if vscale_range is set.

david-arm added inline comments.Feb 10 2022, 8:53 AM

llvm/test/CodeGen/AArch64/ldN-reg-imm-alloca.ll
317 ↗	(On Diff #407542)	This is something I asked @kmclaughlin to do because it's the only way to expose some of the code changes in this patch. All the tests ending _valid_imm do this for that reason. If you look at `isAArch64FrameOffsetLegal` we return a StackOffset, which is always zero for all tests in this file except ones like this. Having a non-zero StackOffset helped to ensure we were calculating the remainder/offset correctly using the Scale property set in `getMemOpInfo`. We can remove the test, but I'm worried we're not fully testing the changes that's all. For example, in `ld3b_f32_valid_imm` you'll notice the addvl just before the ld3b, which happens precisely because StackOffset is non-zero.

sdesmalen added inline comments.Feb 10 2022, 9:14 AM

llvm/test/CodeGen/AArch64/ldN-reg-imm-alloca.ll

317 ↗

(On Diff #407542)

I assumed that was what the gep was for. Maybe it's because of how this is written. If you write:

%alloca1 = alloca <vscale x 64 x double>, align 4                                              
%alloca1.bc = bitcast <vscale x 64 x double>* %alloca1 to <vscale x 2 x double>*               
%base = getelementptr <vscale x 2 x double>, <vscale x 2 x double>* %alloca1.bc, i64 28, i64 0 
%ld4 = call <vscale x 8 x double> @llvm.aarch64.sve.ld4.nxv8f64(<vscale x 2 x i1> %pg, double* %base)

Then that results in:

ld4d    { z0.d, z1.d, z2.d, z3.d }, p0/z, [sp, #28, mul vl]

Whereas

%alloca1 = alloca <vscale x 64 x double>, align 4                                              
%alloca1.bc = bitcast <vscale x 64 x double>* %alloca1 to <vscale x 2 x double>*               
%base = getelementptr <vscale x 2 x double>, <vscale x 2 x double>* %alloca1.bc, i64 32, i64 0 
%ld4 = call <vscale x 8 x double> @llvm.aarch64.sve.ld4.nxv8f64(<vscale x 2 x i1> %pg, double* %base)

Results in:

<x8 = calculations for sp + 28 * sizeof(VL)>
ld4d    { z0.d, z1.d, z2.d, z3.d }, p0/z, [x8]

Harbormaster completed remote builds in B148753: Diff 407542.Feb 10 2022, 9:26 AM

david-arm added inline comments.Feb 11 2022, 1:41 AM

llvm/test/CodeGen/AArch64/ldN-reg-imm-alloca.ll
317 ↗	(On Diff #407542)	Sure, I'd be happy with that if it works and @kmclaughlin can see it leads to the non-zero StackOffset - if we can avoid the second alloca then all the better!

Changed fixed-width allocas used in the tests to scalable.
Added tests for offsets which are at the min/max range & tests outside the min/max range.
Added the nounwind attribute to all tests.
Changed the tests for non-zero offsets to remove the second alloca.

Harbormaster completed remote builds in B149053: Diff 407979.Feb 11 2022, 1:28 PM

Hi @kmclaughlin thanks for updating the tests to work only on scalable types. I think the tests are still a bit inconsistent, and I've tried to highlight some of the inconsistencies in one of the files. Perhaps iteratively updating this file will just make things more complicated, so maybe it's easier to start with a clean slate and test:

ld2b valid min offset
ld2b valid max offset
ld2b one below min offset
ld2b one beyond max offset

ld2h valid min offset
ld2h valid max offset
ld2h one below min offset
ld2h one beyond max offset

and doing that also for ld3, ld4, st2, st3 and st4.

llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
2926	nit: remove redundant newlines here and below.
llvm/test/CodeGen/AArch64/ldN-reg-imm-alloca.ll
1 ↗	(On Diff #407979)	These files are all specific to SVE, can you prefix them with `sve-`?
6 ↗	(On Diff #407979)	vscale range can now be removed.
18 ↗	(On Diff #407979)	What values do these stores add to these tests?
23–38 ↗	(On Diff #407979)	personally I don't see a lot of value in tests where it's indexing into the alloca at offset 0.
75 ↗	(On Diff #407979)	`ld2b` for loading f16's? (i see the same in other tests)
325 ↗	(On Diff #407979)	I'm a bit confused why you're mixing tests for floating point and integer element types. We can just use only integer types, because loads/stores don't distinguish the types. It is not relevant for these tests since they're trying to test getMemOpInfo (as opposed to say, ISEL).

After some discussion about the tests in this patch offline, I have removed ldN-reg-imm-alloca.ll & stN-reg-imm-alloca.ll in favour of adding mir tests.
Removed newlines introduced in AArch64InstrInfo.cpp.

Harbormaster completed remote builds in B149389: Diff 408414.Feb 14 2022, 9:06 AM

Moved all structured load tests into sve-ldN.mir & all store tests to sve-stN.mir

Harbormaster completed remote builds in B149404: Diff 408443.Feb 14 2022, 9:56 AM

mnadeem added a subscriber: mnadeem.Feb 14 2022, 10:58 AM

Hi @kmclaughlin, thanks for updating the tests as MIR tests, these look really good now! I've still requested changes on the patch, because the scaling doesn't look right yet. It seems like this offsets the base pointer with the wrong value/offset. It should be pretty trivial to fix that though.

llvm/test/CodeGen/AArch64/sve-ldN.mir
95–96	I don't think this is correct, as I expect this to be a total offset of ptr + 2 + 14. i.e. an ld2w instruction loads 2 vectors worth of data. The immediate is a multiple of 2, so the maximum immediate value to load from is ptr + 14 * sizeof(vector). When we load 2 * sizeof(vector) beyond that, I would expect it to load from ptr + 16, not ptr + 18. It depends on where the scaling is done. In this case, I think the scaling for this operand should be `TypeSize::Scalable(16)`, because the immediate is already expected to be scaled (it is already a multiple of 2/3/4). The encoding of the immediate will then divide the offset by 2, 3 or 4. If we would have implemented the operand as an operand that is not yet scaled, then we'd need to do the scaling here, as well as in the operand printer.

This revision now requires changes to proceed.Feb 15 2022, 1:58 AM

Changed Scale to TypeSize::Scalable(16) for all opcodes added to getMemOpInfo, fixing incorrect scaling when the immediate is out of range

Harbormaster completed remote builds in B149739: Diff 408925.Feb 15 2022, 11:01 AM

Reverted the previous changes which set Scale to TypeSize::Scalable(16) for all opcodes.
Corrected the Min & Max values added to getMemOpInfo, as these should be the indices -8 to 7 for all structured loads & stores.

Harbormaster completed remote builds in B150219: Diff 409623.Feb 17 2022, 6:46 AM

LGTM!

This revision is now accepted and ready to land.Feb 17 2022, 7:27 AM

This revision was landed with ongoing or failed builds.Feb 17 2022, 9:09 AM

Closed by commit rGfc1b21228e39: [AArch64][SVE] Add structured load/store opcodes to getMemOpInfo (authored by kmclaughlin). · Explain Why

This revision was automatically updated to reflect the committed changes.

kmclaughlin added a commit: rGfc1b21228e39: [AArch64][SVE] Add structured load/store opcodes to getMemOpInfo.

Revision Contents

Path

Size

llvm/

lib/

Target/

AArch64/

AArch64InstrInfo.cpp

65 lines

test/

CodeGen/

AArch64/

sve-fixed-ld2-alloca.ll

27 lines

sve-ldN.mir

261 lines

sve-stN.mir

261 lines

Diff 409685

llvm/lib/Target/AArch64/AArch64InstrInfo.cpp

This file is larger than 256 KB, so syntax highlighting is disabled by default.

Show First 20 Lines • Show All 2,264 Lines • ▼ Show 20 Lines	unsigned AArch64InstrInfo::getLoadStoreImmIdx(unsigned Opc) {
case AArch64::LD1H_D_IMM:		case AArch64::LD1H_D_IMM:
case AArch64::LD1SH_S_IMM:		case AArch64::LD1SH_S_IMM:
case AArch64::LD1SH_D_IMM:		case AArch64::LD1SH_D_IMM:
case AArch64::LD1W_IMM:		case AArch64::LD1W_IMM:
case AArch64::LD1W_D_IMM:		case AArch64::LD1W_D_IMM:
case AArch64::LD1SW_D_IMM:		case AArch64::LD1SW_D_IMM:
case AArch64::LD1D_IMM:		case AArch64::LD1D_IMM:

		case AArch64::LD2B_IMM:
		case AArch64::LD2H_IMM:
		case AArch64::LD2W_IMM:
		case AArch64::LD2D_IMM:
		case AArch64::LD3B_IMM:
		case AArch64::LD3H_IMM:
		case AArch64::LD3W_IMM:
		case AArch64::LD3D_IMM:
		case AArch64::LD4B_IMM:
		case AArch64::LD4H_IMM:
		case AArch64::LD4W_IMM:
		case AArch64::LD4D_IMM:

case AArch64::ST1B_IMM:		case AArch64::ST1B_IMM:
case AArch64::ST1B_H_IMM:		case AArch64::ST1B_H_IMM:
case AArch64::ST1B_S_IMM:		case AArch64::ST1B_S_IMM:
case AArch64::ST1B_D_IMM:		case AArch64::ST1B_D_IMM:
case AArch64::ST1H_IMM:		case AArch64::ST1H_IMM:
case AArch64::ST1H_S_IMM:		case AArch64::ST1H_S_IMM:
case AArch64::ST1H_D_IMM:		case AArch64::ST1H_D_IMM:
case AArch64::ST1W_IMM:		case AArch64::ST1W_IMM:
case AArch64::ST1W_D_IMM:		case AArch64::ST1W_D_IMM:
case AArch64::ST1D_IMM:		case AArch64::ST1D_IMM:

		case AArch64::ST2B_IMM:
		case AArch64::ST2H_IMM:
		case AArch64::ST2W_IMM:
		case AArch64::ST2D_IMM:
		case AArch64::ST3B_IMM:
		case AArch64::ST3H_IMM:
		case AArch64::ST3W_IMM:
		case AArch64::ST3D_IMM:
		case AArch64::ST4B_IMM:
		case AArch64::ST4H_IMM:
		case AArch64::ST4W_IMM:
		case AArch64::ST4D_IMM:

case AArch64::LD1RB_IMM:		case AArch64::LD1RB_IMM:
case AArch64::LD1RB_H_IMM:		case AArch64::LD1RB_H_IMM:
case AArch64::LD1RB_S_IMM:		case AArch64::LD1RB_S_IMM:
case AArch64::LD1RB_D_IMM:		case AArch64::LD1RB_D_IMM:
case AArch64::LD1RSB_H_IMM:		case AArch64::LD1RSB_H_IMM:
case AArch64::LD1RSB_S_IMM:		case AArch64::LD1RSB_S_IMM:
case AArch64::LD1RSB_D_IMM:		case AArch64::LD1RSB_D_IMM:
case AArch64::LD1RH_IMM:		case AArch64::LD1RH_IMM:
▲ Show 20 Lines • Show All 600 Lines • ▼ Show 20 Lines	bool AArch64InstrInfo::getMemOpInfo(unsigned Opcode, TypeSize &Scale,
case AArch64::LDNF1D_IMM:		case AArch64::LDNF1D_IMM:
// A full vectors worth of data		// A full vectors worth of data
// Width = mbytes * elements		// Width = mbytes * elements
Scale = TypeSize::Scalable(16);		Scale = TypeSize::Scalable(16);
Width = SVEMaxBytesPerVector;		Width = SVEMaxBytesPerVector;
MinOffset = -8;		MinOffset = -8;
MaxOffset = 7;		MaxOffset = 7;
break;		break;
		case AArch64::LD2B_IMM:
		sdesmalenUnsubmitted Done Reply Inline Actions nit: remove redundant newlines here and below. sdesmalen: nit: remove redundant newlines here and below.
		case AArch64::LD2H_IMM:
		case AArch64::LD2W_IMM:
		case AArch64::LD2D_IMM:
		case AArch64::ST2B_IMM:
		case AArch64::ST2H_IMM:
		case AArch64::ST2W_IMM:
		case AArch64::ST2D_IMM:
		Scale = TypeSize::Scalable(32);
		Width = SVEMaxBytesPerVector * 2;
		david-armUnsubmitted Done Reply Inline Actions nit: I think you can just write TypeSize::Scalable(32) and similarly for the others. david-arm: nit: I think you can just write TypeSize::Scalable(32) and similarly for the others.
		MinOffset = -8;
		MaxOffset = 7;
		break;
		case AArch64::LD3B_IMM:
		case AArch64::LD3H_IMM:
		case AArch64::LD3W_IMM:
		case AArch64::LD3D_IMM:
		case AArch64::ST3B_IMM:
		case AArch64::ST3H_IMM:
		case AArch64::ST3W_IMM:
		case AArch64::ST3D_IMM:
		Scale = TypeSize::Scalable(48);
		Width = SVEMaxBytesPerVector * 3;
		MinOffset = -8;
		MaxOffset = 7;
		break;
		case AArch64::LD4B_IMM:
		case AArch64::LD4H_IMM:
		case AArch64::LD4W_IMM:
		case AArch64::LD4D_IMM:
		case AArch64::ST4B_IMM:
		case AArch64::ST4H_IMM:
		case AArch64::ST4W_IMM:
		case AArch64::ST4D_IMM:
		Scale = TypeSize::Scalable(64);
		Width = SVEMaxBytesPerVector * 4;
		MinOffset = -8;
		MaxOffset = 7;
		break;
case AArch64::LD1B_H_IMM:		case AArch64::LD1B_H_IMM:
case AArch64::LD1SB_H_IMM:		case AArch64::LD1SB_H_IMM:
case AArch64::LD1H_S_IMM:		case AArch64::LD1H_S_IMM:
case AArch64::LD1SH_S_IMM:		case AArch64::LD1SH_S_IMM:
case AArch64::LD1W_D_IMM:		case AArch64::LD1W_D_IMM:
case AArch64::LD1SW_D_IMM:		case AArch64::LD1SW_D_IMM:
case AArch64::ST1B_H_IMM:		case AArch64::ST1B_H_IMM:
case AArch64::ST1H_S_IMM:		case AArch64::ST1H_S_IMM:
▲ Show 20 Lines • Show All 4,886 Lines • Show Last 20 Lines

llvm/test/CodeGen/AArch64/sve-fixed-ld2-alloca.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc < %s \| FileCheck %s
				david-armUnsubmitted Not Done Reply Inline Actions nit: I think maybe we don't need this file anymore, since all the various loads/stores are now covered in the other tests? david-arm: nit: I think maybe we don't need this file anymore, since all the various loads/stores are now…
				sdesmalenUnsubmitted Not Done Reply Inline Actions Perhaps we can keep this one test if the other fixed-width tests are removed, just to ensure we cover that case too (even if it doesn't necessarily exercise anything specific in this patch that isn't already tested otherwise). If so, then I'd recommend expressing it with an explicit st2 intrinsic instead of a shufflevector+store. sdesmalen: Perhaps we can keep this one test if the other fixed-width tests are removed, just to ensure we…

				sdesmalenUnsubmitted Done Reply Inline Actions redundant if vscale_range is set. sdesmalen: redundant if vscale_range is set.
				target triple = "aarch64-unknown-linux-gnu"

				define void @st1d_fixed(<8 x double>* %ptr) #0 {
				; CHECK-LABEL: st1d_fixed:
				; CHECK: // %bb.0:
				; CHECK-NEXT: sub sp, sp, #16
				; CHECK-NEXT: add x8, sp, #8
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: ld2d { z0.d, z1.d }, p0/z, [x8]
				; CHECK-NEXT: mov x8, #4
				; CHECK-NEXT: mov z0.d, #0 // =0x0
				; CHECK-NEXT: st1d { z0.d }, p0, [x0]
				; CHECK-NEXT: st1d { z0.d }, p0, [x0, x8, lsl #3]
				; CHECK-NEXT: add sp, sp, #16
				; CHECK-NEXT: ret
				%alloc = alloca [16 x double], i32 0
				%bc = bitcast [16 x double]* %alloc to <8 x double>*
				%load = load <8 x double>, <8 x double>* %bc
				%strided.vec = shufflevector <8 x double> %load, <8 x double> poison, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
				store <8 x double> zeroinitializer, <8 x double>* %ptr
				ret void
				}

				attributes #0 = { "target-features"="+sve" vscale_range(2,2) nounwind }

llvm/test/CodeGen/AArch64/sve-ldN.mir

This file was added.

				# RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -run-pass=prologepilog -simplify-mir -verify-machineinstrs %s -o - \| FileCheck %s
				# RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -start-before=prologepilog -simplify-mir -verify-machineinstrs %s -o - \| FileCheck %s --check-prefix=CHECK-OFFSET

				--- \|
				define void @testcase_valid_offset() nounwind { entry: unreachable }
				define void @testcase_offset_out_of_range() nounwind { entry: unreachable }
				...
				---
				name: testcase_valid_offset
				tracksRegLiveness: true
				stack:
				- { id: 0, name: '', type: default, offset: 0, size: 512, alignment: 16, stack-id: scalable-vector }
				body: \|
				bb.0:
				liveins: $p0

				; CHECK-LABEL: name: testcase_valid_offset
				; CHECK: liveins: $p0
				; CHECK-NEXT: {{ $}}
				; CHECK-NEXT: early-clobber $sp = frame-setup STRXpre killed $fp, $sp, -16 :: (store (s64) into %stack.1)
				; CHECK-NEXT: $sp = frame-setup ADDVL_XXI $sp, -32
				; CHECK-NEXT: renamable $z0_z1 = LD2B_IMM renamable $p0, $sp, -8
				; CHECK-NEXT: renamable $z0_z1 = LD2B_IMM renamable $p0, $sp, 7
				; CHECK-NEXT: renamable $z0_z1 = LD2H_IMM renamable $p0, $sp, -8
				; CHECK-NEXT: renamable $z0_z1 = LD2H_IMM renamable $p0, $sp, 7
				; CHECK-NEXT: renamable $z0_z1 = LD2W_IMM renamable $p0, $sp, -8
				; CHECK-NEXT: renamable $z0_z1 = LD2W_IMM renamable $p0, $sp, 7
				; CHECK-NEXT: renamable $z0_z1 = LD2D_IMM renamable $p0, $sp, -8
				; CHECK-NEXT: renamable $z0_z1 = LD2D_IMM renamable $p0, $sp, 7
				; CHECK-NEXT: renamable $z0_z1_z2 = LD3B_IMM renamable $p0, $sp, -8
				; CHECK-NEXT: renamable $z0_z1_z2 = LD3B_IMM renamable $p0, $sp, 7
				; CHECK-NEXT: renamable $z0_z1_z2 = LD3H_IMM renamable $p0, $sp, -8
				; CHECK-NEXT: renamable $z0_z1_z2 = LD3H_IMM renamable $p0, $sp, 7
				; CHECK-NEXT: renamable $z0_z1_z2 = LD3W_IMM renamable $p0, $sp, -8
				; CHECK-NEXT: renamable $z0_z1_z2 = LD3W_IMM renamable $p0, $sp, 7
				; CHECK-NEXT: renamable $z0_z1_z2 = LD3D_IMM renamable $p0, $sp, -8
				; CHECK-NEXT: renamable $z0_z1_z2 = LD3D_IMM renamable $p0, $sp, 7
				; CHECK-NEXT: renamable $z0_z1_z2_z3 = LD4B_IMM renamable $p0, $sp, -8
				; CHECK-NEXT: renamable $z0_z1_z2_z3 = LD4B_IMM renamable $p0, $sp, 7
				; CHECK-NEXT: renamable $z0_z1_z2_z3 = LD4H_IMM renamable $p0, $sp, -8
				; CHECK-NEXT: renamable $z0_z1_z2_z3 = LD4H_IMM renamable $p0, $sp, 7
				; CHECK-NEXT: renamable $z0_z1_z2_z3 = LD4W_IMM renamable $p0, $sp, -8
				; CHECK-NEXT: renamable $z0_z1_z2_z3 = LD4W_IMM renamable $p0, $sp, 7
				; CHECK-NEXT: renamable $z0_z1_z2_z3 = LD4D_IMM renamable $p0, $sp, -8
				; CHECK-NEXT: renamable $z0_z1_z2_z3 = LD4D_IMM renamable $p0, $sp, 7
				; CHECK-NEXT: $sp = frame-destroy ADDVL_XXI $sp, 31
				; CHECK-NEXT: $sp = frame-destroy ADDVL_XXI $sp, 1
				; CHECK-NEXT: early-clobber $sp, $fp = frame-destroy LDRXpost $sp, 16 :: (load (s64) from %stack.1)
				; CHECK-NEXT: RET_ReallyLR implicit $z0, implicit $z1, implicit $z2, implicit $z3

				; CHECK-OFFSET-LABEL: testcase_valid_offset:
				; CHECK-OFFSET: str x29, [sp, #-16]!
				; CHECK-OFFSET-NEXT: addvl sp, sp, #-32
				; CHECK-OFFSET-NEXT: ld2b { z0.b, z1.b }, p0/z, [sp, #-16, mul vl]
				; CHECK-OFFSET-NEXT: ld2b { z0.b, z1.b }, p0/z, [sp, #14, mul vl]
				; CHECK-OFFSET-NEXT: ld2h { z0.h, z1.h }, p0/z, [sp, #-16, mul vl]
				; CHECK-OFFSET-NEXT: ld2h { z0.h, z1.h }, p0/z, [sp, #14, mul vl]
				; CHECK-OFFSET-NEXT: ld2w { z0.s, z1.s }, p0/z, [sp, #-16, mul vl]
				; CHECK-OFFSET-NEXT: ld2w { z0.s, z1.s }, p0/z, [sp, #14, mul vl]
				; CHECK-OFFSET-NEXT: ld2d { z0.d, z1.d }, p0/z, [sp, #-16, mul vl]
				; CHECK-OFFSET-NEXT: ld2d { z0.d, z1.d }, p0/z, [sp, #14, mul vl]
				; CHECK-OFFSET-NEXT: ld3b { z0.b, z1.b, z2.b }, p0/z, [sp, #-24, mul vl]
				; CHECK-OFFSET-NEXT: ld3b { z0.b, z1.b, z2.b }, p0/z, [sp, #21, mul vl]
				; CHECK-OFFSET-NEXT: ld3h { z0.h, z1.h, z2.h }, p0/z, [sp, #-24, mul vl]
				; CHECK-OFFSET-NEXT: ld3h { z0.h, z1.h, z2.h }, p0/z, [sp, #21, mul vl]
				; CHECK-OFFSET-NEXT: ld3w { z0.s, z1.s, z2.s }, p0/z, [sp, #-24, mul vl]
				; CHECK-OFFSET-NEXT: ld3w { z0.s, z1.s, z2.s }, p0/z, [sp, #21, mul vl]
				; CHECK-OFFSET-NEXT: ld3d { z0.d, z1.d, z2.d }, p0/z, [sp, #-24, mul vl]
				; CHECK-OFFSET-NEXT: ld3d { z0.d, z1.d, z2.d }, p0/z, [sp, #21, mul vl]
				; CHECK-OFFSET-NEXT: ld4b { z0.b, z1.b, z2.b, z3.b }, p0/z, [sp, #-32, mul vl]
				; CHECK-OFFSET-NEXT: ld4b { z0.b, z1.b, z2.b, z3.b }, p0/z, [sp, #28, mul vl]
				; CHECK-OFFSET-NEXT: ld4h { z0.h, z1.h, z2.h, z3.h }, p0/z, [sp, #-32, mul vl]
				; CHECK-OFFSET-NEXT: ld4h { z0.h, z1.h, z2.h, z3.h }, p0/z, [sp, #28, mul vl]
				; CHECK-OFFSET-NEXT: ld4w { z0.s, z1.s, z2.s, z3.s }, p0/z, [sp, #-32, mul vl]
				; CHECK-OFFSET-NEXT: ld4w { z0.s, z1.s, z2.s, z3.s }, p0/z, [sp, #28, mul vl]
				; CHECK-OFFSET-NEXT: ld4d { z0.d, z1.d, z2.d, z3.d }, p0/z, [sp, #-32, mul vl]
				; CHECK-OFFSET-NEXT: ld4d { z0.d, z1.d, z2.d, z3.d }, p0/z, [sp, #28, mul vl]
				; CHECK-OFFSET-NEXT: addvl sp, sp, #31
				; CHECK-OFFSET-NEXT: addvl sp, sp, #1
				; CHECK-OFFSET-NEXT: ldr x29, [sp], #16
				; CHECK-OFFSET-NEXT: ret

				renamable $z0_z1 = LD2B_IMM renamable $p0, %stack.0, -8
				renamable $z0_z1 = LD2B_IMM renamable $p0, %stack.0, 7
				renamable $z0_z1 = LD2H_IMM renamable $p0, %stack.0, -8
				renamable $z0_z1 = LD2H_IMM renamable $p0, %stack.0, 7
				renamable $z0_z1 = LD2W_IMM renamable $p0, %stack.0, -8
				renamable $z0_z1 = LD2W_IMM renamable $p0, %stack.0, 7
				renamable $z0_z1 = LD2D_IMM renamable $p0, %stack.0, -8
				renamable $z0_z1 = LD2D_IMM renamable $p0, %stack.0, 7

				renamable $z0_z1_z2 = LD3B_IMM renamable $p0, %stack.0, -8
				renamable $z0_z1_z2 = LD3B_IMM renamable $p0, %stack.0, 7
				renamable $z0_z1_z2 = LD3H_IMM renamable $p0, %stack.0, -8
				renamable $z0_z1_z2 = LD3H_IMM renamable $p0, %stack.0, 7
				renamable $z0_z1_z2 = LD3W_IMM renamable $p0, %stack.0, -8
				sdesmalenUnsubmitted Done Reply Inline Actions I don't think this is correct, as I expect this to be a total offset of ptr + 2 + 14. i.e. an ld2w instruction loads 2 vectors worth of data. The immediate is a multiple of 2, so the maximum immediate value to load from is ptr + 14 * sizeof(vector). When we load 2 * sizeof(vector) beyond that, I would expect it to load from ptr + 16, not ptr + 18. It depends on where the scaling is done. In this case, I think the scaling for this operand should be `TypeSize::Scalable(16)`, because the immediate is already expected to be scaled (it is already a multiple of 2/3/4). The encoding of the immediate will then divide the offset by 2, 3 or 4. If we would have implemented the operand as an operand that is not yet scaled, then we'd need to do the scaling here, as well as in the operand printer. sdesmalen: I don't think this is correct, as I expect this to be a total offset of ptr + 2 + 14. i.e. an…
				renamable $z0_z1_z2 = LD3W_IMM renamable $p0, %stack.0, 7
				renamable $z0_z1_z2 = LD3D_IMM renamable $p0, %stack.0, -8
				renamable $z0_z1_z2 = LD3D_IMM renamable $p0, %stack.0, 7

				renamable $z0_z1_z2_z3 = LD4B_IMM renamable $p0, %stack.0, -8
				renamable $z0_z1_z2_z3 = LD4B_IMM renamable $p0, %stack.0, 7
				renamable $z0_z1_z2_z3 = LD4H_IMM renamable $p0, %stack.0, -8
				renamable $z0_z1_z2_z3 = LD4H_IMM renamable $p0, %stack.0, 7
				renamable $z0_z1_z2_z3 = LD4W_IMM renamable $p0, %stack.0, -8
				renamable $z0_z1_z2_z3 = LD4W_IMM renamable $p0, %stack.0, 7
				renamable $z0_z1_z2_z3 = LD4D_IMM renamable $p0, %stack.0, -8
				renamable $z0_z1_z2_z3 = LD4D_IMM renamable $p0, %stack.0, 7
				RET_ReallyLR implicit $z0, implicit $z1, implicit $z2, implicit $z3
				...
				---
				name: testcase_offset_out_of_range
				tracksRegLiveness: true
				stack:
				- { id: 0, name: '', type: default, offset: 0, size: 512, alignment: 16, stack-id: scalable-vector }
				body: \|
				bb.0:
				liveins: $p0

				; CHECK-LABEL: name: testcase_offset_out_of_range
				; CHECK: liveins: $p0
				; CHECK-NEXT: {{ $}}
				; CHECK-NEXT: early-clobber $sp = frame-setup STRXpre killed $fp, $sp, -16 :: (store (s64) into %stack.1)
				; CHECK-NEXT: $sp = frame-setup ADDVL_XXI $sp, -32
				; CHECK-NEXT: $x8 = ADDVL_XXI $sp, -2
				; CHECK-NEXT: renamable $z0_z1 = LD2B_IMM renamable $p0, killed $x8, -8
				; CHECK-NEXT: $x8 = ADDVL_XXI $sp, 2
				; CHECK-NEXT: renamable $z0_z1 = LD2B_IMM renamable $p0, killed $x8, 7
				; CHECK-NEXT: $x8 = ADDVL_XXI $sp, -2
				; CHECK-NEXT: renamable $z0_z1 = LD2H_IMM renamable $p0, killed $x8, -8
				; CHECK-NEXT: $x8 = ADDVL_XXI $sp, 2
				; CHECK-NEXT: renamable $z0_z1 = LD2H_IMM renamable $p0, killed $x8, 7
				; CHECK-NEXT: $x8 = ADDVL_XXI $sp, -2
				; CHECK-NEXT: renamable $z0_z1 = LD2W_IMM renamable $p0, killed $x8, -8
				; CHECK-NEXT: $x8 = ADDVL_XXI $sp, 2
				; CHECK-NEXT: renamable $z0_z1 = LD2W_IMM renamable $p0, killed $x8, 7
				; CHECK-NEXT: $x8 = ADDVL_XXI $sp, -2
				; CHECK-NEXT: renamable $z0_z1 = LD2D_IMM renamable $p0, killed $x8, -8
				; CHECK-NEXT: $x8 = ADDVL_XXI $sp, 2
				; CHECK-NEXT: renamable $z0_z1 = LD2D_IMM renamable $p0, killed $x8, 7
				; CHECK-NEXT: $x8 = ADDVL_XXI $sp, -3
				; CHECK-NEXT: renamable $z0_z1_z2 = LD3B_IMM renamable $p0, killed $x8, -8
				; CHECK-NEXT: $x8 = ADDVL_XXI $sp, 3
				; CHECK-NEXT: renamable $z0_z1_z2 = LD3B_IMM renamable $p0, killed $x8, 7
				; CHECK-NEXT: $x8 = ADDVL_XXI $sp, -3
				; CHECK-NEXT: renamable $z0_z1_z2 = LD3H_IMM renamable $p0, killed $x8, -8
				; CHECK-NEXT: $x8 = ADDVL_XXI $sp, 3
				; CHECK-NEXT: renamable $z0_z1_z2 = LD3H_IMM renamable $p0, killed $x8, 7
				; CHECK-NEXT: $x8 = ADDVL_XXI $sp, -3
				; CHECK-NEXT: renamable $z0_z1_z2 = LD3W_IMM renamable $p0, killed $x8, -8
				; CHECK-NEXT: $x8 = ADDVL_XXI $sp, 3
				; CHECK-NEXT: renamable $z0_z1_z2 = LD3W_IMM renamable $p0, killed $x8, 7
				; CHECK-NEXT: $x8 = ADDVL_XXI $sp, -3
				; CHECK-NEXT: renamable $z0_z1_z2 = LD3D_IMM renamable $p0, killed $x8, -8
				; CHECK-NEXT: $x8 = ADDVL_XXI $sp, 3
				; CHECK-NEXT: renamable $z0_z1_z2 = LD3D_IMM renamable $p0, killed $x8, 7
				; CHECK-NEXT: $x8 = ADDVL_XXI $sp, -4
				; CHECK-NEXT: renamable $z0_z1_z2_z3 = LD4B_IMM renamable $p0, killed $x8, -8
				; CHECK-NEXT: $x8 = ADDVL_XXI $sp, 4
				; CHECK-NEXT: renamable $z0_z1_z2_z3 = LD4B_IMM renamable $p0, killed $x8, 7
				; CHECK-NEXT: $x8 = ADDVL_XXI $sp, -4
				; CHECK-NEXT: renamable $z0_z1_z2_z3 = LD4H_IMM renamable $p0, killed $x8, -8
				; CHECK-NEXT: $x8 = ADDVL_XXI $sp, 4
				; CHECK-NEXT: renamable $z0_z1_z2_z3 = LD4H_IMM renamable $p0, killed $x8, 7
				; CHECK-NEXT: $x8 = ADDVL_XXI $sp, -4
				; CHECK-NEXT: renamable $z0_z1_z2_z3 = LD4W_IMM renamable $p0, killed $x8, -8
				; CHECK-NEXT: $x8 = ADDVL_XXI $sp, 4
				; CHECK-NEXT: renamable $z0_z1_z2_z3 = LD4W_IMM renamable $p0, killed $x8, 7
				; CHECK-NEXT: $x8 = ADDVL_XXI $sp, -4
				; CHECK-NEXT: renamable $z0_z1_z2_z3 = LD4D_IMM renamable $p0, killed $x8, -8
				; CHECK-NEXT: $x8 = ADDVL_XXI $sp, 4
				; CHECK-NEXT: renamable $z0_z1_z2_z3 = LD4D_IMM renamable $p0, killed $x8, 7
				; CHECK-NEXT: $sp = frame-destroy ADDVL_XXI $sp, 31
				; CHECK-NEXT: $sp = frame-destroy ADDVL_XXI $sp, 1
				; CHECK-NEXT: early-clobber $sp, $fp = frame-destroy LDRXpost $sp, 16 :: (load (s64) from %stack.1)
				; CHECK-NEXT: RET_ReallyLR implicit $z0, implicit $z1, implicit $z2, implicit $z3

				; CHECK-OFFSET-LABEL: testcase_offset_out_of_range:
				; CHECK-OFFSET: str x29, [sp, #-16]!
				; CHECK-OFFSET-NEXT: addvl sp, sp, #-32
				; CHECK-OFFSET-NEXT: addvl x8, sp, #-2
				; CHECK-OFFSET-NEXT: ld2b { z0.b, z1.b }, p0/z, [x8, #-16, mul vl]
				; CHECK-OFFSET-NEXT: addvl x8, sp, #2
				; CHECK-OFFSET-NEXT: ld2b { z0.b, z1.b }, p0/z, [x8, #14, mul vl]
				; CHECK-OFFSET-NEXT: addvl x8, sp, #-2
				; CHECK-OFFSET-NEXT: ld2h { z0.h, z1.h }, p0/z, [x8, #-16, mul vl]
				; CHECK-OFFSET-NEXT: addvl x8, sp, #2
				; CHECK-OFFSET-NEXT: ld2h { z0.h, z1.h }, p0/z, [x8, #14, mul vl]
				; CHECK-OFFSET-NEXT: addvl x8, sp, #-2
				; CHECK-OFFSET-NEXT: ld2w { z0.s, z1.s }, p0/z, [x8, #-16, mul vl]
				; CHECK-OFFSET-NEXT: addvl x8, sp, #2
				; CHECK-OFFSET-NEXT: ld2w { z0.s, z1.s }, p0/z, [x8, #14, mul vl]
				; CHECK-OFFSET-NEXT: addvl x8, sp, #-2
				; CHECK-OFFSET-NEXT: ld2d { z0.d, z1.d }, p0/z, [x8, #-16, mul vl]
				; CHECK-OFFSET-NEXT: addvl x8, sp, #2
				; CHECK-OFFSET-NEXT: ld2d { z0.d, z1.d }, p0/z, [x8, #14, mul vl]
				; CHECK-OFFSET-NEXT: addvl x8, sp, #-3
				; CHECK-OFFSET-NEXT: ld3b { z0.b, z1.b, z2.b }, p0/z, [x8, #-24, mul vl]
				; CHECK-OFFSET-NEXT: addvl x8, sp, #3
				; CHECK-OFFSET-NEXT: ld3b { z0.b, z1.b, z2.b }, p0/z, [x8, #21, mul vl]
				; CHECK-OFFSET-NEXT: addvl x8, sp, #-3
				; CHECK-OFFSET-NEXT: ld3h { z0.h, z1.h, z2.h }, p0/z, [x8, #-24, mul vl]
				; CHECK-OFFSET-NEXT: addvl x8, sp, #3
				; CHECK-OFFSET-NEXT: ld3h { z0.h, z1.h, z2.h }, p0/z, [x8, #21, mul vl]
				; CHECK-OFFSET-NEXT: addvl x8, sp, #-3
				; CHECK-OFFSET-NEXT: ld3w { z0.s, z1.s, z2.s }, p0/z, [x8, #-24, mul vl]
				; CHECK-OFFSET-NEXT: addvl x8, sp, #3
				; CHECK-OFFSET-NEXT: ld3w { z0.s, z1.s, z2.s }, p0/z, [x8, #21, mul vl]
				; CHECK-OFFSET-NEXT: addvl x8, sp, #-3
				; CHECK-OFFSET-NEXT: ld3d { z0.d, z1.d, z2.d }, p0/z, [x8, #-24, mul vl]
				; CHECK-OFFSET-NEXT: addvl x8, sp, #3
				; CHECK-OFFSET-NEXT: ld3d { z0.d, z1.d, z2.d }, p0/z, [x8, #21, mul vl]
				; CHECK-OFFSET-NEXT: addvl x8, sp, #-4
				; CHECK-OFFSET-NEXT: ld4b { z0.b, z1.b, z2.b, z3.b }, p0/z, [x8, #-32, mul vl]
				; CHECK-OFFSET-NEXT: addvl x8, sp, #4
				; CHECK-OFFSET-NEXT: ld4b { z0.b, z1.b, z2.b, z3.b }, p0/z, [x8, #28, mul vl]
				; CHECK-OFFSET-NEXT: addvl x8, sp, #-4
				; CHECK-OFFSET-NEXT: ld4h { z0.h, z1.h, z2.h, z3.h }, p0/z, [x8, #-32, mul vl]
				; CHECK-OFFSET-NEXT: addvl x8, sp, #4
				; CHECK-OFFSET-NEXT: ld4h { z0.h, z1.h, z2.h, z3.h }, p0/z, [x8, #28, mul vl]
				; CHECK-OFFSET-NEXT: addvl x8, sp, #-4
				; CHECK-OFFSET-NEXT: ld4w { z0.s, z1.s, z2.s, z3.s }, p0/z, [x8, #-32, mul vl]
				; CHECK-OFFSET-NEXT: addvl x8, sp, #4
				; CHECK-OFFSET-NEXT: ld4w { z0.s, z1.s, z2.s, z3.s }, p0/z, [x8, #28, mul vl]
				; CHECK-OFFSET-NEXT: addvl x8, sp, #-4
				; CHECK-OFFSET-NEXT: ld4d { z0.d, z1.d, z2.d, z3.d }, p0/z, [x8, #-32, mul vl]
				; CHECK-OFFSET-NEXT: addvl x8, sp, #4
				; CHECK-OFFSET-NEXT: ld4d { z0.d, z1.d, z2.d, z3.d }, p0/z, [x8, #28, mul vl]
				; CHECK-OFFSET-NEXT: addvl sp, sp, #31
				; CHECK-OFFSET-NEXT: addvl sp, sp, #1
				; CHECK-OFFSET-NEXT: ldr x29, [sp], #16
				; CHECK-OFFSET-NEXT: ret

				renamable $z0_z1 = LD2B_IMM renamable $p0, %stack.0, -9
				renamable $z0_z1 = LD2B_IMM renamable $p0, %stack.0, 8
				renamable $z0_z1 = LD2H_IMM renamable $p0, %stack.0, -9
				renamable $z0_z1 = LD2H_IMM renamable $p0, %stack.0, 8
				renamable $z0_z1 = LD2W_IMM renamable $p0, %stack.0, -9
				renamable $z0_z1 = LD2W_IMM renamable $p0, %stack.0, 8
				renamable $z0_z1 = LD2D_IMM renamable $p0, %stack.0, -9
				renamable $z0_z1 = LD2D_IMM renamable $p0, %stack.0, 8

				renamable $z0_z1_z2 = LD3B_IMM renamable $p0, %stack.0, -9
				renamable $z0_z1_z2 = LD3B_IMM renamable $p0, %stack.0, 8
				renamable $z0_z1_z2 = LD3H_IMM renamable $p0, %stack.0, -9
				renamable $z0_z1_z2 = LD3H_IMM renamable $p0, %stack.0, 8
				renamable $z0_z1_z2 = LD3W_IMM renamable $p0, %stack.0, -9
				renamable $z0_z1_z2 = LD3W_IMM renamable $p0, %stack.0, 8
				renamable $z0_z1_z2 = LD3D_IMM renamable $p0, %stack.0, -9
				renamable $z0_z1_z2 = LD3D_IMM renamable $p0, %stack.0, 8

				renamable $z0_z1_z2_z3 = LD4B_IMM renamable $p0, %stack.0, -9
				renamable $z0_z1_z2_z3 = LD4B_IMM renamable $p0, %stack.0, 8
				renamable $z0_z1_z2_z3 = LD4H_IMM renamable $p0, %stack.0, -9
				renamable $z0_z1_z2_z3 = LD4H_IMM renamable $p0, %stack.0, 8
				renamable $z0_z1_z2_z3 = LD4W_IMM renamable $p0, %stack.0, -9
				renamable $z0_z1_z2_z3 = LD4W_IMM renamable $p0, %stack.0, 8
				renamable $z0_z1_z2_z3 = LD4D_IMM renamable $p0, %stack.0, -9
				renamable $z0_z1_z2_z3 = LD4D_IMM renamable $p0, %stack.0, 8
				RET_ReallyLR implicit $z0, implicit $z1, implicit $z2, implicit $z3
				...

llvm/test/CodeGen/AArch64/sve-stN.mir

This file was added.

				# RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -run-pass=prologepilog -simplify-mir -verify-machineinstrs %s -o - \| FileCheck %s
				# RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -start-before=prologepilog -simplify-mir -verify-machineinstrs %s -o - \| FileCheck %s --check-prefix=CHECK-OFFSET

				--- \|
				define void @testcase_valid_offset() nounwind { entry: unreachable }
				define void @testcase_offset_out_of_range() nounwind { entry: unreachable }
				...
				---
				name: testcase_valid_offset
				tracksRegLiveness: true
				stack:
				- { id: 0, name: '', type: default, offset: 0, size: 512, alignment: 16, stack-id: scalable-vector }
				body: \|
				bb.0:
				liveins: $p0, $z0

				; CHECK-LABEL: name: testcase_valid_offset
				; CHECK: liveins: $p0, $z0
				; CHECK-NEXT: {{ $}}
				; CHECK-NEXT: early-clobber $sp = frame-setup STRXpre killed $fp, $sp, -16 :: (store (s64) into %stack.1)
				; CHECK-NEXT: $sp = frame-setup ADDVL_XXI $sp, -32
				; CHECK-NEXT: ST2B_IMM renamable $z0_z1, renamable $p0, $sp, -8
				; CHECK-NEXT: ST2B_IMM renamable $z0_z1, renamable $p0, $sp, 7
				; CHECK-NEXT: ST2H_IMM renamable $z0_z1, renamable $p0, $sp, -8
				; CHECK-NEXT: ST2H_IMM renamable $z0_z1, renamable $p0, $sp, 7
				; CHECK-NEXT: ST2W_IMM renamable $z0_z1, renamable $p0, $sp, -8
				; CHECK-NEXT: ST2W_IMM renamable $z0_z1, renamable $p0, $sp, 7
				; CHECK-NEXT: ST2D_IMM renamable $z0_z1, renamable $p0, $sp, -8
				; CHECK-NEXT: ST2D_IMM renamable $z0_z1, renamable $p0, $sp, 7
				; CHECK-NEXT: ST3B_IMM renamable $z0_z1_z2, renamable $p0, $sp, -8
				; CHECK-NEXT: ST3B_IMM renamable $z0_z1_z2, renamable $p0, $sp, 7
				; CHECK-NEXT: ST3H_IMM renamable $z0_z1_z2, renamable $p0, $sp, -8
				; CHECK-NEXT: ST3H_IMM renamable $z0_z1_z2, renamable $p0, $sp, 7
				; CHECK-NEXT: ST3W_IMM renamable $z0_z1_z2, renamable $p0, $sp, -8
				; CHECK-NEXT: ST3W_IMM renamable $z0_z1_z2, renamable $p0, $sp, 7
				; CHECK-NEXT: ST3D_IMM renamable $z0_z1_z2, renamable $p0, $sp, -8
				; CHECK-NEXT: ST3D_IMM renamable $z0_z1_z2, renamable $p0, $sp, 7
				; CHECK-NEXT: ST4B_IMM renamable $z0_z1_z2_z3, renamable $p0, $sp, -8
				; CHECK-NEXT: ST4B_IMM renamable $z0_z1_z2_z3, renamable $p0, $sp, 7
				; CHECK-NEXT: ST4H_IMM renamable $z0_z1_z2_z3, renamable $p0, $sp, -8
				; CHECK-NEXT: ST4H_IMM renamable $z0_z1_z2_z3, renamable $p0, $sp, 7
				; CHECK-NEXT: ST4W_IMM renamable $z0_z1_z2_z3, renamable $p0, $sp, -8
				; CHECK-NEXT: ST4W_IMM renamable $z0_z1_z2_z3, renamable $p0, $sp, 7
				; CHECK-NEXT: ST4D_IMM renamable $z0_z1_z2_z3, renamable $p0, $sp, -8
				; CHECK-NEXT: ST4D_IMM renamable $z0_z1_z2_z3, renamable $p0, $sp, 7
				; CHECK-NEXT: $sp = frame-destroy ADDVL_XXI $sp, 31
				; CHECK-NEXT: $sp = frame-destroy ADDVL_XXI $sp, 1
				; CHECK-NEXT: early-clobber $sp, $fp = frame-destroy LDRXpost $sp, 16 :: (load (s64) from %stack.1)
				; CHECK-NEXT: RET_ReallyLR

				; CHECK-OFFSET-LABEL: testcase_valid_offset:
				; CHECK-OFFSET: str x29, [sp, #-16]!
				; CHECK-OFFSET-NEXT: addvl sp, sp, #-32
				; CHECK-OFFSET-NEXT: st2b { z0.b, z1.b }, p0, [sp, #-16, mul vl]
				; CHECK-OFFSET-NEXT: st2b { z0.b, z1.b }, p0, [sp, #14, mul vl]
				; CHECK-OFFSET-NEXT: st2h { z0.h, z1.h }, p0, [sp, #-16, mul vl]
				; CHECK-OFFSET-NEXT: st2h { z0.h, z1.h }, p0, [sp, #14, mul vl]
				; CHECK-OFFSET-NEXT: st2w { z0.s, z1.s }, p0, [sp, #-16, mul vl]
				; CHECK-OFFSET-NEXT: st2w { z0.s, z1.s }, p0, [sp, #14, mul vl]
				; CHECK-OFFSET-NEXT: st2d { z0.d, z1.d }, p0, [sp, #-16, mul vl]
				; CHECK-OFFSET-NEXT: st2d { z0.d, z1.d }, p0, [sp, #14, mul vl]
				; CHECK-OFFSET-NEXT: st3b { z0.b, z1.b, z2.b }, p0, [sp, #-24, mul vl]
				; CHECK-OFFSET-NEXT: st3b { z0.b, z1.b, z2.b }, p0, [sp, #21, mul vl]
				; CHECK-OFFSET-NEXT: st3h { z0.h, z1.h, z2.h }, p0, [sp, #-24, mul vl]
				; CHECK-OFFSET-NEXT: st3h { z0.h, z1.h, z2.h }, p0, [sp, #21, mul vl]
				; CHECK-OFFSET-NEXT: st3w { z0.s, z1.s, z2.s }, p0, [sp, #-24, mul vl]
				; CHECK-OFFSET-NEXT: st3w { z0.s, z1.s, z2.s }, p0, [sp, #21, mul vl]
				; CHECK-OFFSET-NEXT: st3d { z0.d, z1.d, z2.d }, p0, [sp, #-24, mul vl]
				; CHECK-OFFSET-NEXT: st3d { z0.d, z1.d, z2.d }, p0, [sp, #21, mul vl]
				; CHECK-OFFSET-NEXT: st4b { z0.b, z1.b, z2.b, z3.b }, p0, [sp, #-32, mul vl]
				; CHECK-OFFSET-NEXT: st4b { z0.b, z1.b, z2.b, z3.b }, p0, [sp, #28, mul vl]
				; CHECK-OFFSET-NEXT: st4h { z0.h, z1.h, z2.h, z3.h }, p0, [sp, #-32, mul vl]
				; CHECK-OFFSET-NEXT: st4h { z0.h, z1.h, z2.h, z3.h }, p0, [sp, #28, mul vl]
				; CHECK-OFFSET-NEXT: st4w { z0.s, z1.s, z2.s, z3.s }, p0, [sp, #-32, mul vl]
				; CHECK-OFFSET-NEXT: st4w { z0.s, z1.s, z2.s, z3.s }, p0, [sp, #28, mul vl]
				; CHECK-OFFSET-NEXT: st4d { z0.d, z1.d, z2.d, z3.d }, p0, [sp, #-32, mul vl]
				; CHECK-OFFSET-NEXT: st4d { z0.d, z1.d, z2.d, z3.d }, p0, [sp, #28, mul vl]
				; CHECK-OFFSET-NEXT: addvl sp, sp, #31
				; CHECK-OFFSET-NEXT: addvl sp, sp, #1
				; CHECK-OFFSET-NEXT: ldr x29, [sp], #16
				; CHECK-OFFSET-NEXT: ret

				ST2B_IMM renamable $z0_z1, renamable $p0, %stack.0, -8
				ST2B_IMM renamable $z0_z1, renamable $p0, %stack.0, 7
				ST2H_IMM renamable $z0_z1, renamable $p0, %stack.0, -8
				ST2H_IMM renamable $z0_z1, renamable $p0, %stack.0, 7
				ST2W_IMM renamable $z0_z1, renamable $p0, %stack.0, -8
				ST2W_IMM renamable $z0_z1, renamable $p0, %stack.0, 7
				ST2D_IMM renamable $z0_z1, renamable $p0, %stack.0, -8
				ST2D_IMM renamable $z0_z1, renamable $p0, %stack.0, 7

				ST3B_IMM renamable $z0_z1_z2, renamable $p0, %stack.0, -8
				ST3B_IMM renamable $z0_z1_z2, renamable $p0, %stack.0, 7
				ST3H_IMM renamable $z0_z1_z2, renamable $p0, %stack.0, -8
				ST3H_IMM renamable $z0_z1_z2, renamable $p0, %stack.0, 7
				ST3W_IMM renamable $z0_z1_z2, renamable $p0, %stack.0, -8
				ST3W_IMM renamable $z0_z1_z2, renamable $p0, %stack.0, 7
				ST3D_IMM renamable $z0_z1_z2, renamable $p0, %stack.0, -8
				ST3D_IMM renamable $z0_z1_z2, renamable $p0, %stack.0, 7

				ST4B_IMM renamable $z0_z1_z2_z3, renamable $p0, %stack.0, -8
				ST4B_IMM renamable $z0_z1_z2_z3, renamable $p0, %stack.0, 7
				ST4H_IMM renamable $z0_z1_z2_z3, renamable $p0, %stack.0, -8
				ST4H_IMM renamable $z0_z1_z2_z3, renamable $p0, %stack.0, 7
				ST4W_IMM renamable $z0_z1_z2_z3, renamable $p0, %stack.0, -8
				ST4W_IMM renamable $z0_z1_z2_z3, renamable $p0, %stack.0, 7
				ST4D_IMM renamable $z0_z1_z2_z3, renamable $p0, %stack.0, -8
				ST4D_IMM renamable $z0_z1_z2_z3, renamable $p0, %stack.0, 7
				RET_ReallyLR
				...
				---
				name: testcase_offset_out_of_range
				tracksRegLiveness: true
				stack:
				- { id: 0, name: '', type: default, offset: 0, size: 512, alignment: 16, stack-id: scalable-vector }
				body: \|
				bb.0:
				liveins: $p0, $z0

				; CHECK-LABEL: name: testcase_offset_out_of_range
				; CHECK: liveins: $p0, $z0
				; CHECK-NEXT: {{ $}}
				; CHECK-NEXT: early-clobber $sp = frame-setup STRXpre killed $fp, $sp, -16 :: (store (s64) into %stack.1)
				; CHECK-NEXT: $sp = frame-setup ADDVL_XXI $sp, -32
				; CHECK-NEXT: $x8 = ADDVL_XXI $sp, -2
				; CHECK-NEXT: ST2B_IMM renamable $z0_z1, renamable $p0, killed $x8, -8
				; CHECK-NEXT: $x8 = ADDVL_XXI $sp, 2
				; CHECK-NEXT: ST2B_IMM renamable $z0_z1, renamable $p0, killed $x8, 7
				; CHECK-NEXT: $x8 = ADDVL_XXI $sp, -2
				; CHECK-NEXT: ST2H_IMM renamable $z0_z1, renamable $p0, killed $x8, -8
				; CHECK-NEXT: $x8 = ADDVL_XXI $sp, 2
				; CHECK-NEXT: ST2H_IMM renamable $z0_z1, renamable $p0, killed $x8, 7
				; CHECK-NEXT: $x8 = ADDVL_XXI $sp, -2
				; CHECK-NEXT: ST2W_IMM renamable $z0_z1, renamable $p0, killed $x8, -8
				; CHECK-NEXT: $x8 = ADDVL_XXI $sp, 2
				; CHECK-NEXT: ST2W_IMM renamable $z0_z1, renamable $p0, killed $x8, 7
				; CHECK-NEXT: $x8 = ADDVL_XXI $sp, -2
				; CHECK-NEXT: ST2D_IMM renamable $z0_z1, renamable $p0, killed $x8, -8
				; CHECK-NEXT: $x8 = ADDVL_XXI $sp, 2
				; CHECK-NEXT: ST2D_IMM renamable $z0_z1, renamable $p0, killed $x8, 7
				; CHECK-NEXT: $x8 = ADDVL_XXI $sp, -3
				; CHECK-NEXT: ST3B_IMM renamable $z0_z1_z2, renamable $p0, killed $x8, -8
				; CHECK-NEXT: $x8 = ADDVL_XXI $sp, 3
				; CHECK-NEXT: ST3B_IMM renamable $z0_z1_z2, renamable $p0, killed $x8, 7
				; CHECK-NEXT: $x8 = ADDVL_XXI $sp, -3
				; CHECK-NEXT: ST3H_IMM renamable $z0_z1_z2, renamable $p0, killed $x8, -8
				; CHECK-NEXT: $x8 = ADDVL_XXI $sp, 3
				; CHECK-NEXT: ST3H_IMM renamable $z0_z1_z2, renamable $p0, killed $x8, 7
				; CHECK-NEXT: $x8 = ADDVL_XXI $sp, -3
				; CHECK-NEXT: ST3W_IMM renamable $z0_z1_z2, renamable $p0, killed $x8, -8
				; CHECK-NEXT: $x8 = ADDVL_XXI $sp, 3
				; CHECK-NEXT: ST3W_IMM renamable $z0_z1_z2, renamable $p0, killed $x8, 7
				; CHECK-NEXT: $x8 = ADDVL_XXI $sp, -3
				; CHECK-NEXT: ST3D_IMM renamable $z0_z1_z2, renamable $p0, killed $x8, -8
				; CHECK-NEXT: $x8 = ADDVL_XXI $sp, 3
				; CHECK-NEXT: ST3D_IMM renamable $z0_z1_z2, renamable $p0, killed $x8, 7
				; CHECK-NEXT: $x8 = ADDVL_XXI $sp, -4
				; CHECK-NEXT: ST4B_IMM renamable $z0_z1_z2_z3, renamable $p0, killed $x8, -8
				; CHECK-NEXT: $x8 = ADDVL_XXI $sp, 4
				; CHECK-NEXT: ST4B_IMM renamable $z0_z1_z2_z3, renamable $p0, killed $x8, 7
				; CHECK-NEXT: $x8 = ADDVL_XXI $sp, -4
				; CHECK-NEXT: ST4H_IMM renamable $z0_z1_z2_z3, renamable $p0, killed $x8, -8
				; CHECK-NEXT: $x8 = ADDVL_XXI $sp, 4
				; CHECK-NEXT: ST4H_IMM renamable $z0_z1_z2_z3, renamable $p0, killed $x8, 7
				; CHECK-NEXT: $x8 = ADDVL_XXI $sp, -4
				; CHECK-NEXT: ST4W_IMM renamable $z0_z1_z2_z3, renamable $p0, killed $x8, -8
				; CHECK-NEXT: $x8 = ADDVL_XXI $sp, 4
				; CHECK-NEXT: ST4W_IMM renamable $z0_z1_z2_z3, renamable $p0, killed $x8, 7
				; CHECK-NEXT: $x8 = ADDVL_XXI $sp, -4
				; CHECK-NEXT: ST4D_IMM renamable $z0_z1_z2_z3, renamable $p0, killed $x8, -8
				; CHECK-NEXT: $x8 = ADDVL_XXI $sp, 4
				; CHECK-NEXT: ST4D_IMM renamable $z0_z1_z2_z3, renamable $p0, killed $x8, 7
				; CHECK-NEXT: $sp = frame-destroy ADDVL_XXI $sp, 31
				; CHECK-NEXT: $sp = frame-destroy ADDVL_XXI $sp, 1
				; CHECK-NEXT: early-clobber $sp, $fp = frame-destroy LDRXpost $sp, 16 :: (load (s64) from %stack.1)
				; CHECK-NEXT: RET_ReallyLR

				; CHECK-OFFSET-LABEL: testcase_offset_out_of_range
				; CHECK-OFFSET: str x29, [sp, #-16]!
				; CHECK-OFFSET-NEXT: addvl sp, sp, #-32
				; CHECK-OFFSET-NEXT: addvl x8, sp, #-2
				; CHECK-OFFSET-NEXT: st2b { z0.b, z1.b }, p0, [x8, #-16, mul vl]
				; CHECK-OFFSET-NEXT: addvl x8, sp, #2
				; CHECK-OFFSET-NEXT: st2b { z0.b, z1.b }, p0, [x8, #14, mul vl]
				; CHECK-OFFSET-NEXT: addvl x8, sp, #-2
				; CHECK-OFFSET-NEXT: st2h { z0.h, z1.h }, p0, [x8, #-16, mul vl]
				; CHECK-OFFSET-NEXT: addvl x8, sp, #2
				; CHECK-OFFSET-NEXT: st2h { z0.h, z1.h }, p0, [x8, #14, mul vl]
				; CHECK-OFFSET-NEXT: addvl x8, sp, #-2
				; CHECK-OFFSET-NEXT: st2w { z0.s, z1.s }, p0, [x8, #-16, mul vl]
				; CHECK-OFFSET-NEXT: addvl x8, sp, #2
				; CHECK-OFFSET-NEXT: st2w { z0.s, z1.s }, p0, [x8, #14, mul vl]
				; CHECK-OFFSET-NEXT: addvl x8, sp, #-2
				; CHECK-OFFSET-NEXT: st2d { z0.d, z1.d }, p0, [x8, #-16, mul vl]
				; CHECK-OFFSET-NEXT: addvl x8, sp, #2
				; CHECK-OFFSET-NEXT: st2d { z0.d, z1.d }, p0, [x8, #14, mul vl]
				; CHECK-OFFSET-NEXT: addvl x8, sp, #-3
				; CHECK-OFFSET-NEXT: st3b { z0.b, z1.b, z2.b }, p0, [x8, #-24, mul vl]
				; CHECK-OFFSET-NEXT: addvl x8, sp, #3
				; CHECK-OFFSET-NEXT: st3b { z0.b, z1.b, z2.b }, p0, [x8, #21, mul vl]
				; CHECK-OFFSET-NEXT: addvl x8, sp, #-3
				; CHECK-OFFSET-NEXT: st3h { z0.h, z1.h, z2.h }, p0, [x8, #-24, mul vl]
				; CHECK-OFFSET-NEXT: addvl x8, sp, #3
				; CHECK-OFFSET-NEXT: st3h { z0.h, z1.h, z2.h }, p0, [x8, #21, mul vl]
				; CHECK-OFFSET-NEXT: addvl x8, sp, #-3
				; CHECK-OFFSET-NEXT: st3w { z0.s, z1.s, z2.s }, p0, [x8, #-24, mul vl]
				; CHECK-OFFSET-NEXT: addvl x8, sp, #3
				; CHECK-OFFSET-NEXT: st3w { z0.s, z1.s, z2.s }, p0, [x8, #21, mul vl]
				; CHECK-OFFSET-NEXT: addvl x8, sp, #-3
				; CHECK-OFFSET-NEXT: st3d { z0.d, z1.d, z2.d }, p0, [x8, #-24, mul vl]
				; CHECK-OFFSET-NEXT: addvl x8, sp, #3
				; CHECK-OFFSET-NEXT: st3d { z0.d, z1.d, z2.d }, p0, [x8, #21, mul vl]
				; CHECK-OFFSET-NEXT: addvl x8, sp, #-4
				; CHECK-OFFSET-NEXT: st4b { z0.b, z1.b, z2.b, z3.b }, p0, [x8, #-32, mul vl]
				; CHECK-OFFSET-NEXT: addvl x8, sp, #4
				; CHECK-OFFSET-NEXT: st4b { z0.b, z1.b, z2.b, z3.b }, p0, [x8, #28, mul vl]
				; CHECK-OFFSET-NEXT: addvl x8, sp, #-4
				; CHECK-OFFSET-NEXT: st4h { z0.h, z1.h, z2.h, z3.h }, p0, [x8, #-32, mul vl]
				; CHECK-OFFSET-NEXT: addvl x8, sp, #4
				; CHECK-OFFSET-NEXT: st4h { z0.h, z1.h, z2.h, z3.h }, p0, [x8, #28, mul vl]
				; CHECK-OFFSET-NEXT: addvl x8, sp, #-4
				; CHECK-OFFSET-NEXT: st4w { z0.s, z1.s, z2.s, z3.s }, p0, [x8, #-32, mul vl]
				; CHECK-OFFSET-NEXT: addvl x8, sp, #4
				; CHECK-OFFSET-NEXT: st4w { z0.s, z1.s, z2.s, z3.s }, p0, [x8, #28, mul vl]
				; CHECK-OFFSET-NEXT: addvl x8, sp, #-4
				; CHECK-OFFSET-NEXT: st4d { z0.d, z1.d, z2.d, z3.d }, p0, [x8, #-32, mul vl]
				; CHECK-OFFSET-NEXT: addvl x8, sp, #4
				; CHECK-OFFSET-NEXT: st4d { z0.d, z1.d, z2.d, z3.d }, p0, [x8, #28, mul vl]
				; CHECK-OFFSET-NEXT: addvl sp, sp, #31
				; CHECK-OFFSET-NEXT: addvl sp, sp, #1
				; CHECK-OFFSET-NEXT: ldr x29, [sp], #16
				; CHECK-OFFSET-NEXT: ret

				ST2B_IMM renamable $z0_z1, renamable $p0, %stack.0, -9
				ST2B_IMM renamable $z0_z1, renamable $p0, %stack.0, 8
				ST2H_IMM renamable $z0_z1, renamable $p0, %stack.0, -9
				ST2H_IMM renamable $z0_z1, renamable $p0, %stack.0, 8
				ST2W_IMM renamable $z0_z1, renamable $p0, %stack.0, -9
				ST2W_IMM renamable $z0_z1, renamable $p0, %stack.0, 8
				ST2D_IMM renamable $z0_z1, renamable $p0, %stack.0, -9
				ST2D_IMM renamable $z0_z1, renamable $p0, %stack.0, 8

				ST3B_IMM renamable $z0_z1_z2, renamable $p0, %stack.0, -9
				ST3B_IMM renamable $z0_z1_z2, renamable $p0, %stack.0, 8
				ST3H_IMM renamable $z0_z1_z2, renamable $p0, %stack.0, -9
				ST3H_IMM renamable $z0_z1_z2, renamable $p0, %stack.0, 8
				ST3W_IMM renamable $z0_z1_z2, renamable $p0, %stack.0, -9
				ST3W_IMM renamable $z0_z1_z2, renamable $p0, %stack.0, 8
				ST3D_IMM renamable $z0_z1_z2, renamable $p0, %stack.0, -9
				ST3D_IMM renamable $z0_z1_z2, renamable $p0, %stack.0, 8

				ST4B_IMM renamable $z0_z1_z2_z3, renamable $p0, %stack.0, -9
				ST4B_IMM renamable $z0_z1_z2_z3, renamable $p0, %stack.0, 8
				ST4H_IMM renamable $z0_z1_z2_z3, renamable $p0, %stack.0, -9
				ST4H_IMM renamable $z0_z1_z2_z3, renamable $p0, %stack.0, 8
				ST4W_IMM renamable $z0_z1_z2_z3, renamable $p0, %stack.0, -9
				ST4W_IMM renamable $z0_z1_z2_z3, renamable $p0, %stack.0, 8
				ST4D_IMM renamable $z0_z1_z2_z3, renamable $p0, %stack.0, -9
				ST4D_IMM renamable $z0_z1_z2_z3, renamable $p0, %stack.0, 8
				RET_ReallyLR
				...