This is an archive of the discontinued LLVM Phabricator instance.

Paths

- llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
- llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
- llvm/test/CodeGen/AArch64/insert-subvector-res-legalization.ll

Differential D102766

[SelectionDAG] Implement PromoteIntRes_INSERT_SUBVECTOR
Closed, Public

Authored by bsmith on May 19 2021, 6:01 AM.


Details

Reviewers

paulwalker-arm
peterwaller-arm
DavidTruby
efriedma

Commits

rG2668727929e4: [SelectionDAG] Implement PromoteIntRes_INSERT_SUBVECTOR

Summary

Inserting into a smaller-than-legal scalable vector would result in an
internal compiler error. For example, inserting a <vscale x 4 x i8> into
a <vscale x 8 x i8> (both illegal vector types for SVE) would cause a
crash.

This crash was happening because there was no code to promote (legalise)
the result of an INSERT_SUBVECTOR node.

This patch implements PromoteIntRes_INSERT_SUBVECTOR, which legalises
the ISD node. This is currently done by going through memory. This is
necessary because the SubVec operand of an INSERT_SUBVECTOR node must be
smaller than the Vec operand, which means that INSERT_SUBVECTOR cannot
always have legal result and operand types.
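The memory-based approach in the summary amounts to a store / overwrite / reload sequence. The following is a toy model of that sequence, assuming vectors are plain arrays of i8 lanes and the stack temporary is a `std::vector`. `insertSubvectorViaMemory` is a hypothetical name, and the sketch deliberately ignores scalable sizes, alignment and type promotion; it is not the DAG code from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: a byte-lane model of INSERT_SUBVECTOR lowered
// through a stack temporary. Illustration only, not LLVM code.
std::vector<std::uint8_t> insertSubvectorViaMemory(
    const std::vector<std::uint8_t> &vec,
    const std::vector<std::uint8_t> &subVec, std::size_t idx) {
  // The whole subvector must fit inside the wider vector at idx.
  assert(idx + subVec.size() <= vec.size());

  // 1. Store the wider vector to the stack slot.
  std::vector<std::uint8_t> stackSlot = vec;

  // 2. Store the narrower vector on top of it, starting at lane idx.
  for (std::size_t i = 0; i < subVec.size(); ++i)
    stackSlot[idx + i] = subVec[i];

  // 3. Reload the full-width vector from the slot.
  return stackSlot;
}
```

For example, inserting `{90, 91, 92, 93}` into `{0, 1, 2, 3, 4, 5, 6, 7}` at index 4 yields `{0, 1, 2, 3, 90, 91, 92, 93}`. Only the lanes covered by the subvector change, which is why no lane-by-lane shuffling is needed once the data is in memory.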

Depends on: D102765

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	310 ms	x64 debian > LLVM.CodeGen/AArch64::insert-subvector-res-legalization.ll
	270 ms	x64 windows > LLVM.CodeGen/AArch64::insert-subvector-res-legalization.ll

Event Timeline

joechrisellis created this revision. May 19 2021, 6:01 AM

Herald added subscribers: ecnelises, hiraditya. May 19 2021, 6:01 AM

joechrisellis requested review of this revision. May 19 2021, 6:01 AM

Herald added a project: Restricted Project. May 19 2021, 6:01 AM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B105215: Diff 346426. May 19 2021, 7:09 AM

Some suggestions.

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
4704	nit: Extraneous blank.
4717	For me, the term 'spilling' is usually associated with running out of registers and needing to create register space by spilling them to the stack. I think a comment here instead should express the intent of the code, something like "To insert SubVec into Vec, store the wider vector to memory, overwrite the lower half with the narrower vector, and reload". The other comments can probably be removed.
llvm/test/CodeGen/AArch64/insert-subvector-res-legalization.ll
15	I think you can use -asm-verbose=0 in the run line to eliminate the CFI escapes.

joechrisellis marked an inline comment as done. May 20 2021, 2:18 AM

joechrisellis added inline comments.

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
4704	This is consistent with `PromoteIntRes_EXTRACT_SUBVECTOR` some 60-ish lines above.
llvm/test/CodeGen/AArch64/insert-subvector-res-legalization.ll
15	I tried this and `llvm/utils/update_llc_test_checks.py` doesn't spit anything out. I have hit this issue before. IIRC, the regular expression that the script uses to delimit the functions in the assembly output doesn't work as expected if the CFI escapes are missing. Might submit a patch for this later if I can recall what the issue was. FWIW: `$ grep -Rl 'Assertions have been' llvm/test//AArch64//* | xargs grep -l 'asm-verbose=0'` prints only `llvm/test/CodeGen/AArch64/bf16-vector-shuffle.ll`, so there's only one file with autogen'd assertions that does use `-asm-verbose=0`.

Address review comments.

@peterwaller-arm:
- comment clarifications.

peterwaller-arm added inline comments. May 20 2021, 2:24 AM

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
4704	This is only a nit, but the majority of functions in this file don't have it and I think PromoteIntRes_EXTRACT_SUBVECTOR is in error. It's sufficiently nearby and related code that I'd be tempted to remove it from that one too to maintain local consistency (but not anywhere else; there are other examples in this file).

Matt added a subscriber: Matt. May 20 2021, 2:46 AM

Harbormaster completed remote builds in B105380: Diff 346665. May 20 2021, 3:34 AM

peterwaller-arm added inline comments. May 20 2021, 4:08 AM

llvm/test/CodeGen/AArch64/insert-subvector-res-legalization.ll
108	As I understand it, this should overwrite elements with indices {2,3} of %vec, but this seems to overwrite elements {1,2}. So I am not convinced this is correct.
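The expectation in this comment follows from the rule for scalable subvectors in `llvm.experimental.vector.insert`: the immediate index is scaled by the runtime vscale, and the subvector contributes `MinElts * vscale` lanes. A minimal sketch, assuming vscale is simply passed in as a parameter; `LaneRange` and `overwrittenLanes` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Half-open range [first, last) of destination lanes.
struct LaneRange {
  std::uint64_t first;
  std::uint64_t last;
};

// Hypothetical helper: lanes of %vec overwritten by a scalable-into-scalable
// vector.insert. Both the immediate index and the subvector's minimum
// element count are multiplied by the runtime vscale.
LaneRange overwrittenLanes(std::uint64_t idx, std::uint64_t subMinElts,
                           std::uint64_t vscale) {
  assert(vscale >= 1);
  std::uint64_t first = idx * vscale;
  return {first, first + subMinElts * vscale};
}
```

For the nxv4i16 / nxv2i16 test with index 2, `vscale = 1` gives lanes 2 and 3, matching the reviewer's reading, while `vscale = 2` gives lanes 4 to 7. An implementation that does not scale the index by vscale would place the subvector too low for any `vscale > 1`.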

paulwalker-arm added inline comments. May 20 2021, 10:42 AM

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
4704	Functional issues aside, I'm with @peterwaller-arm on this one.
llvm/test/CodeGen/AArch64/insert-subvector-res-legalization.ll
15	`attributes #0 = { nounwind "target-features"="+sve" }` will see the CFI entries removed.

peterwaller-arm added inline comments. May 20 2021, 12:33 PM

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
4717	Whoops - not 'lower half', because of course you can insert at a given index, not just into the lower-order bits. So the comment needs adjusting.

paulwalker-arm mentioned this in D102765: [SelectionDAG] Add stub implementation of ReplaceInsertSubVectorResults. May 21 2021, 6:31 AM

joechrisellis edited the summary of this revision. May 21 2021, 9:14 AM

Fold in D102765 + address nits.

joechrisellis planned changes to this revision. May 21 2021, 9:43 AM

Harbormaster completed remote builds in B105662: Diff 347066. May 21 2021, 10:40 AM

Scale index by vscale.

Remove CFI entries from test.

joechrisellis marked an inline comment as done. Jun 2 2021, 3:08 AM

Harbormaster completed remote builds in B107205: Diff 349213. Jun 2 2021, 4:29 AM

peterwaller-arm added inline comments. Jun 17 2021, 3:13 AM

llvm/test/CodeGen/AArch64/insert-subvector-res-legalization.ll
9	Can you pass <vscale x ...> by value rather than by pointer? I realise the loads are required in the fixed case, but that might shrink the code a little.
25	There is a bit of extraneous stuff going on in these tests, if you choose a couple of optimization passes are you able to shrink them a bit? I'm looking at the store of x29 and extra addpl.

peterwaller-arm added inline comments. Jun 17 2021, 1:23 PM

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
4731	Thinking: If IdxAPInt is zero, you could set ScaledIdx = Idx. Alternatively, just update Idx if non-zero. This would get rid of some `rdvl ..., #0`.

Address review comments:

@peterwaller-arm:
- only scale if the index is non-zero.
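The effect of that change can be illustrated with a toy model of the index computation for the subvector store. Each string below stands for one node or instruction; the names and the string encoding are hypothetical and do not correspond to real SelectionDAG nodes. The point is only that a zero index needs no runtime vscale read, which is where the redundant `rdvl ..., #0` instructions came from.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, string-based model of the subvector index computation.
std::vector<std::string> subVecIndexOps(std::uint64_t idx, bool scalable) {
  std::vector<std::string> ops;
  if (scalable && idx != 0) {
    // Non-zero scalable index: materialise idx * vscale at run time.
    ops.push_back("read vscale");
    ops.push_back("mul by " + std::to_string(idx));
  } else {
    // Zero (or fixed-width) index: reuse the constant unchanged, since
    // 0 * vscale == 0 and fixed-width indices are never scaled.
    ops.push_back("const " + std::to_string(idx));
  }
  return ops;
}
```

With this shape, a scalable insert at index 0 costs a single constant, while index 2 still needs the vscale read and multiply.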

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
4731	Good shout -- that's removed a few instructions from the test cases.
llvm/test/CodeGen/AArch64/insert-subvector-res-legalization.ll
25	`-O3` doesn't change the lowering here, so I am not sure if these test cases can be reduced by using different flags.

Harbormaster completed remote builds in B109946: Diff 353024. Jun 19 2021, 3:22 AM

Update insertion indices. The old insertion indices will be reported as invalid
when D104468 is merged.

Harbormaster completed remote builds in B110172: Diff 353334. Jun 21 2021, 5:45 AM

Bugfixes.

Harbormaster completed remote builds in B110219: Diff 353396. Jun 21 2021, 10:55 AM

peterwaller-arm added inline comments. Jun 22 2021, 3:29 AM

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
4743	My thinking here is that the clamping logic of getVectorElementPointer isn't appropriate for what is needed here. It's necessary to ensure the upper element of the inserted fixed-width vector fits into the scalable vector. getVectorElementPointer is effectively only forcing that the first element is in bounds, but it's the last element which matters.
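The distinction drawn here, between keeping the first lane in bounds and keeping the last lane in bounds, can be shown with plain integers standing in for runtime element counts. A minimal sketch; `clampFirstLane` and `clampWholeSubVec` are hypothetical names, and the actual `getVectorElementPointer` / `getVectorSubVecPointer` implementations are not reproduced here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Element-style clamp: only guarantees that the *first* lane is in bounds.
std::uint64_t clampFirstLane(std::uint64_t idx, std::uint64_t numElts) {
  assert(numElts > 0);
  return std::min(idx, numElts - 1);
}

// Subvector-style clamp: keeps the *last* inserted lane in bounds, so the
// start index may be at most numElts - subNumElts.
std::uint64_t clampWholeSubVec(std::uint64_t idx, std::uint64_t numElts,
                               std::uint64_t subNumElts) {
  assert(subNumElts > 0 && subNumElts <= numElts);
  return std::min(idx, numElts - subNumElts);
}
```

With 4 lanes, a 2-lane subvector and a start index of 3, the element-style clamp leaves the start at 3, so the second inserted lane would land at index 4, one past the end. The subvector-style clamp moves the start to 2, so lanes 2 and 3 are written.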

Bugfixes.

Harbormaster completed remote builds in B110779: Diff 354181. Jun 24 2021, 3:45 AM

joechrisellis edited the summary of this revision. Jun 24 2021, 9:52 AM

bsmith commandeered this revision. Jun 30 2021, 5:48 AM

bsmith edited reviewers, added: joechrisellis; removed: bsmith.

Use new getVectorSubVecPointer TLI function to ensure correct clamping
Fixup tests due to changes

Harbormaster completed remote builds in B111735: Diff 355520. Jun 30 2021, 7:01 AM

peterwaller-arm accepted this revision. Jun 30 2021, 7:25 AM

This revision is now accepted and ready to land. Jun 30 2021, 7:25 AM

This revision was landed with ongoing or failed builds. Jul 1 2021, 9:06 AM

Closed by commit rG2668727929e4: [SelectionDAG] Implement PromoteIntRes_INSERT_SUBVECTOR (authored by bsmith).

This revision was automatically updated to reflect the committed changes.

bsmith added a commit: rG2668727929e4: [SelectionDAG] Implement PromoteIntRes_INSERT_SUBVECTOR.

Sorry about the late reply here, but I'm not sure why PromoteIntRes_INSERT_SUBVECTOR needs to go through the stack. Can't you just ANY_EXTEND the operand and the result?

At that point, you might end up with a node that needs to be legalized by PromoteIntOp_INSERT_SUBVECTOR, but better to take legalization one step at a time.

In D102766#2853500, @efriedma wrote:

Sorry about the late reply here, but I'm not sure why PromoteIntRes_INSERT_SUBVECTOR needs to go through the stack. Can't you just ANY_EXTEND the operand and the result?

At that point, you might end up with a node that needs to be legalized by PromoteIntOp_INSERT_SUBVECTOR, but better to take legalization one step at a time.

I'm not sure I fully understand how you are thinking this would look; bear in mind that we also need to handle inserting scalable into scalable here.

If you had something like:

`%ins = call <vscale x 4 x i16> @llvm.experimental.vector.insert.nxv4i16.nxv2i16(<vscale x 4 x i16> %vec, <vscale x 2 x i16> %subvec, i64 4)`

and you promoted all of the scalable types to their equivalent legal types you'd end up with:

`%ins = call <vscale x 4 x i32> @llvm.experimental.vector.insert.nxv4i16.nxv2i16(<vscale x 4 x i32> %vec, <vscale x 2 x i64> %subvec, i64 4)`

This ends up no longer being a valid use of vector.insert since the element types differ.
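The mismatch described above comes from promoting each operand to its own legal type. The following toy model works under an assumption consistent with the test output (`ld1h { z0.s }` for nxv4i16, `ld1h { z1.d }` for nxv2i16): a legal SVE integer vector fills one 128-bit granule per vscale, and an undersized type keeps its element count while its element type is widened. `ScalableIntVT` and `promoteToLegal` are hypothetical names, not LLVM's `EVT` API.

```cpp
#include <cassert>

// Hypothetical model of <vscale x minElts x i{eltBits}>.
struct ScalableIntVT {
  unsigned minElts;
  unsigned eltBits;
};

// Assumption: legal SVE integer vectors fill one 128-bit granule per
// vscale (nxv16i8, nxv8i16, nxv4i32, nxv2i64). An undersized type keeps
// its element count and widens its element type to fill the granule.
ScalableIntVT promoteToLegal(ScalableIntVT vt) {
  assert(vt.minElts >= 2 && vt.minElts <= 16 && 128 % vt.minElts == 0);
  assert(vt.eltBits <= 128 / vt.minElts);
  return {vt.minElts, 128 / vt.minElts};
}
```

Applied independently, nxv4i16 becomes nxv4i32 and nxv2i16 becomes nxv2i64, so the promoted vector and subvector no longer share an element type.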

In D102766#2857778, @bsmith wrote:

I'm not sure I fully understand how you are thinking this would look; bear in mind that we also need to handle inserting scalable into scalable here.

If you had something like:
%ins = call <vscale x 4 x i16> @llvm.experimental.vector.insert.nxv4i16.nxv2i16(<vscale x 4 x i16> %vec, <vscale x 2 x i16> %subvec, i64 4)
and you promoted all of the scalable types to their equivalent legal types you'd end up with:
%ins = call <vscale x 4 x i32> @llvm.experimental.vector.insert.nxv4i16.nxv2i16(<vscale x 4 x i32> %vec, <vscale x 2 x i64> %subvec, i64 4)
This ends up no longer being a valid use of vector.insert since the element types differ.

Right. My suggestion is that you promote from:

%ins = call <vscale x 4 x i16> @llvm.experimental.vector.insert.nxv4i16.nxv2i16(<vscale x 4 x i16> %vec, <vscale x 2 x i16> %subvec, i64 1)

to:

%ins = call <vscale x 4 x i32> @llvm.experimental.vector.insert.nxv4i32.nxv2i32(<vscale x 4 x i32> %vec, <vscale x 2 x i32> %subvec, i64 1)

This isn't legal, but we need code to handle it anyway.
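The suggestion can be modelled the same way: take the element width from the promoted result and any-extend the subvector to that width, rather than promoting the subvector to its own legal type. This is a toy sketch assuming legal SVE integer vectors fill one 128-bit granule per vscale; `ScalableIntVT`, `InsertTypes`, `promoteInsertOperands` and `isLegalSVEInt` are hypothetical names, and no real ISD nodes are built.

```cpp
#include <cassert>

// Hypothetical model of <vscale x minElts x i{eltBits}>.
struct ScalableIntVT {
  unsigned minElts;
  unsigned eltBits;
};

// Assumption: legal SVE integer vectors fill one 128-bit granule per vscale.
bool isLegalSVEInt(ScalableIntVT vt) {
  return vt.minElts >= 2 && vt.minElts <= 16 && 128 % vt.minElts == 0 &&
         vt.eltBits == 128 / vt.minElts;
}

struct InsertTypes {
  ScalableIntVT vec;
  ScalableIntVT subVec;
};

// Promote the result (and %vec) to the legal type, then any-extend %subvec
// to the *same* element width so the insert stays well-typed.
InsertTypes promoteInsertOperands(ScalableIntVT vec, ScalableIntVT subVec) {
  assert(vec.eltBits == subVec.eltBits && subVec.minElts <= vec.minElts);
  assert(vec.minElts >= 2 && 128 % vec.minElts == 0);
  unsigned bits = 128 / vec.minElts;  // promoted element width of the result
  return {{vec.minElts, bits}, {subVec.minElts, bits}};
}
```

For nxv4i16 / nxv2i16 this gives nxv4i32 / nxv2i32. The result type is legal, but nxv2i32 is not, which matches the remark that the promoted node still has to be handled, for example by operand legalisation.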

In D102766#2858467, @efriedma wrote:
Right. My suggestion is that you promote from:
%ins = call <vscale x 4 x i16> @llvm.experimental.vector.insert.nxv4i16.nxv2i16(<vscale x 4 x i16> %vec, <vscale x 2 x i16> %subvec, i64 1)
to:
%ins = call <vscale x 4 x i32> @llvm.experimental.vector.insert.nxv4i32.nxv2i32(<vscale x 4 x i32> %vec, <vscale x 2 x i32> %subvec, i64 1)
This isn't legal, but we need code to handle it anyway.

Are you suggesting to still go through memory but to do it during operand legalization instead?

In D102766#2859581, @bsmith wrote:
Are you suggesting to still go through memory but to do it during operand legalization instead?

Yes, sort of...

In some cases, we might not end up going through memory; we currently have some custom lowering support for some special cases, and might add more cases in the future.

Are you suggesting to still go through memory but to do it during operand legalization instead?

Yes, sort of...

In some cases, we might not end up going through memory; we currently have some custom lowering support for some special cases, and might add more cases in the future.

This should be sorted out in https://reviews.llvm.org/D105624

Revision Contents

- llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp (34 lines)
- llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h (1 line)
- llvm/test/CodeGen/AArch64/insert-subvector-res-legalization.ll (265 lines)

Diff 346665

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp

...
#endif
case ISD::SRL: Res = PromoteIntRes_SRL(N); break;
case ISD::TRUNCATE: Res = PromoteIntRes_TRUNCATE(N); break;
case ISD::UNDEF: Res = PromoteIntRes_UNDEF(N); break;
case ISD::VAARG: Res = PromoteIntRes_VAARG(N); break;
case ISD::VSCALE: Res = PromoteIntRes_VSCALE(N); break;

case ISD::EXTRACT_SUBVECTOR:
Res = PromoteIntRes_EXTRACT_SUBVECTOR(N); break;
+case ISD::INSERT_SUBVECTOR:
+Res = PromoteIntRes_INSERT_SUBVECTOR(N); break;
Lint: Pre-merge checks: clang-format: please reformat the code (suggested: `Res = PromoteIntRes_INSERT_SUBVECTOR(N);` with `break;` on its own line).
case ISD::VECTOR_REVERSE:
Res = PromoteIntRes_VECTOR_REVERSE(N); break;
case ISD::VECTOR_SHUFFLE:
Res = PromoteIntRes_VECTOR_SHUFFLE(N); break;
case ISD::VECTOR_SPLICE:
Res = PromoteIntRes_VECTOR_SPLICE(N); break;
case ISD::INSERT_VECTOR_ELT:
Res = PromoteIntRes_INSERT_VECTOR_ELT(N); break;
...
for (unsigned i = 0; i != OutNumElems; ++i) {
SDValue Op = DAG.getAnyExtOrTrunc(Ext, dl, NOutVTElem);
// Insert the converted element to the new vector.
Ops.push_back(Op);
}

return DAG.getBuildVector(NOutVT, dl, Ops);
}

+SDValue DAGTypeLegalizer::PromoteIntRes_INSERT_SUBVECTOR(SDNode *N) {
+
+  EVT OutVT = N->getValueType(0);
+  EVT NOutVT = TLI.getTypeToTransformTo(*DAG.getContext(), OutVT);
+  assert(NOutVT.isVector() && "This type must be promoted to a vector type");
+
+  SDLoc dl(N);
Lint: Pre-merge checks: clang-tidy: warning: invalid case style for variable 'dl' [readability-identifier-naming]
+  SDValue Vec = N->getOperand(0);
+  SDValue SubVec = N->getOperand(1);
+  SDValue Idx = N->getOperand(2);
+
+  EVT VecVT = Vec.getValueType();
+  EVT SubVecVT = SubVec.getValueType();
+
+  // To insert SubVec into Vec, store the wider vector to memory, overwrite the
+  // lower half with the narrower vector, and reload.
+  Align SmallestAlign = DAG.getReducedAlign(SubVecVT, /*UseABI=*/false);
+  SDValue StackPtr =
+      DAG.CreateStackTemporary(VecVT.getStoreSize(), SmallestAlign);
+  auto &MF = DAG.getMachineFunction();
+  auto FrameIndex = cast<FrameIndexSDNode>(StackPtr.getNode())->getIndex();
+  auto PtrInfo = MachinePointerInfo::getFixedStack(MF, FrameIndex);
+
+  SDValue Store = DAG.getStore(DAG.getEntryNode(), dl, Vec, StackPtr, PtrInfo,
+                               SmallestAlign);
+
+  SDValue SubVecPtr = TLI.getVectorElementPointer(DAG, StackPtr, SubVecVT, Idx);
+  Store = DAG.getStore(Store, dl, SubVec, SubVecPtr, PtrInfo, SmallestAlign);
+
+  return DAG.getLoad(NOutVT, dl, Store, StackPtr, PtrInfo, SmallestAlign);
+}

SDValue DAGTypeLegalizer::PromoteIntRes_VECTOR_REVERSE(SDNode *N) {
SDLoc dl(N);

SDValue V0 = GetPromotedInteger(N->getOperand(0));
EVT OutVT = V0.getValueType();

return DAG.getNode(ISD::VECTOR_REVERSE, dl, OutVT, V0);
}

SDValue DAGTypeLegalizer::PromoteIntRes_VECTOR_SHUFFLE(SDNode *N) {
ShuffleVectorSDNode *SV = cast<ShuffleVectorSDNode>(N);
EVT VT = N->getValueType(0);
SDLoc dl(N);

ArrayRef<int> NewMask = SV->getMask().slice(0, VT.getVectorNumElements());

SDValue V0 = GetPromotedInteger(N->getOperand(0));
...

llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h

...
private:
void PromoteIntegerResult(SDNode *N, unsigned ResNo);
SDValue PromoteIntRes_MERGE_VALUES(SDNode *N, unsigned ResNo);
SDValue PromoteIntRes_AssertSext(SDNode *N);
SDValue PromoteIntRes_AssertZext(SDNode *N);
SDValue PromoteIntRes_Atomic0(AtomicSDNode *N);
SDValue PromoteIntRes_Atomic1(AtomicSDNode *N);
SDValue PromoteIntRes_AtomicCmpSwap(AtomicSDNode *N, unsigned ResNo);
SDValue PromoteIntRes_EXTRACT_SUBVECTOR(SDNode *N);
+SDValue PromoteIntRes_INSERT_SUBVECTOR(SDNode *N);
Lint: Pre-merge checks: clang-tidy: warning: invalid case style for function 'PromoteIntRes_INSERT_SUBVECTOR' [readability-identifier-naming]
SDValue PromoteIntRes_VECTOR_REVERSE(SDNode *N);
SDValue PromoteIntRes_VECTOR_SHUFFLE(SDNode *N);
SDValue PromoteIntRes_VECTOR_SPLICE(SDNode *N);
SDValue PromoteIntRes_BUILD_VECTOR(SDNode *N);
SDValue PromoteIntRes_SCALAR_TO_VECTOR(SDNode *N);
SDValue PromoteIntRes_SPLAT_VECTOR(SDNode *N);
SDValue PromoteIntRes_STEP_VECTOR(SDNode *N);
SDValue PromoteIntRes_EXTEND_VECTOR_INREG(SDNode *N);
...

llvm/test/CodeGen/AArch64/insert-subvector-res-legalization.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc < %s | FileCheck %s --check-prefix=CHECK

				target triple = "aarch64-unknown-linux-gnu"

				; SCALABLE INSERTED INTO SCALABLE TESTS

				define <vscale x 8 x i8> @vec_scalable_subvec_scalable_idx_zero_i8(<vscale x 8 x i8>* %a, <vscale x 4 x i8>* %b) #0 {
				; CHECK-LABEL: vec_scalable_subvec_scalable_idx_zero_i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl sp, sp, #-1
				; CHECK-NEXT: .cfi_escape 0x0f, 0x0c, 0x8f, 0x00, 0x11, 0x10, 0x22, 0x11, 0x08, 0x92, 0x2e, 0x00, 0x1e, 0x22 // sp + 16 + 8 * VG
				; CHECK-NEXT: .cfi_offset w29, -16
				; CHECK-NEXT: ptrue p0.h
				; CHECK-NEXT: ptrue p1.s
				; CHECK-NEXT: ld1b { z0.h }, p0/z, [x0]
				; CHECK-NEXT: ld1b { z1.s }, p1/z, [x1]
				; CHECK-NEXT: st1b { z0.h }, p0, [sp, #1, mul vl]
				; CHECK-NEXT: st1b { z1.s }, p1, [sp, #2, mul vl]
				; CHECK-NEXT: addpl x8, sp, #4
				; CHECK-NEXT: ld1h { z0.h }, p0/z, [x8]
				; CHECK-NEXT: addvl sp, sp, #1
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				%vec = load <vscale x 8 x i8>, <vscale x 8 x i8>* %a
				%subvec = load <vscale x 4 x i8>, <vscale x 4 x i8>* %b
				%ins = call <vscale x 8 x i8> @llvm.experimental.vector.insert.nxv8i8.nxv4i8(<vscale x 8 x i8> %vec, <vscale x 4 x i8> %subvec, i64 0)
				ret <vscale x 8 x i8> %ins
				}

				define <vscale x 8 x i8> @vec_scalable_subvec_scalable_idx_nonzero_i8(<vscale x 8 x i8>* %a, <vscale x 4 x i8>* %b) #0 {
				; CHECK-LABEL: vec_scalable_subvec_scalable_idx_nonzero_i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl sp, sp, #-1
				; CHECK-NEXT: .cfi_escape 0x0f, 0x0c, 0x8f, 0x00, 0x11, 0x10, 0x22, 0x11, 0x08, 0x92, 0x2e, 0x00, 0x1e, 0x22 // sp + 16 + 8 * VG
				; CHECK-NEXT: .cfi_offset w29, -16
				; CHECK-NEXT: ptrue p0.h
				; CHECK-NEXT: ptrue p1.s
				; CHECK-NEXT: ld1b { z0.h }, p0/z, [x0]
				; CHECK-NEXT: ld1b { z1.s }, p1/z, [x1]
				; CHECK-NEXT: addpl x8, sp, #4
				; CHECK-NEXT: orr x8, x8, #0x2
				; CHECK-NEXT: st1b { z0.h }, p0, [sp, #1, mul vl]
				; CHECK-NEXT: st1b { z1.s }, p1, [x8]
				; CHECK-NEXT: addpl x8, sp, #4
				; CHECK-NEXT: ld1h { z0.h }, p0/z, [x8]
				; CHECK-NEXT: addvl sp, sp, #1
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				%vec = load <vscale x 8 x i8>, <vscale x 8 x i8>* %a
				%subvec = load <vscale x 4 x i8>, <vscale x 4 x i8>* %b
				%ins = call <vscale x 8 x i8> @llvm.experimental.vector.insert.nxv8i8.nxv4i8(<vscale x 8 x i8> %vec, <vscale x 4 x i8> %subvec, i64 2)
				ret <vscale x 8 x i8> %ins
				}

				define <vscale x 4 x i16> @vec_scalable_subvec_scalable_idx_zero_i16(<vscale x 4 x i16>* %a, <vscale x 2 x i16>* %b) #0 {
				; CHECK-LABEL: vec_scalable_subvec_scalable_idx_zero_i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl sp, sp, #-1
				; CHECK-NEXT: .cfi_escape 0x0f, 0x0c, 0x8f, 0x00, 0x11, 0x10, 0x22, 0x11, 0x08, 0x92, 0x2e, 0x00, 0x1e, 0x22 // sp + 16 + 8 * VG
				; CHECK-NEXT: .cfi_offset w29, -16
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: ptrue p1.d
				; CHECK-NEXT: ld1h { z0.s }, p0/z, [x0]
				; CHECK-NEXT: ld1h { z1.d }, p1/z, [x1]
				; CHECK-NEXT: st1h { z0.s }, p0, [sp, #1, mul vl]
				; CHECK-NEXT: st1h { z1.d }, p1, [sp, #2, mul vl]
				; CHECK-NEXT: addpl x8, sp, #4
				; CHECK-NEXT: ld1w { z0.s }, p0/z, [x8]
				; CHECK-NEXT: addvl sp, sp, #1
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				%vec = load <vscale x 4 x i16>, <vscale x 4 x i16>* %a
				%subvec = load <vscale x 2 x i16>, <vscale x 2 x i16>* %b
				%ins = call <vscale x 4 x i16> @llvm.experimental.vector.insert.nxv4i16.nxv2i16(<vscale x 4 x i16> %vec, <vscale x 2 x i16> %subvec, i64 0)
				ret <vscale x 4 x i16> %ins
				}

				define <vscale x 4 x i16> @vec_scalable_subvec_scalable_idx_nonzero_i16(<vscale x 4 x i16>* %a, <vscale x 2 x i16>* %b) #0 {
				; CHECK-LABEL: vec_scalable_subvec_scalable_idx_nonzero_i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl sp, sp, #-1
				; CHECK-NEXT: .cfi_escape 0x0f, 0x0c, 0x8f, 0x00, 0x11, 0x10, 0x22, 0x11, 0x08, 0x92, 0x2e, 0x00, 0x1e, 0x22 // sp + 16 + 8 * VG
				; CHECK-NEXT: .cfi_offset w29, -16
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: ptrue p1.d
				; CHECK-NEXT: ld1h { z0.s }, p0/z, [x0]
				; CHECK-NEXT: ld1h { z1.d }, p1/z, [x1]
				; CHECK-NEXT: cntd x9
				; CHECK-NEXT: sub x9, x9, #1 // =1
				; CHECK-NEXT: mov w8, #2
				; CHECK-NEXT: cmp x9, #2 // =2
				; CHECK-NEXT: csel x8, x9, x8, lo
				; CHECK-NEXT: addpl x9, sp, #4
				; CHECK-NEXT: st1h { z0.s }, p0, [sp, #1, mul vl]
				; CHECK-NEXT: st1h { z1.d }, p1, [x9, x8, lsl #1]
				; CHECK-NEXT: addpl x8, sp, #4
				; CHECK-NEXT: ld1w { z0.s }, p0/z, [x8]
				; CHECK-NEXT: addvl sp, sp, #1
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				%vec = load <vscale x 4 x i16>, <vscale x 4 x i16>* %a
				%subvec = load <vscale x 2 x i16>, <vscale x 2 x i16>* %b
				%ins = call <vscale x 4 x i16> @llvm.experimental.vector.insert.nxv4i16.nxv2i16(<vscale x 4 x i16> %vec, <vscale x 2 x i16> %subvec, i64 2)
				peterwaller-arm: As I understand it, this should overwrite elements with indices {2,3} of %vec, but this seems to overwrite elements {1,2}. So I am not convinced this is correct.
				ret <vscale x 4 x i16> %ins
				}

				; FIXED INSERTED INTO SCALABLE TESTS

				define <vscale x 8 x i8> @vec_scalable_subvec_fixed_idx_zero_i8(<vscale x 8 x i8>* %a, <8 x i8>* %b) #0 {
				; CHECK-LABEL: vec_scalable_subvec_fixed_idx_zero_i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl sp, sp, #-1
				; CHECK-NEXT: .cfi_escape 0x0f, 0x0c, 0x8f, 0x00, 0x11, 0x10, 0x22, 0x11, 0x08, 0x92, 0x2e, 0x00, 0x1e, 0x22 // sp + 16 + 8 * VG
				; CHECK-NEXT: .cfi_offset w29, -16
				; CHECK-NEXT: ptrue p0.h
				; CHECK-NEXT: ld1b { z0.h }, p0/z, [x0]
				; CHECK-NEXT: ldr d1, [x1]
				; CHECK-NEXT: st1b { z0.h }, p0, [sp, #1, mul vl]
				; CHECK-NEXT: addpl x8, sp, #4
				; CHECK-NEXT: str d1, [x8]
				; CHECK-NEXT: addpl x8, sp, #4
				; CHECK-NEXT: ld1h { z0.h }, p0/z, [x8]
				; CHECK-NEXT: addvl sp, sp, #1
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				%vec = load <vscale x 8 x i8>, <vscale x 8 x i8>* %a
				%subvec = load <8 x i8>, <8 x i8>* %b
				%ins = call <vscale x 8 x i8> @llvm.experimental.vector.insert.nxv8i8.v8i8(<vscale x 8 x i8> %vec, <8 x i8> %subvec, i64 0)
				ret <vscale x 8 x i8> %ins
				}

				define <vscale x 8 x i8> @vec_scalable_subvec_fixed_idx_nonzero_i8(<vscale x 8 x i8>* %a, <8 x i8>* %b) #0 {
				; CHECK-LABEL: vec_scalable_subvec_fixed_idx_nonzero_i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl sp, sp, #-1
				; CHECK-NEXT: .cfi_escape 0x0f, 0x0c, 0x8f, 0x00, 0x11, 0x10, 0x22, 0x11, 0x08, 0x92, 0x2e, 0x00, 0x1e, 0x22 // sp + 16 + 8 * VG
				; CHECK-NEXT: .cfi_offset w29, -16
				; CHECK-NEXT: ptrue p0.h
				; CHECK-NEXT: ld1b { z0.h }, p0/z, [x0]
				; CHECK-NEXT: ldr d1, [x1]
				; CHECK-NEXT: st1b { z0.h }, p0, [sp, #1, mul vl]
				; CHECK-NEXT: addpl x8, sp, #4
				; CHECK-NEXT: stur d1, [x8, #2]
				; CHECK-NEXT: addpl x8, sp, #4
				; CHECK-NEXT: ld1h { z0.h }, p0/z, [x8]
				; CHECK-NEXT: addvl sp, sp, #1
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				%vec = load <vscale x 8 x i8>, <vscale x 8 x i8>* %a
				%subvec = load <8 x i8>, <8 x i8>* %b
				%ins = call <vscale x 8 x i8> @llvm.experimental.vector.insert.nxv8i8.v8i8(<vscale x 8 x i8> %vec, <8 x i8> %subvec, i64 2)
				ret <vscale x 8 x i8> %ins
				}

				define <vscale x 4 x i16> @vec_scalable_subvec_fixed_idx_zero_i16(<vscale x 4 x i16>* %a, <4 x i16>* %b) #0 {
				; CHECK-LABEL: vec_scalable_subvec_fixed_idx_zero_i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl sp, sp, #-1
				; CHECK-NEXT: .cfi_escape 0x0f, 0x0c, 0x8f, 0x00, 0x11, 0x10, 0x22, 0x11, 0x08, 0x92, 0x2e, 0x00, 0x1e, 0x22 // sp + 16 + 8 * VG
				; CHECK-NEXT: .cfi_offset w29, -16
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: ld1h { z0.s }, p0/z, [x0]
				; CHECK-NEXT: ldr d1, [x1]
				; CHECK-NEXT: st1h { z0.s }, p0, [sp, #1, mul vl]
				; CHECK-NEXT: addpl x8, sp, #4
				; CHECK-NEXT: str d1, [x8]
				; CHECK-NEXT: addpl x8, sp, #4
				; CHECK-NEXT: ld1w { z0.s }, p0/z, [x8]
				; CHECK-NEXT: addvl sp, sp, #1
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				%vec = load <vscale x 4 x i16>, <vscale x 4 x i16>* %a
				%subvec = load <4 x i16>, <4 x i16>* %b
				%ins = call <vscale x 4 x i16> @llvm.experimental.vector.insert.nxv4i16.v4i16(<vscale x 4 x i16> %vec, <4 x i16> %subvec, i64 0)
				ret <vscale x 4 x i16> %ins
				}

				define <vscale x 4 x i16> @vec_scalable_subvec_fixed_idx_nonzero_i16(<vscale x 4 x i16>* %a, <4 x i16>* %b) #0 {
				; CHECK-LABEL: vec_scalable_subvec_fixed_idx_nonzero_i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl sp, sp, #-1
				; CHECK-NEXT: .cfi_escape 0x0f, 0x0c, 0x8f, 0x00, 0x11, 0x10, 0x22, 0x11, 0x08, 0x92, 0x2e, 0x00, 0x1e, 0x22 // sp + 16 + 8 * VG
				; CHECK-NEXT: .cfi_offset w29, -16
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: ld1h { z0.s }, p0/z, [x0]
				; CHECK-NEXT: ldr d1, [x1]
				; CHECK-NEXT: st1h { z0.s }, p0, [sp, #1, mul vl]
				; CHECK-NEXT: addpl x8, sp, #4
				; CHECK-NEXT: stur d1, [x8, #4]
				; CHECK-NEXT: addpl x8, sp, #4
				; CHECK-NEXT: ld1w { z0.s }, p0/z, [x8]
				; CHECK-NEXT: addvl sp, sp, #1
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				%vec = load <vscale x 4 x i16>, <vscale x 4 x i16>* %a
				%subvec = load <4 x i16>, <4 x i16>* %b
				%ins = call <vscale x 4 x i16> @llvm.experimental.vector.insert.nxv4i16.v4i16(<vscale x 4 x i16> %vec, <4 x i16> %subvec, i64 2)
				ret <vscale x 4 x i16> %ins
				}

				define <vscale x 2 x i32> @vec_scalable_subvec_fixed_idx_zero_i32(<vscale x 2 x i32>* %a, <2 x i32>* %b) #0 {
				; CHECK-LABEL: vec_scalable_subvec_fixed_idx_zero_i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl sp, sp, #-1
				; CHECK-NEXT: .cfi_escape 0x0f, 0x0c, 0x8f, 0x00, 0x11, 0x10, 0x22, 0x11, 0x08, 0x92, 0x2e, 0x00, 0x1e, 0x22 // sp + 16 + 8 * VG
				; CHECK-NEXT: .cfi_offset w29, -16
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: ld1w { z0.d }, p0/z, [x0]
				; CHECK-NEXT: ldr d1, [x1]
				; CHECK-NEXT: st1w { z0.d }, p0, [sp, #1, mul vl]
				; CHECK-NEXT: addpl x8, sp, #4
				; CHECK-NEXT: str d1, [x8]
				; CHECK-NEXT: addpl x8, sp, #4
				; CHECK-NEXT: ld1d { z0.d }, p0/z, [x8]
				; CHECK-NEXT: addvl sp, sp, #1
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				%vec = load <vscale x 2 x i32>, <vscale x 2 x i32>* %a
				%subvec = load <2 x i32>, <2 x i32>* %b
				%ins = call <vscale x 2 x i32> @llvm.experimental.vector.insert.nxv2i32.v2i32(<vscale x 2 x i32> %vec, <2 x i32> %subvec, i64 0)
				ret <vscale x 2 x i32> %ins
				}

				define <vscale x 2 x i32> @vec_scalable_subvec_fixed_idx_nonzero_i32(<vscale x 2 x i32>* %a, <2 x i32>* %b) #0 {
				; CHECK-LABEL: vec_scalable_subvec_fixed_idx_nonzero_i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl sp, sp, #-1
				; CHECK-NEXT: .cfi_escape 0x0f, 0x0c, 0x8f, 0x00, 0x11, 0x10, 0x22, 0x11, 0x08, 0x92, 0x2e, 0x00, 0x1e, 0x22 // sp + 16 + 8 * VG
				; CHECK-NEXT: .cfi_offset w29, -16
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: ld1w { z0.d }, p0/z, [x0]
				; CHECK-NEXT: ldr d1, [x1]
				; CHECK-NEXT: st1w { z0.d }, p0, [sp, #1, mul vl]
				; CHECK-NEXT: addpl x8, sp, #4
				; CHECK-NEXT: str d1, [x8, #8]
				; CHECK-NEXT: addpl x8, sp, #4
				; CHECK-NEXT: ld1d { z0.d }, p0/z, [x8]
				; CHECK-NEXT: addvl sp, sp, #1
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				%vec = load <vscale x 2 x i32>, <vscale x 2 x i32>* %a
				%subvec = load <2 x i32>, <2 x i32>* %b
				%ins = call <vscale x 2 x i32> @llvm.experimental.vector.insert.nxv2i32.v2i32(<vscale x 2 x i32> %vec, <2 x i32> %subvec, i64 2)
				ret <vscale x 2 x i32> %ins
				}

				declare <vscale x 8 x i8> @llvm.experimental.vector.insert.nxv8i8.nxv4i8(<vscale x 8 x i8>, <vscale x 4 x i8>, i64)
				declare <vscale x 4 x i16> @llvm.experimental.vector.insert.nxv4i16.nxv2i16(<vscale x 4 x i16>, <vscale x 2 x i16>, i64)

				declare <vscale x 8 x i8> @llvm.experimental.vector.insert.nxv8i8.v8i8(<vscale x 8 x i8>, <8 x i8>, i64)
				declare <vscale x 4 x i16> @llvm.experimental.vector.insert.nxv4i16.v4i16(<vscale x 4 x i16>, <4 x i16>, i64)
				declare <vscale x 2 x i32> @llvm.experimental.vector.insert.nxv2i32.v2i32(<vscale x 2 x i32>, <2 x i32>, i64)

				attributes #0 = { "target-features"="+sve" }
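The reviewer's point about `vec_scalable_subvec_scalable_idx_nonzero_i16` can be checked by hand. What follows is a hypothetical Python model, not LLVM code, and the function names are made up for illustration. It assumes the LangRef rule that when the subvector is scalable, `idx` is scaled by vscale. It then transcribes the `cntd` / `sub` / `cmp` / `csel` / `st1h` sequence from the CHECK lines into element-index arithmetic on the spilled `<vscale x 4 x i16>` slot.

```python
# Hypothetical model (not LLVM code): which i16 elements of the spilled
# <vscale x 4 x i16> %vec get overwritten by inserting a <vscale x 2 x i16>
# at idx 2, comparing LangRef semantics with a reading of the CHECK lines.


def expected_overwritten(vscale, idx=2, sub_min_elts=2):
    """Scalable subvector: idx is scaled by vscale (assumed LangRef rule)."""
    start = idx * vscale
    return list(range(start, start + sub_min_elts * vscale))


def generated_overwritten(vscale, idx=2):
    """Element-index transcription of the generated AArch64 sequence."""
    cntd = 2 * vscale                 # cntd x9: number of 64-bit lanes
    x9 = cntd - 1                     # sub x9, x9, #1
    x8 = x9 if x9 < idx else idx      # cmp x9, #2 ; csel x8, x9, x8, lo
    # st1h { z1.d }, p1, [x9, x8, lsl #1] with an all-true p1 writes cntd
    # halfwords starting at element x8 of the spilled vector.
    return list(range(x8, x8 + cntd))


for vs in (1, 2):
    print(vs, expected_overwritten(vs), generated_overwritten(vs))
# prints:
# 1 [2, 3] [1, 2]
# 2 [4, 5, 6, 7] [2, 3, 4, 5]
```

In this model the mismatch has two sources: the index is clamped against `cntd - 1` rather than a bound derived from `%vec`, and it is never multiplied by vscale. Whether the C++ lowering in LegalizeIntegerTypes.cpp actually behaves this way would need to be confirmed against the patch itself.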