This is an archive of the discontinued LLVM Phabricator instance.

Differential D102766

[SelectionDAG] Implement PromoteIntRes_INSERT_SUBVECTOR
ClosedPublic

Authored by bsmith on May 19 2021, 6:01 AM.

Details

Reviewers

paulwalker-arm
peterwaller-arm
DavidTruby
efriedma

Commits

rG2668727929e4: [SelectionDAG] Implement PromoteIntRes_INSERT_SUBVECTOR

Summary

Inserting into a smaller-than-legal scalable vector would result in an
internal compiler error. For example, inserting a <vscale x 4 x i8> into
a <vscale x 8 x i8> (both illegal vector types for SVE) would cause a
crash.

This crash was happening because there was no code to promote (legalise)
the result of an INSERT_SUBVECTOR node.

This patch implements PromoteIntRes_INSERT_SUBVECTOR, which legalises
the ISD node. Legalisation is currently done by going through memory.
This is necessary because INSERT_SUBVECTOR requires its SubVec operand
to be smaller than its Vec operand, which means that the node cannot
always be given legal result and operand types.
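
As a rough illustration of the "store, overwrite, reload" approach described above, here is a minimal standalone sketch. It is not LLVM code: the function name `insertSubvectorViaMemory` and the use of `std::vector` as a stand-in for a stack temporary are assumptions made purely for illustration, and it models fixed-length vectors only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model (not the LLVM implementation): insert `sub` into `vec`
// at element index `idx` by round-tripping through a temporary buffer.
std::vector<std::uint8_t>
insertSubvectorViaMemory(const std::vector<std::uint8_t> &vec,
                         const std::vector<std::uint8_t> &sub,
                         std::size_t idx) {
  assert(idx + sub.size() <= vec.size() && "subvector must fit inside vec");
  // 1. "Store" the wider vector to the temporary slot.
  std::vector<std::uint8_t> slot(vec);
  // 2. Overwrite the elements starting at idx with the narrower vector.
  std::copy(sub.begin(), sub.end(),
            slot.begin() + static_cast<std::ptrdiff_t>(idx));
  // 3. "Reload" the whole slot as the result.
  return slot;
}
```

With an 8-element vector and a 2-element subvector inserted at index 2, this model overwrites elements 2 and 3 and leaves the rest untouched.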

Depends on: D102765

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

joechrisellis created this revision. May 19 2021, 6:01 AM

Herald added subscribers: ecnelises, hiraditya. May 19 2021, 6:01 AM

joechrisellis requested review of this revision. May 19 2021, 6:01 AM

Herald added a project: Restricted Project. May 19 2021, 6:01 AM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B105215: Diff 346426. May 19 2021, 7:09 AM

Some suggestions.

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
4735	nit: Extraneous blank.
4748	For me, the term 'spilling' is usually associated with running out of registers and needing to create register space by spilling them to the stack. I think a comment here instead should express the intent of the code, something like "To insert SubVec into Vec, store the wider vector to memory, overwrite the lower half with the narrower vector, and reload". The other comments can probably be removed.
llvm/test/CodeGen/AArch64/insert-subvector-res-legalization.ll
15	I think you can use -asm-verbose=0 in the run line to eliminate the CFI escapes.

joechrisellis marked an inline comment as done. May 20 2021, 2:18 AM

joechrisellis added inline comments.

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
4735	This is consistent with `PromoteIntRes_EXTRACT_SUBVECTOR` some 60-ish lines above.
llvm/test/CodeGen/AArch64/insert-subvector-res-legalization.ll
15	I tried this and `llvm/utils/update_llc_test_checks.py` doesn't spit anything out. I have hit this issue before. IIRC, the regular expression that the script uses to delimit the functions in the assembly code doesn't function as expected if the CFI escapes are missing. Might submit a patch for this later if I can recall what the issue was. FWIW:
$ grep -Rl 'Assertions have been' llvm/test//AArch64//* | xargs grep -l 'asm-verbose=0'
llvm/test/CodeGen/AArch64/bf16-vector-shuffle.ll
... there's only one file with autogen'd assertions that does use `-asm-verbose=0`.

Address review comments.

@peterwaller-arm:
- comment clarifications.

peterwaller-arm added inline comments. May 20 2021, 2:24 AM

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
4735	This is only a nit, but the majority of functions in this file don't have it and I think PromoteIntRes_EXTRACT_SUBVECTOR is in error. It's sufficiently nearby and related code that I'd be tempted to remove it from that one too to maintain local consistency (but not anywhere else, there are other examples in this file).

Matt added a subscriber: Matt. May 20 2021, 2:46 AM

Harbormaster completed remote builds in B105380: Diff 346665. May 20 2021, 3:34 AM

peterwaller-arm added inline comments. May 20 2021, 4:08 AM

llvm/test/CodeGen/AArch64/insert-subvector-res-legalization.ll
109	As I understand it, this should overwrite elements with indices {2,3} of %vec, but this seems to overwrite elements {1,2}. So I am not convinced this is correct.

paulwalker-arm added inline comments. May 20 2021, 10:42 AM

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
4735	Functional issues aside, I'm with @peterwaller-arm on this one.
llvm/test/CodeGen/AArch64/insert-subvector-res-legalization.ll
15	`attributes #0 = { nounwind "target-features"="+sve" }` will see the CFI entries removed.

peterwaller-arm added inline comments. May 20 2021, 12:33 PM

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
4748	Whoops - not 'lower half', because of course you can insert at a given index, not just the lower order bits. So the comment needs adjusting.

paulwalker-arm mentioned this in D102765: [SelectionDAG] Add stub implementation of ReplaceInsertSubVectorResults. May 21 2021, 6:31 AM

joechrisellis edited the summary of this revision. May 21 2021, 9:14 AM

Fold in D102765 + address nits.

joechrisellis planned changes to this revision. May 21 2021, 9:43 AM

Harbormaster completed remote builds in B105662: Diff 347066. May 21 2021, 10:40 AM

Scale index by vscale.

Remove CFI entries from test.

joechrisellis marked an inline comment as done. Jun 2 2021, 3:08 AM

Harbormaster completed remote builds in B107205: Diff 349213. Jun 2 2021, 4:29 AM

peterwaller-arm added inline comments. Jun 17 2021, 3:13 AM

llvm/test/CodeGen/AArch64/insert-subvector-res-legalization.ll
9	Can you pass <vscale x ...> by value rather than by pointer? I realise the loads are required in the fixed case, but that might shrink the code a little.
25	There is a bit of extraneous stuff going on in these tests, if you choose a couple of optimization passes are you able to shrink them a bit? I'm looking at the store of x29 and extra addpl.

peterwaller-arm added inline comments. Jun 17 2021, 1:23 PM

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
4762	Thinking: If IdxAPInt is zero, you could set ScaledIdx = Idx. Alternatively, just update Idx if non-zero. This would get rid of some `rdvl ..., #0`.

Address review comments:

@peterwaller-arm:
- only scale if the index is non-zero.
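
The "only scale if the index is non-zero" change can be pictured with a small standalone model. This is a sketch under stated assumptions, not the patch itself: `elementOffset` is a hypothetical name, and `vscale` is passed in as a plain integer, whereas in the real code it is only known at run time.

```cpp
#include <cstdint>

// Hypothetical model: element offset of the insertion point inside the stack
// temporary. For a scalable subvector the index is counted in units of
// vscale, so it is multiplied by vscale; a zero index needs no scaling at
// all, which is what lets the redundant `rdvl ..., #0` disappear.
std::uint64_t elementOffset(std::uint64_t idx, std::uint64_t vscale,
                            bool subVecIsScalable) {
  if (subVecIsScalable && idx != 0)
    return idx * vscale;
  return idx;
}
```

For example, with `vscale == 2` an index of 4 into a scalable subvector maps to element offset 8, while a fixed-length subvector keeps offset 4.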

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
4762	Good shout -- that's removed a few instructions from the test cases.
llvm/test/CodeGen/AArch64/insert-subvector-res-legalization.ll
25	`-O3` doesn't change the lowering here, so I am not sure if these test cases can be reduced by using different flags.

Harbormaster completed remote builds in B109946: Diff 353024. Jun 19 2021, 3:22 AM

Update insertion indices. The old insertion indices will be reported as invalid
when D104468 is merged.

Harbormaster completed remote builds in B110172: Diff 353334. Jun 21 2021, 5:45 AM

Bugfixes.

Harbormaster completed remote builds in B110219: Diff 353396. Jun 21 2021, 10:55 AM

peterwaller-arm added inline comments. Jun 22 2021, 3:29 AM

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
4774	My thinking here is that the clamping logic of getVectorElementPointer isn't appropriate for what is needed here. It's necessary to ensure the upper element of the inserted fixed-width vector fits into the scalable vector. getVectorElementPointer is effectively only forcing that the first element is in bounds, but it's the last element which matters.
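
The difference between clamping for a single element and clamping for a whole subvector can be sketched as below. This is a simplified model with a known element count; both function names are hypothetical, and the real helpers operate on DAG nodes rather than plain integers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical model: clamp so that the *first* element is in bounds.
std::uint64_t clampElementIndex(std::uint64_t idx, std::uint64_t numElts) {
  assert(numElts > 0 && "vector must have at least one element");
  return std::min(idx, numElts - 1);
}

// Hypothetical model: clamp so that the *last* element of the inserted
// subvector is in bounds, i.e. idx + numSubElts <= numElts.
std::uint64_t clampSubvectorIndex(std::uint64_t idx, std::uint64_t numElts,
                                  std::uint64_t numSubElts) {
  assert(numSubElts <= numElts && "subvector must not be wider than vector");
  return std::min(idx, numElts - numSubElts);
}
```

Inserting 4 elements at index 7 of an 8-element vector shows the problem: the element clamp leaves the index at 7, so the store would run past the end, whereas the subvector clamp pulls it back to 4.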

Bugfixes.

Harbormaster completed remote builds in B110779: Diff 354181. Jun 24 2021, 3:45 AM

joechrisellis edited the summary of this revision. Jun 24 2021, 9:52 AM

bsmith commandeered this revision. Jun 30 2021, 5:48 AM

bsmith edited reviewers, added: joechrisellis; removed: bsmith.

Use new getVectorSubVecPointer TLI function to ensure correct clamping
Fixup tests due to changes

Harbormaster completed remote builds in B111735: Diff 355520. Jun 30 2021, 7:01 AM

peterwaller-arm accepted this revision. Jun 30 2021, 7:25 AM

This revision is now accepted and ready to land. Jun 30 2021, 7:25 AM

This revision was landed with ongoing or failed builds. Jul 1 2021, 9:06 AM

Closed by commit rG2668727929e4: [SelectionDAG] Implement PromoteIntRes_INSERT_SUBVECTOR (authored by bsmith).

This revision was automatically updated to reflect the committed changes.

bsmith added a commit: rG2668727929e4: [SelectionDAG] Implement PromoteIntRes_INSERT_SUBVECTOR.

Sorry about the late reply here, but I'm not sure why PromoteIntRes_INSERT_SUBVECTOR needs to go through the stack. Can't you just ANY_EXTEND the operand and the result?

At that point, you might end up with a node that needs to be legalized by PromoteIntOp_INSERT_SUBVECTOR, but better to take legalization one step at a time.

In reply to @efriedma (D102766#2853500):

I'm not sure I fully understand how you are thinking this would look. Bear in mind that we also need to handle inserting scalable into scalable here.

If you had something like:

`%ins = call <vscale x 4 x i16> @llvm.experimental.vector.insert.nxv4i16.nxv2i16(<vscale x 4 x i16> %vec, <vscale x 2 x i16> %subvec, i64 4)`

and you promoted all of the scalable types to their equivalent legal types you'd end up with:

`%ins = call <vscale x 4 x i32> @llvm.experimental.vector.insert.nxv4i16.nxv2i16(<vscale x 4 x i32> %vec, <vscale x 2 x i64> %subvec, i64 4)`

This ends up no longer being a valid use of vector.insert since the element types differ.

In reply to @bsmith (D102766#2857778):

Right. My suggestion is that you promote from:

%ins = call <vscale x 4 x i16> @llvm.experimental.vector.insert.nxv4i16.nxv2i16(<vscale x 4 x i16> %vec, <vscale x 2 x i16> %subvec, i64 1)

to:

%ins = call <vscale x 4 x i32> @llvm.experimental.vector.insert.nxv4i32.nxv2i32(<vscale x 4 x i32> %vec, <vscale x 2 x i32> %subvec, i64 1)

This isn't legal, but we need code to handle it anyway.
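
The idea of extending both the vector and the subvector to the same wider element type before inserting can be illustrated with a fixed-length model. This is only a sketch: the helper names are hypothetical, zero-extension is used as one valid choice of any-extend, and scalable vectors (where the element count depends on vscale) are deliberately left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of INSERT_SUBVECTOR on fixed-length vectors.
template <typename T>
std::vector<T> insertSubvector(std::vector<T> vec, const std::vector<T> &sub,
                               std::size_t idx) {
  for (std::size_t i = 0; i < sub.size(); ++i)
    vec.at(idx + i) = sub[i]; // at() throws if the subvector does not fit
  return vec;
}

// Widen every i16 element to i32 (zero-extension as one form of any-extend).
std::vector<std::uint32_t> anyExtendElts(const std::vector<std::uint16_t> &v) {
  return std::vector<std::uint32_t>(v.begin(), v.end());
}

// Narrow every i32 element back to i16, discarding the high bits.
std::vector<std::uint16_t> truncateElts(const std::vector<std::uint32_t> &v) {
  std::vector<std::uint16_t> out;
  out.reserve(v.size());
  for (std::uint32_t x : v)
    out.push_back(static_cast<std::uint16_t>(x));
  return out;
}
```

Because both operands are widened to the same element type, the insert on the promoted values stays well-formed, and truncating its result gives the same elements as inserting at the original width.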

In reply to @efriedma (D102766#2858467):

Are you suggesting to still go through memory but to do it during operand legalization instead?

In reply to @bsmith (D102766#2859581):

Yes, sort of...

In some cases, we might not end up going through memory; we currently have some custom lowering support for some special cases, and might add more cases in the future.

This should be sorted out in https://reviews.llvm.org/D105624

Revision Contents

Path                                                              Size
llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp            46 lines
llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h                     1 line
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp                  12 lines
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp                   4 lines
llvm/test/CodeGen/AArch64/insert-subvector-res-legalization.ll    276 lines

Diff 355904

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp

[91 lines not shown; context: #endif]
   case ISD::SRL: Res = PromoteIntRes_SRL(N); break;
   case ISD::TRUNCATE: Res = PromoteIntRes_TRUNCATE(N); break;
   case ISD::UNDEF: Res = PromoteIntRes_UNDEF(N); break;
   case ISD::VAARG: Res = PromoteIntRes_VAARG(N); break;
   case ISD::VSCALE: Res = PromoteIntRes_VSCALE(N); break;

   case ISD::EXTRACT_SUBVECTOR:
     Res = PromoteIntRes_EXTRACT_SUBVECTOR(N); break;
+  case ISD::INSERT_SUBVECTOR:
+    Res = PromoteIntRes_INSERT_SUBVECTOR(N); break;
   case ISD::VECTOR_REVERSE:
     Res = PromoteIntRes_VECTOR_REVERSE(N); break;
   case ISD::VECTOR_SHUFFLE:
     Res = PromoteIntRes_VECTOR_SHUFFLE(N); break;
   case ISD::VECTOR_SPLICE:
     Res = PromoteIntRes_VECTOR_SPLICE(N); break;
   case ISD::INSERT_VECTOR_ELT:
     Res = PromoteIntRes_INSERT_VECTOR_ELT(N); break;
[4,616 lines not shown; context: for (unsigned i = 0; i != OutNumElems; ++i) {]
     SDValue Op = DAG.getAnyExtOrTrunc(Ext, dl, NOutVTElem);
     // Insert the converted element to the new vector.
     Ops.push_back(Op);
   }

   return DAG.getBuildVector(NOutVT, dl, Ops);
 }

+SDValue DAGTypeLegalizer::PromoteIntRes_INSERT_SUBVECTOR(SDNode *N) {
+  EVT OutVT = N->getValueType(0);
+  EVT NOutVT = TLI.getTypeToTransformTo(*DAG.getContext(), OutVT);
+  assert(NOutVT.isVector() && "This type must be promoted to a vector type");
+
+  SDLoc dl(N);
+  SDValue Vec = N->getOperand(0);
+  SDValue SubVec = N->getOperand(1);
+  SDValue Idx = N->getOperand(2);
+
+  auto *ConstantIdx = cast<ConstantSDNode>(Idx);
+  unsigned IdxN = ConstantIdx->getZExtValue();
+
+  EVT VecVT = Vec.getValueType();
+  EVT SubVecVT = SubVec.getValueType();
+
+  // To insert SubVec into Vec, store the wider vector to memory, overwrite the
+  // appropriate bits with the narrower vector, and reload.
+  Align SmallestAlign = DAG.getReducedAlign(SubVecVT, /*UseABI=*/false);
+
+  SDValue StackPtr =
+      DAG.CreateStackTemporary(VecVT.getStoreSize(), SmallestAlign);
+  auto StackPtrVT = StackPtr->getValueType(0);
+  auto &MF = DAG.getMachineFunction();
+  auto FrameIndex = cast<FrameIndexSDNode>(StackPtr.getNode())->getIndex();
+  auto PtrInfo = MachinePointerInfo::getFixedStack(MF, FrameIndex);
+
+  SDValue Store = DAG.getStore(DAG.getEntryNode(), dl, Vec, StackPtr, PtrInfo,
+                               SmallestAlign);
+
+  SDValue ScaledIdx = Idx;
+  if (SubVecVT.isScalableVector() && IdxN != 0) {
+    APInt IdxAPInt = cast<ConstantSDNode>(Idx)->getAPIntValue();
+    ScaledIdx = DAG.getVScale(dl, StackPtrVT,
+                              IdxAPInt.sextOrSelf(StackPtrVT.getSizeInBits()));
+  }
+
+  SDValue SubVecPtr =
+      TLI.getVectorSubVecPointer(DAG, StackPtr, VecVT, SubVecVT, ScaledIdx);
+  Store = DAG.getStore(Store, dl, SubVec, SubVecPtr, PtrInfo, SmallestAlign);
+  return DAG.getExtLoad(ISD::LoadExtType::EXTLOAD, dl, NOutVT, Store, StackPtr,
+                        PtrInfo, OutVT, SmallestAlign);
+}
+
 SDValue DAGTypeLegalizer::PromoteIntRes_VECTOR_REVERSE(SDNode *N) {
   SDLoc dl(N);

   SDValue V0 = GetPromotedInteger(N->getOperand(0));
   EVT OutVT = V0.getValueType();

   return DAG.getNode(ISD::VECTOR_REVERSE, dl, OutVT, V0);
 }
[248 lines not shown]

llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h

[292 lines not shown; context: private:]
   void PromoteIntegerResult(SDNode *N, unsigned ResNo);
   SDValue PromoteIntRes_MERGE_VALUES(SDNode *N, unsigned ResNo);
   SDValue PromoteIntRes_AssertSext(SDNode *N);
   SDValue PromoteIntRes_AssertZext(SDNode *N);
   SDValue PromoteIntRes_Atomic0(AtomicSDNode *N);
   SDValue PromoteIntRes_Atomic1(AtomicSDNode *N);
   SDValue PromoteIntRes_AtomicCmpSwap(AtomicSDNode *N, unsigned ResNo);
   SDValue PromoteIntRes_EXTRACT_SUBVECTOR(SDNode *N);
+  SDValue PromoteIntRes_INSERT_SUBVECTOR(SDNode *N);
   SDValue PromoteIntRes_VECTOR_REVERSE(SDNode *N);
   SDValue PromoteIntRes_VECTOR_SHUFFLE(SDNode *N);
   SDValue PromoteIntRes_VECTOR_SPLICE(SDNode *N);
   SDValue PromoteIntRes_BUILD_VECTOR(SDNode *N);
   SDValue PromoteIntRes_SCALAR_TO_VECTOR(SDNode *N);
   SDValue PromoteIntRes_SPLAT_VECTOR(SDNode *N);
   SDValue PromoteIntRes_STEP_VECTOR(SDNode *N);
   SDValue PromoteIntRes_EXTEND_VECTOR_INREG(SDNode *N);
[751 lines not shown]

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp

[7,831 lines not shown; context: SDValue TargetLowering::getVectorSubVecPointer(SelectionDAG &DAG,]

   EVT EltVT = VecVT.getVectorElementType();

   // Calculate the element offset and add it to the pointer.
   unsigned EltSize = EltVT.getFixedSizeInBits() / 8; // FIXME: should be ABI size.
   assert(EltSize * 8 == EltVT.getFixedSizeInBits() &&
          "Converting bits to bytes lost precision");

-  assert(SubVecVT.isFixedLengthVector() &&
-         SubVecVT.getVectorElementType() == EltVT &&
-         "Sub-vector must be a fixed vector with matching element type");
-  Index = clampDynamicVectorIndex(DAG, Index, VecVT, dl,
-                                  SubVecVT.getVectorNumElements());
+  // Scalable vectors don't need clamping as these are checked at compile time
+  if (SubVecVT.isFixedLengthVector()) {
+    assert(SubVecVT.getVectorElementType() == EltVT &&
+           "Sub-vector must be a fixed vector with matching element type");
+    Index = clampDynamicVectorIndex(DAG, Index, VecVT, dl,
+                                    SubVecVT.getVectorNumElements());
+  }

   EVT IdxVT = Index.getValueType();

   Index = DAG.getNode(ISD::MUL, dl, IdxVT, Index,
                       DAG.getConstant(EltSize, dl, IdxVT));
   return DAG.getMemBasePlusOffset(VecPtr, Index, dl);
 }

[1,087 lines not shown]

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

[17,094 lines not shown; context: case ISD::LOAD: {]
     SDValue Pair = DAG.getNode(ISD::BUILD_PAIR, SDLoc(N), MVT::i128,
                                Result.getValue(0), Result.getValue(1));
     Results.append({Pair, Result.getValue(2) /* Chain */});
     return;
   }
   case ISD::EXTRACT_SUBVECTOR:
     ReplaceExtractSubVectorResults(N, Results, DAG);
     return;
+  case ISD::INSERT_SUBVECTOR:
+    // Custom lowering has been requested for INSERT_SUBVECTOR -- but delegate
+    // to common code for result type legalisation
+    return;
   case ISD::INTRINSIC_WO_CHAIN: {
     EVT VT = N->getValueType(0);
     assert((VT == MVT::i8 || VT == MVT::i16) &&
            "custom lowering for unexpected type");

     ConstantSDNode *CN = cast<ConstantSDNode>(N->getOperand(0));
     Intrinsic::ID IntID = static_cast<Intrinsic::ID>(CN->getZExtValue());
     switch (IntID) {
[1,384 lines not shown]

llvm/test/CodeGen/AArch64/insert-subvector-res-legalization.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc < %s | FileCheck %s

				target triple = "aarch64-unknown-linux-gnu"

				; SCALABLE INSERTED INTO SCALABLE TESTS

				define <vscale x 8 x i8> @vec_scalable_subvec_scalable_idx_zero_i8(<vscale x 8 x i8>* %a, <vscale x 4 x i8>* %b) #0 {
				; CHECK-LABEL: vec_scalable_subvec_scalable_idx_zero_i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl sp, sp, #-1
				; CHECK-NEXT: ptrue p0.h
				; CHECK-NEXT: ptrue p1.s
				; CHECK-NEXT: ld1b { z0.h }, p0/z, [x0]
				; CHECK-NEXT: ld1b { z1.s }, p1/z, [x1]
				; CHECK-NEXT: st1b { z0.h }, p0, [sp, #1, mul vl]
				; CHECK-NEXT: st1b { z1.s }, p1, [sp, #2, mul vl]
				; CHECK-NEXT: ld1b { z0.h }, p0/z, [sp, #1, mul vl]
				; CHECK-NEXT: addvl sp, sp, #1
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				%vec = load <vscale x 8 x i8>, <vscale x 8 x i8>* %a
				%subvec = load <vscale x 4 x i8>, <vscale x 4 x i8>* %b
				%ins = call <vscale x 8 x i8> @llvm.experimental.vector.insert.nxv8i8.nxv4i8(<vscale x 8 x i8> %vec, <vscale x 4 x i8> %subvec, i64 0)
				ret <vscale x 8 x i8> %ins
				}

				define <vscale x 8 x i8> @vec_scalable_subvec_scalable_idx_nonzero_i8(<vscale x 8 x i8>* %a, <vscale x 4 x i8>* %b) #0 {
				; CHECK-LABEL: vec_scalable_subvec_scalable_idx_nonzero_i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl sp, sp, #-1
				; CHECK-NEXT: ptrue p0.h
				; CHECK-NEXT: ptrue p1.s
				; CHECK-NEXT: ld1b { z0.h }, p0/z, [x0]
				; CHECK-NEXT: ld1b { z1.s }, p1/z, [x1]
				; CHECK-NEXT: addpl x8, sp, #4
				; CHECK-NEXT: st1b { z0.h }, p0, [sp, #1, mul vl]
				; CHECK-NEXT: st1b { z1.s }, p1, [x8, #1, mul vl]
				; CHECK-NEXT: ld1b { z0.h }, p0/z, [sp, #1, mul vl]
				; CHECK-NEXT: addvl sp, sp, #1
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				%vec = load <vscale x 8 x i8>, <vscale x 8 x i8>* %a
				%subvec = load <vscale x 4 x i8>, <vscale x 4 x i8>* %b
				%ins = call <vscale x 8 x i8> @llvm.experimental.vector.insert.nxv8i8.nxv4i8(<vscale x 8 x i8> %vec, <vscale x 4 x i8> %subvec, i64 4)
				ret <vscale x 8 x i8> %ins
				}

				define <vscale x 4 x i16> @vec_scalable_subvec_scalable_idx_zero_i16(<vscale x 4 x i16>* %a, <vscale x 2 x i16>* %b) #0 {
				; CHECK-LABEL: vec_scalable_subvec_scalable_idx_zero_i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl sp, sp, #-1
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: ptrue p1.d
				; CHECK-NEXT: ld1h { z0.s }, p0/z, [x0]
				; CHECK-NEXT: ld1h { z1.d }, p1/z, [x1]
				; CHECK-NEXT: st1h { z0.s }, p0, [sp, #1, mul vl]
				; CHECK-NEXT: st1h { z1.d }, p1, [sp, #2, mul vl]
				; CHECK-NEXT: ld1h { z0.s }, p0/z, [sp, #1, mul vl]
				; CHECK-NEXT: addvl sp, sp, #1
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				%vec = load <vscale x 4 x i16>, <vscale x 4 x i16>* %a
				%subvec = load <vscale x 2 x i16>, <vscale x 2 x i16>* %b
				%ins = call <vscale x 4 x i16> @llvm.experimental.vector.insert.nxv4i16.nxv2i16(<vscale x 4 x i16> %vec, <vscale x 2 x i16> %subvec, i64 0)
				ret <vscale x 4 x i16> %ins
				}

				define <vscale x 4 x i16> @vec_scalable_subvec_scalable_idx_nonzero_i16(<vscale x 4 x i16>* %a, <vscale x 2 x i16>* %b) #0 {
				; CHECK-LABEL: vec_scalable_subvec_scalable_idx_nonzero_i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl sp, sp, #-1
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: ptrue p1.d
				; CHECK-NEXT: ld1h { z0.s }, p0/z, [x0]
				; CHECK-NEXT: ld1h { z1.d }, p1/z, [x1]
				; CHECK-NEXT: addpl x8, sp, #4
				; CHECK-NEXT: st1h { z0.s }, p0, [sp, #1, mul vl]
				; CHECK-NEXT: st1h { z1.d }, p1, [x8, #1, mul vl]
				; CHECK-NEXT: ld1h { z0.s }, p0/z, [sp, #1, mul vl]
				; CHECK-NEXT: addvl sp, sp, #1
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				%vec = load <vscale x 4 x i16>, <vscale x 4 x i16>* %a
				%subvec = load <vscale x 2 x i16>, <vscale x 2 x i16>* %b
				%ins = call <vscale x 4 x i16> @llvm.experimental.vector.insert.nxv4i16.nxv2i16(<vscale x 4 x i16> %vec, <vscale x 2 x i16> %subvec, i64 2)
				ret <vscale x 4 x i16> %ins
				}
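
The store addresses in the scalable-into-scalable tests above can be checked by hand. The sketch below is a hypothetical C++ model, not LLVM code; `mulVlOffset`, `addplOffset` and `nonzeroI8InsertDistance` are names invented here. It assumes the usual SVE rules: a `#imm, mul vl` offset is scaled by the in-memory size of the accessed vector, and `addpl` adds a multiple of the predicate length (VL/8 bytes). Under those assumptions, the subvector store in the `idx_nonzero` i8 test lands `4 * vscale` bytes past the start of the spilled vector, which is what index 4 of a scalable i8 subvector should mean.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model; vlBytes is the SVE vector length in bytes (16 * vscale).

// Byte offset of "[base, #imm, mul vl]" for a contiguous load/store that keeps
// one element per regEltBytes-wide lane and writes memEltBytes per element.
// Example: st1b { z0.h } has regEltBytes = 2 and memEltBytes = 1.
uint64_t mulVlOffset(uint64_t vlBytes, uint64_t regEltBytes,
                     uint64_t memEltBytes, uint64_t imm) {
  uint64_t numElts = vlBytes / regEltBytes;
  return imm * numElts * memEltBytes;
}

// Byte offset added by "addpl xd, xn, #imm": imm predicate lengths (VL / 8).
uint64_t addplOffset(uint64_t vlBytes, uint64_t imm) {
  return imm * (vlBytes / 8);
}

// Distance in bytes between the subvector store and the start of the spilled
// vector in vec_scalable_subvec_scalable_idx_nonzero_i8:
//   st1b { z0.h }, p0, [sp, #1, mul vl]
//   addpl x8, sp, #4
//   st1b { z1.s }, p1, [x8, #1, mul vl]
uint64_t nonzeroI8InsertDistance(uint64_t vlBytes) {
  uint64_t vecStart = mulVlOffset(vlBytes, 2, 1, 1);
  uint64_t subStart = addplOffset(vlBytes, 4) + mulVlOffset(vlBytes, 4, 1, 1);
  return subStart - vecStart;
}
```

In the `idx_zero` variant the subvector is stored with `[sp, #2, mul vl]` on `.s` lanes, which in this model is the same address as the vector start, so the `addpl` only appears when the index is nonzero.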

				; FIXED INSERTED INTO SCALABLE TESTS

				define <vscale x 8 x i8> @vec_scalable_subvec_fixed_idx_zero_i8(<vscale x 8 x i8>* %a, <8 x i8>* %b) #0 {
				; CHECK-LABEL: vec_scalable_subvec_fixed_idx_zero_i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl sp, sp, #-1
				; CHECK-NEXT: ptrue p0.h
				; CHECK-NEXT: ld1b { z0.h }, p0/z, [x0]
				; CHECK-NEXT: ldr d1, [x1]
				; CHECK-NEXT: st1b { z0.h }, p0, [sp, #1, mul vl]
				; CHECK-NEXT: addpl x8, sp, #4
				; CHECK-NEXT: str d1, [x8]
				; CHECK-NEXT: ld1b { z0.h }, p0/z, [sp, #1, mul vl]
				; CHECK-NEXT: addvl sp, sp, #1
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				peterwaller-arm (Not Done): As I understand it, this should overwrite elements with indices {2,3} of %vec, but this seems to overwrite elements {1,2}. So I am not convinced this is correct.
				; CHECK-NEXT: ret
				%vec = load <vscale x 8 x i8>, <vscale x 8 x i8>* %a
				%subvec = load <8 x i8>, <8 x i8>* %b
				%ins = call <vscale x 8 x i8> @llvm.experimental.vector.insert.nxv8i8.v8i8(<vscale x 8 x i8> %vec, <8 x i8> %subvec, i64 0)
				ret <vscale x 8 x i8> %ins
				}

				define <vscale x 8 x i8> @vec_scalable_subvec_fixed_idx_nonzero_i8(<vscale x 8 x i8>* %a, <8 x i8>* %b) #0 {
				; CHECK-LABEL: vec_scalable_subvec_fixed_idx_nonzero_i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl sp, sp, #-1
				; CHECK-NEXT: ptrue p0.h
				; CHECK-NEXT: cnth x9
				; CHECK-NEXT: addpl x10, sp, #4
				; CHECK-NEXT: ld1b { z0.h }, p0/z, [x0]
				; CHECK-NEXT: ldr d1, [x1]
				; CHECK-NEXT: sub x9, x9, #8 // =8
				; CHECK-NEXT: mov w8, #8
				; CHECK-NEXT: cmp x9, #8 // =8
				; CHECK-NEXT: csel x8, x9, x8, lo
				; CHECK-NEXT: st1b { z0.h }, p0, [sp, #1, mul vl]
				; CHECK-NEXT: str d1, [x10, x8]
				; CHECK-NEXT: ld1b { z0.h }, p0/z, [sp, #1, mul vl]
				; CHECK-NEXT: addvl sp, sp, #1
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				%vec = load <vscale x 8 x i8>, <vscale x 8 x i8>* %a
				%subvec = load <8 x i8>, <8 x i8>* %b
				%ins = call <vscale x 8 x i8> @llvm.experimental.vector.insert.nxv8i8.v8i8(<vscale x 8 x i8> %vec, <8 x i8> %subvec, i64 8)
				ret <vscale x 8 x i8> %ins
				}

				define <vscale x 4 x i16> @vec_scalable_subvec_fixed_idx_zero_i16(<vscale x 4 x i16>* %a, <4 x i16>* %b) #0 {
				; CHECK-LABEL: vec_scalable_subvec_fixed_idx_zero_i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl sp, sp, #-1
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: ld1h { z0.s }, p0/z, [x0]
				; CHECK-NEXT: ldr d1, [x1]
				; CHECK-NEXT: st1h { z0.s }, p0, [sp, #1, mul vl]
				; CHECK-NEXT: addpl x8, sp, #4
				; CHECK-NEXT: str d1, [x8]
				; CHECK-NEXT: ld1h { z0.s }, p0/z, [sp, #1, mul vl]
				; CHECK-NEXT: addvl sp, sp, #1
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				%vec = load <vscale x 4 x i16>, <vscale x 4 x i16>* %a
				%subvec = load <4 x i16>, <4 x i16>* %b
				%ins = call <vscale x 4 x i16> @llvm.experimental.vector.insert.nxv4i16.v4i16(<vscale x 4 x i16> %vec, <4 x i16> %subvec, i64 0)
				ret <vscale x 4 x i16> %ins
				}

				define <vscale x 4 x i16> @vec_scalable_subvec_fixed_idx_nonzero_i16(<vscale x 4 x i16>* %a, <4 x i16>* %b) #0 {
				; CHECK-LABEL: vec_scalable_subvec_fixed_idx_nonzero_i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl sp, sp, #-1
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: cntw x9
				; CHECK-NEXT: ld1h { z0.s }, p0/z, [x0]
				; CHECK-NEXT: ldr d1, [x1]
				; CHECK-NEXT: sub x9, x9, #4 // =4
				; CHECK-NEXT: mov w8, #4
				; CHECK-NEXT: cmp x9, #4 // =4
				; CHECK-NEXT: csel x8, x9, x8, lo
				; CHECK-NEXT: addpl x9, sp, #4
				; CHECK-NEXT: lsl x8, x8, #1
				; CHECK-NEXT: st1h { z0.s }, p0, [sp, #1, mul vl]
				; CHECK-NEXT: str d1, [x9, x8]
				; CHECK-NEXT: ld1h { z0.s }, p0/z, [sp, #1, mul vl]
				; CHECK-NEXT: addvl sp, sp, #1
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				%vec = load <vscale x 4 x i16>, <vscale x 4 x i16>* %a
				%subvec = load <4 x i16>, <4 x i16>* %b
				%ins = call <vscale x 4 x i16> @llvm.experimental.vector.insert.nxv4i16.v4i16(<vscale x 4 x i16> %vec, <4 x i16> %subvec, i64 4)
				ret <vscale x 4 x i16> %ins
				}

				define <vscale x 2 x i32> @vec_scalable_subvec_fixed_idx_zero_i32(<vscale x 2 x i32>* %a, <2 x i32>* %b) #0 {
				; CHECK-LABEL: vec_scalable_subvec_fixed_idx_zero_i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl sp, sp, #-1
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: ld1w { z0.d }, p0/z, [x0]
				; CHECK-NEXT: ldr d1, [x1]
				; CHECK-NEXT: st1w { z0.d }, p0, [sp, #1, mul vl]
				; CHECK-NEXT: addpl x8, sp, #4
				; CHECK-NEXT: str d1, [x8]
				; CHECK-NEXT: ld1w { z0.d }, p0/z, [sp, #1, mul vl]
				; CHECK-NEXT: addvl sp, sp, #1
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				%vec = load <vscale x 2 x i32>, <vscale x 2 x i32>* %a
				%subvec = load <2 x i32>, <2 x i32>* %b
				%ins = call <vscale x 2 x i32> @llvm.experimental.vector.insert.nxv2i32.v2i32(<vscale x 2 x i32> %vec, <2 x i32> %subvec, i64 0)
				ret <vscale x 2 x i32> %ins
				}

				define <vscale x 2 x i32> @vec_scalable_subvec_fixed_idx_nonzero_i32(<vscale x 2 x i32>* %a, <2 x i32>* %b) #0 {
				; CHECK-LABEL: vec_scalable_subvec_fixed_idx_nonzero_i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl sp, sp, #-1
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: cntd x9
				; CHECK-NEXT: ld1w { z0.d }, p0/z, [x0]
				; CHECK-NEXT: ldr d1, [x1]
				; CHECK-NEXT: sub x9, x9, #2 // =2
				; CHECK-NEXT: mov w8, #2
				; CHECK-NEXT: cmp x9, #2 // =2
				; CHECK-NEXT: csel x8, x9, x8, lo
				; CHECK-NEXT: addpl x9, sp, #4
				; CHECK-NEXT: lsl x8, x8, #2
				; CHECK-NEXT: st1w { z0.d }, p0, [sp, #1, mul vl]
				; CHECK-NEXT: str d1, [x9, x8]
				; CHECK-NEXT: ld1w { z0.d }, p0/z, [sp, #1, mul vl]
				; CHECK-NEXT: addvl sp, sp, #1
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				%vec = load <vscale x 2 x i32>, <vscale x 2 x i32>* %a
				%subvec = load <2 x i32>, <2 x i32>* %b
				%ins = call <vscale x 2 x i32> @llvm.experimental.vector.insert.nxv2i32.v2i32(<vscale x 2 x i32> %vec, <2 x i32> %subvec, i64 2)
				ret <vscale x 2 x i32> %ins
				}

				define <vscale x 2 x i32> @vec_scalable_subvec_fixed_idx_nonzero_large_i32(<vscale x 2 x i32>* %a, <8 x i32>* %b) #0 {
				; CHECK-LABEL: vec_scalable_subvec_fixed_idx_nonzero_large_i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl sp, sp, #-1
				; CHECK-NEXT: cntd x8
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: subs x8, x8, #8 // =8
				; CHECK-NEXT: ld1w { z0.d }, p0/z, [x0]
				; CHECK-NEXT: ldp q1, q2, [x1]
				; CHECK-NEXT: csel x8, xzr, x8, lo
				; CHECK-NEXT: mov w9, #8
				; CHECK-NEXT: cmp x8, #8 // =8
				; CHECK-NEXT: csel x8, x8, x9, lo
				; CHECK-NEXT: mov x9, sp
				; CHECK-NEXT: add x8, x9, x8, lsl #2
				; CHECK-NEXT: st1w { z0.d }, p0, [sp]
				; CHECK-NEXT: stp q1, q2, [x8]
				; CHECK-NEXT: ld1w { z0.d }, p0/z, [sp]
				; CHECK-NEXT: addvl sp, sp, #1
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				%vec = load <vscale x 2 x i32>, <vscale x 2 x i32>* %a
				%subvec = load <8 x i32>, <8 x i32>* %b
				%ins = call <vscale x 2 x i32> @llvm.experimental.vector.insert.nxv2i32.v8i32(<vscale x 2 x i32> %vec, <8 x i32> %subvec, i64 8)
				ret <vscale x 2 x i32> %ins
				}

				declare <vscale x 8 x i8> @llvm.experimental.vector.insert.nxv8i8.nxv4i8(<vscale x 8 x i8>, <vscale x 4 x i8>, i64)
				declare <vscale x 4 x i16> @llvm.experimental.vector.insert.nxv4i16.nxv2i16(<vscale x 4 x i16>, <vscale x 2 x i16>, i64)

				declare <vscale x 8 x i8> @llvm.experimental.vector.insert.nxv8i8.v8i8(<vscale x 8 x i8>, <8 x i8>, i64)
				declare <vscale x 4 x i16> @llvm.experimental.vector.insert.nxv4i16.v4i16(<vscale x 4 x i16>, <4 x i16>, i64)
				declare <vscale x 2 x i32> @llvm.experimental.vector.insert.nxv2i32.v2i32(<vscale x 2 x i32>, <2 x i32>, i64)

				declare <vscale x 2 x i32> @llvm.experimental.vector.insert.nxv2i32.v8i32(<vscale x 2 x i32>, <8 x i32>, i64)

				attributes #0 = { nounwind "target-features"="+sve" }
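
The `cnt*`/`sub`/`cmp`/`csel` sequences in the fixed-into-scalable `idx_nonzero` tests above appear to clamp the insertion index so that the fixed-width store stays inside the stack slot, whatever the runtime vector length turns out to be. The sketch below is a hypothetical C++ model of that arithmetic as read from the CHECK lines, not the code in LegalizeIntegerTypes.cpp; the function and parameter names are invented for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper, not an LLVM API.
//   vscale  : runtime multiple of the minimum vector length
//   minElts : N in <vscale x N x T>
//   subElts : element count of the fixed-length subvector
//   idx     : insertion index requested by the intrinsic
// Returns the index actually used for the store: min(idx, numElts - subElts),
// with the subtraction saturating at zero (the extra csel against xzr in the
// "large" test, where the subvector can be longer than the whole vector).
uint64_t clampInsertIndex(uint64_t vscale, uint64_t minElts,
                          uint64_t subElts, uint64_t idx) {
  uint64_t numElts = vscale * minElts;
  uint64_t maxIdx = numElts > subElts ? numElts - subElts : 0;
  return std::min(idx, maxIdx);
}

// Byte offset from the start of the spilled vector; eltBytes corresponds to
// the "lsl" applied to the clamped index (none for i8, #1 for i16, #2 for i32).
uint64_t insertByteOffset(uint64_t vscale, uint64_t minElts, uint64_t subElts,
                          uint64_t idx, uint64_t eltBytes) {
  return clampInsertIndex(vscale, minElts, subElts, idx) * eltBytes;
}
```

For example, in `vec_scalable_subvec_fixed_idx_nonzero_i8` (index 8, eight-element subvector) this model gives index 0 when `vscale` is 1 and index 8 when `vscale` is 2. Whether clamping is the intended behaviour for an out-of-range index is a separate question from what the generated code does.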