Details

Reviewers

sdesmalen
kmclaughlin
efriedma
MattDevereau

Commits

rGffa62673004c: [CodeGen] Support extracting fixed-length vectors from illegal scalable vectors

Summary

For some indices we can simply extract the fixed-length subvector from the
low half of the scalable vector, for example when the index is less than the
minimum number of elements in the low half. For all other cases we can
expand the operation through the stack by storing out the vector and
reloading the fixed-length part we need.
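The two cases in the summary can be modelled outside of SelectionDAG. The following is only a sketch in plain C++, not the patch itself: `extractFixedFromSplit` and `makeRange` are hypothetical names, a scalable vector is modelled as a `std::vector` whose run-time length is the minimum element count times vscale, and "the stack" is modelled as a contiguous copy of both halves.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Elts = std::vector<int32_t>;

// Illustrative helper: builds {Start, Start+1, ..., Start+N-1}.
Elts makeRange(std::size_t N, int32_t Start) {
  Elts V(N);
  for (std::size_t I = 0; I < N; ++I)
    V[I] = Start + static_cast<int32_t>(I);
  return V;
}

// Models extracting SubElts elements starting at Idx from a scalable vector
// that has been split into Lo and Hi. LoMinElts is the compile-time minimum
// element count of Lo (its run-time size is LoMinElts * vscale).
Elts extractFixedFromSplit(const Elts &Lo, const Elts &Hi,
                           std::size_t LoMinElts, std::size_t Idx,
                           std::size_t SubElts) {
  auto Begin = static_cast<std::ptrdiff_t>(Idx);
  auto End = static_cast<std::ptrdiff_t>(Idx + SubElts);

  // Fast path: the subvector lies in Lo for every possible vscale.
  if (Idx < LoMinElts) {
    assert(Idx + SubElts <= LoMinElts &&
           "Extracted subvector crosses vector split!");
    return Elts(Lo.begin() + Begin, Lo.begin() + End);
  }

  // General path: "spill" the whole vector, then reload the part we need.
  Elts Stack(Lo);
  Stack.insert(Stack.end(), Hi.begin(), Hi.end());
  assert(Idx + SubElts <= Stack.size() && "extract reads past the vector");
  return Elts(Stack.begin() + Begin, Stack.begin() + End);
}
```

With a minimum of 8 elements in the low half and an index of 12, the model takes the general path whether the run-time halves hold 8 elements (vscale = 1) or 16 (vscale = 2), and returns elements 12 to 15 in both cases.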

Fixes https://github.com/llvm/llvm-project/issues/55412

Tests added here:

CodeGen/AArch64/sve-extract-fixed-from-scalable-vector.ll

Diff Detail

Event Timeline

david-arm created this revision. Jan 17 2022, 8:46 AM

Herald added subscribers: ctetreau, hiraditya, kristof.beyls. Jan 17 2022, 8:46 AM

david-arm requested review of this revision. Jan 17 2022, 8:46 AM

Herald added a project: Restricted Project. Jan 17 2022, 8:46 AM

Herald added a subscriber: llvm-commits.

craig.topper added a subscriber: craig.topper. Jan 17 2022, 9:15 AM

craig.topper added inline comments.

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
3015–3016	Drop else after return
3018	Drop else after return

Harbormaster completed remote builds in B143816: Diff 400566. Jan 17 2022, 9:17 AM

Any thoughts on lowering to the stack, vs. generating "select" operations? If I'm not mistaken, at least for SVE, we should always be extracting elements from exactly one half of the split. Not sure how much it matters in practice, though.

Do we need to special-case i1 vectors?

Matt added a subscriber: Matt. Jan 25 2022, 3:14 PM

Rebased and added a few more tests.

Herald added a project: Restricted Project. Aug 3 2022, 7:14 AM

Herald added a subscriber: alextsao1999.

In D117499#3252585, @efriedma wrote:

Any thoughts on lowering to the stack, vs. generating "select" operations? If I'm not mistaken, at least for SVE, we should always be extracting elements from exactly one half of the split. Not sure how much it matters in practice, though.

Do we need to special-case i1 vectors?

I had a look into using selects, but it doesn't work because when selecting between lo and hi parts you end up constructing a non-constant index for the hi part, i.e. IdxVal - LoNumElts (which is not a constant for scalable vectors). This leads to asserts because EXTRACT_SUBVECTOR is defined to only take constant indices.

I don't think we have to do anything special for i1 vectors because if the i1 type gets promoted (i.e. NEON), then we will end up down a different path where we extract individual elements from an illegal scalable vector, which is dealt with in SplitVecOp_EXTRACT_VECTOR_ELT. This is why in @extract_v4i1_nxv32i1_16 we end up spilling the input four times - once for each EXTRACT_VECTOR_ELT. If the i1 type is legal then presumably there will also be legal loads for them too.

david-arm added inline comments. Aug 3 2022, 7:21 AM

llvm/test/CodeGen/AArch64/sve-extract-fixed-from-scalable-vector.ll
46	Hmm, I just realised one or two test names are wrong, i.e. this function ends with _8 (for index 8), yet the actual IR has an extract from index 16.
281	I also realised this is a bogus comment because v2i32 is legal for NEON. Probably best to just remove the comment entirely.

david-arm edited the summary of this revision. Aug 3 2022, 7:22 AM

Harbormaster completed remote builds in B179027: Diff 449662. Aug 3 2022, 8:18 AM

I had a look into using selects, but it doesn't work because when selecting between lo and hi parts you end up constructing a non-constant index for the hi part, i.e. IdxVal - LoNumElts (which is not a constant for scalable vectors). This leads to asserts because EXTRACT_SUBVECTOR is defined to only take constant indices.

You could handle the special case where the offset only points into the high vector if vscale is exactly 1.

Actually, more generally, if vscale is a power of two, there's exactly one value of vscale that would make the EXTRACT_SUBVECTOR point into the high vector: given a pair of index/vscale that points into the high vector, if vscale is smaller, it's UB, and if vscale is larger, it points into the low vector. So we can construct an appropriate constant. I guess we don't promise that vscale is a power of two in general, though.

If we allow for the possibility that vscale isn't a power of two, then yes, you'd need some sort of variable shuffle or load/store in general.
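The power-of-two argument above can be checked numerically. This is a sketch under stated assumptions, not LLVM code: `classify` and `countHighVScales` are hypothetical names, both halves of the split are assumed to have the same minimum element count `LoMinElts`, and an out-of-range extract stands in for the UB case.

```cpp
#include <cassert>
#include <cstdint>

enum class Where { OutOfRange, Low, High, CrossesSplit };

// Where does a fixed-length extract of SubElts elements at Idx land for a
// given run-time vscale? Each half holds LoMinElts * VScale elements.
Where classify(uint64_t Idx, uint64_t SubElts, uint64_t LoMinElts,
               uint64_t VScale) {
  uint64_t LoElts = LoMinElts * VScale;
  if (Idx + SubElts > 2 * LoElts)
    return Where::OutOfRange; // the UB case: vscale is too small
  if (Idx >= LoElts)
    return Where::High;
  if (Idx + SubElts <= LoElts)
    return Where::Low;
  return Where::CrossesSplit;
}

// Counts the power-of-two vscale values in [1, MaxVScale] for which the
// extract reads from the high half.
unsigned countHighVScales(uint64_t Idx, uint64_t SubElts, uint64_t LoMinElts,
                          uint64_t MaxVScale) {
  unsigned Count = 0;
  for (uint64_t V = 1; V <= MaxVScale; V *= 2)
    if (classify(Idx, SubElts, LoMinElts, V) == Where::High)
      ++Count;
  return Count;
}
```

For a v2i64 extract at index 8 from nxv8i64 (so `LoMinElts` is 4), vscale = 1 is out of range, vscale = 2 points into the high half at offset 0, and vscale = 4 points into the low half. If vscale may be 3, an extract at index 12 lands in the high half for both vscale = 2 (offset 4) and vscale = 3 (offset 0), so no single constant offset works.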

I don't think we have to do anything special for i1 vectors because if the i1 type gets promoted (i.e. NEON), then we will end up down a different path where we extract individual elements from an illegal scalable vector, which is dealt with in SplitVecOp_EXTRACT_VECTOR_ELT. This is why in @extract_v4i1_nxv32i1_16 we end up spilling the input four times - once for each EXTRACT_VECTOR_ELT. If the i1 type is legal then presumably there will also be legal loads for them too.

The issue would be on a target where a type like v4i1 is legal. i1 vectors are tightly packed in memory, so v1i1, v2i1, v4i1, and v8i1 are all one byte, so you can't just load the part you want.

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
3039	Missing alignment on the load op.

Hi @efriedma,

The issue would be on a target where a type like v4i1 is legal. i1 vectors are tightly packed in memory, so v1i1, v2i1, v4i1, and v8i1 are all one byte, so you can't just load the part you want.

But if the result (say a v4i1) and the broken-down input (say an nxv32i1, broken down into nxv16i1) are both legal, then they will both be packed in the same way, so this should be safe, I think? Unless I've misunderstood something, it sounds like the problem you're worried about is when the result is packed and the input stored onto the stack is unpacked, right? If so, I don't think I can really write a test case for this because I don't know of any targets that both support scalable vectors and have legal fixed-width predicate vectors. Perhaps I can just add an unreachable for cases like that?

Say you're trying to extract a v4i1 from an nxv8i1, with index 4. If you store the nxv8i1 to the stack, then load it as v4i1, you get the elements at index 0, since the first 8 elements of nxv8i1 are all packed into a single byte.

(Note that SVE doesn't have any native instruction that's equivalent to an nxv2i1/nxv4i1/nxv8i1 store. Currently, it doesn't come up because nothing emits such operations.)
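The packing problem in this example can be shown with a small model. This is a sketch only: `packBits` and `naiveLoad` are hypothetical helpers, and the LSB-first bit order within a byte is an assumption made for illustration. The point is that a byte-addressed reload cannot express a sub-byte starting position.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed layout: i1 elements are packed LSB-first, eight per byte.
std::vector<uint8_t> packBits(const std::vector<bool> &Bits) {
  std::vector<uint8_t> Bytes((Bits.size() + 7) / 8, 0);
  for (std::size_t I = 0; I < Bits.size(); ++I)
    if (Bits[I])
      Bytes[I / 8] = static_cast<uint8_t>(Bytes[I / 8] | (1u << (I % 8)));
  return Bytes;
}

// Naive reload of NumElts i1 elements "at element Idx": the address is
// rounded down to a whole byte, so the position within the byte is lost.
// Assumes NumElts <= 8 for simplicity.
std::vector<bool> naiveLoad(const std::vector<uint8_t> &Mem, std::size_t Idx,
                            std::size_t NumElts) {
  std::size_t ByteOff = Idx / 8;
  std::vector<bool> Out(NumElts);
  for (std::size_t I = 0; I < NumElts; ++I)
    Out[I] = ((Mem[ByteOff] >> I) & 1u) != 0;
  return Out;
}
```

If elements 4 to 7 of an eight-element predicate are set and elements 0 to 3 are clear, the packed byte is 0xF0. A naive four-element load at index 4 still reads from byte offset 0 and returns the four clear elements at indices 0 to 3, which matches the failure described above.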

I don't think I can really write a test case for this because I don't know of any targets that both support scalable vectors and have legal fixed-width predicate vectors.

True.

Changed code to bail out if extracting a legal fixed-width predicate subvector from a scalable vector.

Hi @efriedma, if it's OK with you, I'd like to keep the spilling and reloading from the stack. As you say, there may be situations where we can improve this with a select, but the crux of this patch is to fix a bug. If we see performance issues in future, we can always revisit this.

I've also added checks for the case you described, extracting a legal fixed-width predicate subvector.

Harbormaster completed remote builds in B181233: Diff 452606. Aug 15 2022, 4:00 AM

I'm okay with skipping the select optimization for now, sure.

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
3030	`getTypeAction(SubVT)` is always going to be "legal", or we wouldn't be here. (We always legalize results before operands.)

Changed the i1 subvector check to report an error if the subvector has an i1 element type.

david-arm marked 4 inline comments as done. Aug 31 2022, 1:32 AM

Harbormaster completed remote builds in B184312: Diff 456891. Aug 31 2022, 2:27 AM

efriedma added inline comments. Aug 31 2022, 10:58 AM

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
3030	Please use report_fatal_error so we get a diagnostic even if assertions aren't enabled.

Changed llvm_unreachable to report_fatal_error.

david-arm marked an inline comment as done. Sep 2 2022, 6:02 AM

david-arm added inline comments.

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
3030	Yes, you're absolutely right. Don't know what I was thinking!

Harbormaster completed remote builds in B184807: Diff 457574. Sep 2 2022, 6:32 AM

LGTM

This revision is now accepted and ready to land. Sep 2 2022, 10:03 AM

This revision was landed with ongoing or failed builds. Sep 5 2022, 7:05 AM

Closed by commit rGffa62673004c: [CodeGen] Support extracting fixed-length vectors from illegal scalable vectors (authored by david-arm).

This revision was automatically updated to reflect the committed changes.

david-arm marked an inline comment as done.

david-arm added a commit: rGffa62673004c: [CodeGen] Support extracting fixed-length vectors from illegal scalable vectors.

Diff 449662

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp

SDValue SecondInsertion =
DAG.getVectorIdxConstant(IdxVal + LoElts, dl));		DAG.getVectorIdxConstant(IdxVal + LoElts, dl));

return SecondInsertion;		return SecondInsertion;
}		}

SDValue DAGTypeLegalizer::SplitVecOp_EXTRACT_SUBVECTOR(SDNode *N) {		SDValue DAGTypeLegalizer::SplitVecOp_EXTRACT_SUBVECTOR(SDNode *N) {
// We know that the extracted result type is legal.		// We know that the extracted result type is legal.
EVT SubVT = N->getValueType(0);		EVT SubVT = N->getValueType(0);

SDValue Idx = N->getOperand(1);		SDValue Idx = N->getOperand(1);
SDLoc dl(N);		SDLoc dl(N);
SDValue Lo, Hi;		SDValue Lo, Hi;

if (SubVT.isScalableVector() !=
N->getOperand(0).getValueType().isScalableVector())
report_fatal_error("Extracting a fixed-length vector from an illegal "
"scalable vector is not yet supported");

GetSplitVector(N->getOperand(0), Lo, Hi);		GetSplitVector(N->getOperand(0), Lo, Hi);

uint64_t LoElts = Lo.getValueType().getVectorMinNumElements();		uint64_t LoEltsMin = Lo.getValueType().getVectorMinNumElements();
uint64_t IdxVal = cast<ConstantSDNode>(Idx)->getZExtValue();		uint64_t IdxVal = cast<ConstantSDNode>(Idx)->getZExtValue();

if (IdxVal < LoElts) {		if (IdxVal < LoEltsMin) {
assert(IdxVal + SubVT.getVectorMinNumElements() <= LoElts &&		assert(IdxVal + SubVT.getVectorMinNumElements() <= LoEltsMin &&
"Extracted subvector crosses vector split!");		"Extracted subvector crosses vector split!");
return DAG.getNode(ISD::EXTRACT_SUBVECTOR, dl, SubVT, Lo, Idx);		return DAG.getNode(ISD::EXTRACT_SUBVECTOR, dl, SubVT, Lo, Idx);
} else {		} else if (SubVT.isScalableVector() ==
		N->getOperand(0).getValueType().isScalableVector())
		// Inline comment (craig.topper): Drop else after return
return DAG.getNode(ISD::EXTRACT_SUBVECTOR, dl, SubVT, Hi,		return DAG.getNode(ISD::EXTRACT_SUBVECTOR, dl, SubVT, Hi,
DAG.getVectorIdxConstant(IdxVal - LoElts, dl));		DAG.getVectorIdxConstant(IdxVal - LoEltsMin, dl));
		// Inline comment (craig.topper): Drop else after return
}
		// Spill the vector to the stack. We should use the alignment for
		// the smallest part.
		SDValue Vec = N->getOperand(0);
		EVT VecVT = Vec.getValueType();
		Align SmallestAlign = DAG.getReducedAlign(VecVT, /*UseABI=*/false);
		SDValue StackPtr =
		DAG.CreateStackTemporary(VecVT.getStoreSize(), SmallestAlign);
		auto &MF = DAG.getMachineFunction();
		auto FrameIndex = cast<FrameIndexSDNode>(StackPtr.getNode())->getIndex();
		auto PtrInfo = MachinePointerInfo::getFixedStack(MF, FrameIndex);

		// Inline comment (efriedma): `getTypeAction(SubVT)` is always going to be "legal", or we wouldn't be here. (We always legalize results before operands.)
		// Inline comment (efriedma): Please use report_fatal_error so we get a diagnostic even if assertions aren't enabled.
		// Inline comment (david-arm): Yes, you're absolutely right. Don't know what I was thinking!
		SDValue Store = DAG.getStore(DAG.getEntryNode(), dl, Vec, StackPtr, PtrInfo,
		SmallestAlign);

		// Extract the subvector by loading the correct part.
		StackPtr = TLI.getVectorSubVecPointer(DAG, StackPtr, VecVT, SubVT, Idx);

		return DAG.getLoad(
		SubVT, dl, Store, StackPtr,
		MachinePointerInfo::getUnknownStack(DAG.getMachineFunction()));
		// Inline comment (efriedma): Missing alignment on the load op.
}		}

SDValue DAGTypeLegalizer::SplitVecOp_EXTRACT_VECTOR_ELT(SDNode *N) {		SDValue DAGTypeLegalizer::SplitVecOp_EXTRACT_VECTOR_ELT(SDNode *N) {
SDValue Vec = N->getOperand(0);		SDValue Vec = N->getOperand(0);
SDValue Idx = N->getOperand(1);		SDValue Idx = N->getOperand(1);
EVT VecVT = Vec.getValueType();		EVT VecVT = Vec.getValueType();

if (isa<ConstantSDNode>(Idx)) {		if (isa<ConstantSDNode>(Idx)) {

llvm/test/CodeGen/AArch64/sve-extract-fixed-from-scalable-vector.ll

	; RUN: not --crash llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s -o - 2>&1 | FileCheck %s --check-prefix=CHECK-ERROR			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s -o - | FileCheck %s

	; Extracting a fixed-length vector from an illegal subvector			; Extracting a legal fixed-length vector from an illegal subvector

	; CHECK-ERROR: ERROR: Extracting a fixed-length vector from an illegal scalable vector is not yet supported
	define <4 x i32> @extract_v4i32_nxv16i32_12(<vscale x 16 x i32> %arg) {			define <4 x i32> @extract_v4i32_nxv16i32_12(<vscale x 16 x i32> %arg) {
				; CHECK-LABEL: extract_v4i32_nxv16i32_12:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: .cfi_def_cfa_offset 16
				; CHECK-NEXT: .cfi_offset w29, -16
				; CHECK-NEXT: addvl sp, sp, #-4
				; CHECK-NEXT: .cfi_escape 0x0f, 0x0c, 0x8f, 0x00, 0x11, 0x10, 0x22, 0x11, 0x20, 0x92, 0x2e, 0x00, 0x1e, 0x22 // sp + 16 + 32 * VG
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: st1w { z3.s }, p0, [sp, #3, mul vl]
				; CHECK-NEXT: st1w { z2.s }, p0, [sp, #2, mul vl]
				; CHECK-NEXT: st1w { z1.s }, p0, [sp, #1, mul vl]
				; CHECK-NEXT: st1w { z0.s }, p0, [sp]
				; CHECK-NEXT: ldr q0, [sp, #48]
				; CHECK-NEXT: addvl sp, sp, #4
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
	%ext = call <4 x i32> @llvm.vector.extract.v4i32.nxv16i32(<vscale x 16 x i32> %arg, i64 12)			%ext = call <4 x i32> @llvm.vector.extract.v4i32.nxv16i32(<vscale x 16 x i32> %arg, i64 12)
	ret <4 x i32> %ext			ret <4 x i32> %ext
	}			}

				define <8 x i16> @extract_v8i16_nxv32i16_8(<vscale x 32 x i16> %arg) {
				; CHECK-LABEL: extract_v8i16_nxv32i16_8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: .cfi_def_cfa_offset 16
				; CHECK-NEXT: .cfi_offset w29, -16
				; CHECK-NEXT: addvl sp, sp, #-2
				; CHECK-NEXT: .cfi_escape 0x0f, 0x0c, 0x8f, 0x00, 0x11, 0x10, 0x22, 0x11, 0x10, 0x92, 0x2e, 0x00, 0x1e, 0x22 // sp + 16 + 16 * VG
				; CHECK-NEXT: ptrue p0.h
				; CHECK-NEXT: st1h { z1.h }, p0, [sp, #1, mul vl]
				; CHECK-NEXT: st1h { z0.h }, p0, [sp]
				; CHECK-NEXT: ldr q0, [sp, #16]
				; CHECK-NEXT: addvl sp, sp, #2
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				%ext = call <8 x i16> @llvm.vector.extract.v8i16.nxv32i16(<vscale x 32 x i16> %arg, i64 8)
				ret <8 x i16> %ext
				}

				define <4 x i16> @extract_v4i16_nxv32i16_8(<vscale x 32 x i16> %arg) {
				; Inline comment (david-arm): Hmm, I just realised one or two test names are wrong, i.e. this function ends with _8 (for index 8), yet the actual IR has an extract from index 16.
				; CHECK-LABEL: extract_v4i16_nxv32i16_8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: .cfi_def_cfa_offset 16
				; CHECK-NEXT: .cfi_offset w29, -16
				; CHECK-NEXT: addvl sp, sp, #-4
				; CHECK-NEXT: .cfi_escape 0x0f, 0x0c, 0x8f, 0x00, 0x11, 0x10, 0x22, 0x11, 0x20, 0x92, 0x2e, 0x00, 0x1e, 0x22 // sp + 16 + 32 * VG
				; CHECK-NEXT: ptrue p0.h
				; CHECK-NEXT: st1h { z3.h }, p0, [sp, #3, mul vl]
				; CHECK-NEXT: st1h { z2.h }, p0, [sp, #2, mul vl]
				; CHECK-NEXT: st1h { z1.h }, p0, [sp, #1, mul vl]
				; CHECK-NEXT: st1h { z0.h }, p0, [sp]
				; CHECK-NEXT: ldr d0, [sp, #32]
				; CHECK-NEXT: addvl sp, sp, #4
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				%ext = call <4 x i16> @llvm.vector.extract.v4i16.nxv32i16(<vscale x 32 x i16> %arg, i64 16)
				ret <4 x i16> %ext
				}

				; The result type gets promoted, leading to us extracting 2 elements from a nxv32i16.
				; Hence we don't end up in SplitVecOp_EXTRACT_SUBVECTOR, but in SplitVecOp_EXTRACT_VECTOR_ELT instead.
				define <2 x i16> @extract_v2i16_nxv32i16_8(<vscale x 32 x i16> %arg) {
				; CHECK-LABEL: extract_v2i16_nxv32i16_8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: .cfi_def_cfa_offset 16
				; CHECK-NEXT: .cfi_offset w29, -16
				; CHECK-NEXT: addvl sp, sp, #-8
				; CHECK-NEXT: .cfi_escape 0x0f, 0x0d, 0x8f, 0x00, 0x11, 0x10, 0x22, 0x11, 0xc0, 0x00, 0x92, 0x2e, 0x00, 0x1e, 0x22 // sp + 16 + 64 * VG
				; CHECK-NEXT: mov x8, sp
				; CHECK-NEXT: ptrue p0.h
				; CHECK-NEXT: add x8, x8, #32
				; CHECK-NEXT: st1h { z3.h }, p0, [sp, #3, mul vl]
				; CHECK-NEXT: st1h { z2.h }, p0, [sp, #2, mul vl]
				; CHECK-NEXT: st1h { z1.h }, p0, [sp, #1, mul vl]
				; CHECK-NEXT: st1h { z0.h }, p0, [sp]
				; CHECK-NEXT: st1h { z3.h }, p0, [sp, #7, mul vl]
				; CHECK-NEXT: st1h { z2.h }, p0, [sp, #6, mul vl]
				; CHECK-NEXT: st1h { z1.h }, p0, [sp, #5, mul vl]
				; CHECK-NEXT: st1h { z0.h }, p0, [sp, #4, mul vl]
				; CHECK-NEXT: ld1 { v0.h }[0], [x8]
				; CHECK-NEXT: addvl x8, sp, #4
				; CHECK-NEXT: add x8, x8, #34
				; CHECK-NEXT: ld1 { v0.h }[2], [x8]
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $q0
				; CHECK-NEXT: addvl sp, sp, #8
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				%ext = call <2 x i16> @llvm.vector.extract.v2i16.nxv32i16(<vscale x 32 x i16> %arg, i64 16)
				ret <2 x i16> %ext
				}

				define <2 x i64> @extract_v2i64_nxv8i64_8(<vscale x 8 x i64> %arg) {
				; CHECK-LABEL: extract_v2i64_nxv8i64_8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: .cfi_def_cfa_offset 16
				; CHECK-NEXT: .cfi_offset w29, -16
				; CHECK-NEXT: addvl sp, sp, #-4
				; CHECK-NEXT: .cfi_escape 0x0f, 0x0c, 0x8f, 0x00, 0x11, 0x10, 0x22, 0x11, 0x20, 0x92, 0x2e, 0x00, 0x1e, 0x22 // sp + 16 + 32 * VG
				; CHECK-NEXT: cnth x8
				; CHECK-NEXT: mov w9, #8
				; CHECK-NEXT: sub x8, x8, #2
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: cmp x8, #8
				; CHECK-NEXT: st1d { z3.d }, p0, [sp, #3, mul vl]
				; CHECK-NEXT: csel x8, x8, x9, lo
				; CHECK-NEXT: mov x9, sp
				; CHECK-NEXT: lsl x8, x8, #3
				; CHECK-NEXT: st1d { z2.d }, p0, [sp, #2, mul vl]
				; CHECK-NEXT: st1d { z1.d }, p0, [sp, #1, mul vl]
				; CHECK-NEXT: st1d { z0.d }, p0, [sp]
				; CHECK-NEXT: ldr q0, [x9, x8]
				; CHECK-NEXT: addvl sp, sp, #4
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				%ext = call <2 x i64> @llvm.vector.extract.v2i64.nxv8i64(<vscale x 8 x i64> %arg, i64 8)
				ret <2 x i64> %ext
				}

				define <4 x float> @extract_v4f32_nxv16f32_12(<vscale x 16 x float> %arg) {
				; CHECK-LABEL: extract_v4f32_nxv16f32_12:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: .cfi_def_cfa_offset 16
				; CHECK-NEXT: .cfi_offset w29, -16
				; CHECK-NEXT: addvl sp, sp, #-4
				; CHECK-NEXT: .cfi_escape 0x0f, 0x0c, 0x8f, 0x00, 0x11, 0x10, 0x22, 0x11, 0x20, 0x92, 0x2e, 0x00, 0x1e, 0x22 // sp + 16 + 32 * VG
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: st1w { z3.s }, p0, [sp, #3, mul vl]
				; CHECK-NEXT: st1w { z2.s }, p0, [sp, #2, mul vl]
				; CHECK-NEXT: st1w { z1.s }, p0, [sp, #1, mul vl]
				; CHECK-NEXT: st1w { z0.s }, p0, [sp]
				; CHECK-NEXT: ldr q0, [sp, #48]
				; CHECK-NEXT: addvl sp, sp, #4
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				%ext = call <4 x float> @llvm.vector.extract.v4f32.nxv16f32(<vscale x 16 x float> %arg, i64 12)
				ret <4 x float> %ext
				}

				define <2 x float> @extract_v2f32_nxv16f32_2(<vscale x 16 x float> %arg) {
				; CHECK-LABEL: extract_v2f32_nxv16f32_2:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: .cfi_def_cfa_offset 16
				; CHECK-NEXT: .cfi_offset w29, -16
				; CHECK-NEXT: addvl sp, sp, #-1
				; CHECK-NEXT: .cfi_escape 0x0f, 0x0c, 0x8f, 0x00, 0x11, 0x10, 0x22, 0x11, 0x08, 0x92, 0x2e, 0x00, 0x1e, 0x22 // sp + 16 + 8 * VG
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: st1w { z0.s }, p0, [sp]
				; CHECK-NEXT: ldr d0, [sp, #8]
				; CHECK-NEXT: addvl sp, sp, #1
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				%ext = call <2 x float> @llvm.vector.extract.v2f32.nxv16f32(<vscale x 16 x float> %arg, i64 2)
				ret <2 x float> %ext
				}

				define <4 x i1> @extract_v4i1_nxv32i1_0(<vscale x 32 x i1> %arg) {
				; CHECK-LABEL: extract_v4i1_nxv32i1_0:
				; CHECK: // %bb.0:
				; CHECK-NEXT: mov z1.b, p0/z, #1 // =0x1
				; CHECK-NEXT: umov w8, v1.b[1]
				; CHECK-NEXT: umov w9, v1.b[2]
				; CHECK-NEXT: mov v0.16b, v1.16b
				; CHECK-NEXT: mov v0.h[1], w8
				; CHECK-NEXT: umov w8, v1.b[3]
				; CHECK-NEXT: mov v0.h[2], w9
				; CHECK-NEXT: mov v0.h[3], w8
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $q0
				; CHECK-NEXT: ret
				%ext = call <4 x i1> @llvm.vector.extract.v4i1.nxv32i1(<vscale x 32 x i1> %arg, i64 0)
				ret <4 x i1> %ext
				}

				; The result type gets promoted, leading to us extracting 4 elements from a nxv32i16.
				; Hence we don't end up in SplitVecOp_EXTRACT_SUBVECTOR, but in SplitVecOp_EXTRACT_VECTOR_ELT instead.
				define <4 x i1> @extract_v4i1_nxv32i1_16(<vscale x 32 x i1> %arg) {
				; CHECK-LABEL: extract_v4i1_nxv32i1_16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: .cfi_def_cfa_offset 16
				; CHECK-NEXT: .cfi_offset w29, -16
				; CHECK-NEXT: addvl sp, sp, #-8
				; CHECK-NEXT: .cfi_escape 0x0f, 0x0d, 0x8f, 0x00, 0x11, 0x10, 0x22, 0x11, 0xc0, 0x00, 0x92, 0x2e, 0x00, 0x1e, 0x22 // sp + 16 + 64 * VG
				; CHECK-NEXT: mov x8, sp
				; CHECK-NEXT: ptrue p2.b
				; CHECK-NEXT: add x8, x8, #16
				; CHECK-NEXT: mov z0.b, p1/z, #1 // =0x1
				; CHECK-NEXT: mov z1.b, p0/z, #1 // =0x1
				; CHECK-NEXT: st1b { z0.b }, p2, [sp, #1, mul vl]
				; CHECK-NEXT: st1b { z1.b }, p2, [sp]
				; CHECK-NEXT: st1b { z0.b }, p2, [sp, #3, mul vl]
				; CHECK-NEXT: st1b { z1.b }, p2, [sp, #2, mul vl]
				; CHECK-NEXT: st1b { z0.b }, p2, [sp, #5, mul vl]
				; CHECK-NEXT: st1b { z1.b }, p2, [sp, #4, mul vl]
				; CHECK-NEXT: st1b { z0.b }, p2, [sp, #7, mul vl]
				; CHECK-NEXT: st1b { z1.b }, p2, [sp, #6, mul vl]
				; CHECK-NEXT: ld1 { v0.b }[0], [x8]
				; CHECK-NEXT: addvl x8, sp, #2
				; CHECK-NEXT: add x8, x8, #17
				; CHECK-NEXT: ld1 { v0.b }[2], [x8]
				; CHECK-NEXT: addvl x8, sp, #4
				; CHECK-NEXT: add x8, x8, #18
				; CHECK-NEXT: ld1 { v0.b }[4], [x8]
				; CHECK-NEXT: addvl x8, sp, #6
				; CHECK-NEXT: add x8, x8, #19
				; CHECK-NEXT: ld1 { v0.b }[6], [x8]
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $q0
				; CHECK-NEXT: addvl sp, sp, #8
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				%ext = call <4 x i1> @llvm.vector.extract.v4i1.nxv32i1(<vscale x 32 x i1> %arg, i64 16)
				ret <4 x i1> %ext
				}

				define <4 x i1> @extract_v4i1_v32i1_16(<32 x i1> %arg) {
				; CHECK-LABEL: extract_v4i1_v32i1_16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldr w8, [sp, #64]
				; CHECK-NEXT: ldr w9, [sp, #72]
				; CHECK-NEXT: fmov s0, w8
				; CHECK-NEXT: ldr w8, [sp, #80]
				; CHECK-NEXT: mov v0.h[1], w9
				; CHECK-NEXT: mov v0.h[2], w8
				; CHECK-NEXT: ldr w8, [sp, #88]
				; CHECK-NEXT: mov v0.h[3], w8
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $q0
				; CHECK-NEXT: ret
				%ext = call <4 x i1> @llvm.vector.extract.v4i1.v32i1(<32 x i1> %arg, i64 16)
				ret <4 x i1> %ext
				}

				; The result type gets promoted, leading to us extracting 4 elements from a nxv32i3.
				; Hence we don't end up in SplitVecOp_EXTRACT_SUBVECTOR, but in SplitVecOp_EXTRACT_VECTOR_ELT instead.
				define <4 x i3> @extract_v4i3_nxv32i3_16(<vscale x 32 x i3> %arg) {
				; CHECK-LABEL: extract_v4i3_nxv32i3_16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: .cfi_def_cfa_offset 16
				; CHECK-NEXT: .cfi_offset w29, -16
				; CHECK-NEXT: addvl sp, sp, #-8
				; CHECK-NEXT: .cfi_escape 0x0f, 0x0d, 0x8f, 0x00, 0x11, 0x10, 0x22, 0x11, 0xc0, 0x00, 0x92, 0x2e, 0x00, 0x1e, 0x22 // sp + 16 + 64 * VG
				; CHECK-NEXT: mov x8, sp
				; CHECK-NEXT: ptrue p0.b
				; CHECK-NEXT: add x8, x8, #16
				; CHECK-NEXT: st1b { z1.b }, p0, [sp, #1, mul vl]
				; CHECK-NEXT: st1b { z0.b }, p0, [sp]
				; CHECK-NEXT: st1b { z1.b }, p0, [sp, #3, mul vl]
				; CHECK-NEXT: st1b { z0.b }, p0, [sp, #2, mul vl]
				; CHECK-NEXT: st1b { z1.b }, p0, [sp, #5, mul vl]
				; CHECK-NEXT: st1b { z0.b }, p0, [sp, #4, mul vl]
				; CHECK-NEXT: st1b { z1.b }, p0, [sp, #7, mul vl]
				; CHECK-NEXT: st1b { z0.b }, p0, [sp, #6, mul vl]
				; CHECK-NEXT: ld1 { v0.b }[0], [x8]
				; CHECK-NEXT: addvl x8, sp, #2
				; CHECK-NEXT: add x8, x8, #17
				; CHECK-NEXT: ld1 { v0.b }[2], [x8]
				; CHECK-NEXT: addvl x8, sp, #4
				; CHECK-NEXT: add x8, x8, #18
				; CHECK-NEXT: ld1 { v0.b }[4], [x8]
				; CHECK-NEXT: addvl x8, sp, #6
				; CHECK-NEXT: add x8, x8, #19
				; CHECK-NEXT: ld1 { v0.b }[6], [x8]
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $q0
				; CHECK-NEXT: addvl sp, sp, #8
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				%ext = call <4 x i3> @llvm.vector.extract.v4i3.nxv32i3(<vscale x 32 x i3> %arg, i64 16)
				ret <4 x i3> %ext
				}

				; Extracting an illegal fixed-length vector from an illegal subvector
				; Inline comment (david-arm): I also realised this is a bogus comment because v2i32 is legal for NEON. Probably best to just remove the comment entirely.

				define <2 x i32> @extract_v2i32_nxv16i32_2(<vscale x 16 x i32> %arg) {
				; CHECK-LABEL: extract_v2i32_nxv16i32_2:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: .cfi_def_cfa_offset 16
				; CHECK-NEXT: .cfi_offset w29, -16
				; CHECK-NEXT: addvl sp, sp, #-1
				; CHECK-NEXT: .cfi_escape 0x0f, 0x0c, 0x8f, 0x00, 0x11, 0x10, 0x22, 0x11, 0x08, 0x92, 0x2e, 0x00, 0x1e, 0x22 // sp + 16 + 8 * VG
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: st1w { z0.s }, p0, [sp]
				; CHECK-NEXT: ldr d0, [sp, #8]
				; CHECK-NEXT: addvl sp, sp, #1
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				%ext = call <2 x i32> @llvm.vector.extract.v2i32.nxv16i32(<vscale x 16 x i32> %arg, i64 2)
				ret <2 x i32> %ext
				}

				define <4 x i64> @extract_v4i64_nxv8i64_0(<vscale x 8 x i64> %arg) {
				; CHECK-LABEL: extract_v4i64_nxv8i64_0:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: .cfi_def_cfa_offset 16
				; CHECK-NEXT: .cfi_offset w29, -16
				; CHECK-NEXT: addvl sp, sp, #-2
				; CHECK-NEXT: .cfi_escape 0x0f, 0x0c, 0x8f, 0x00, 0x11, 0x10, 0x22, 0x11, 0x10, 0x92, 0x2e, 0x00, 0x1e, 0x22 // sp + 16 + 16 * VG
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: st1d { z1.d }, p0, [sp, #1, mul vl]
				; CHECK-NEXT: st1d { z0.d }, p0, [sp]
				; CHECK-NEXT: ldr q1, [sp, #16]
				; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
				; CHECK-NEXT: addvl sp, sp, #2
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				%ext = call <4 x i64> @llvm.vector.extract.v4i64.nxv8i64(<vscale x 8 x i64> %arg, i64 0)
				ret <4 x i64> %ext
				}


				declare <2 x i64> @llvm.vector.extract.v2i64.nxv8i64(<vscale x 8 x i64>, i64)
				declare <4 x i64> @llvm.vector.extract.v4i64.nxv8i64(<vscale x 8 x i64>, i64)
				declare <4 x float> @llvm.vector.extract.v4f32.nxv16f32(<vscale x 16 x float>, i64)
				declare <2 x float> @llvm.vector.extract.v2f32.nxv16f32(<vscale x 16 x float>, i64)
	declare <4 x i32> @llvm.vector.extract.v4i32.nxv16i32(<vscale x 16 x i32>, i64)			declare <4 x i32> @llvm.vector.extract.v4i32.nxv16i32(<vscale x 16 x i32>, i64)
				declare <2 x i32> @llvm.vector.extract.v2i32.nxv16i32(<vscale x 16 x i32>, i64)
				declare <8 x i16> @llvm.vector.extract.v8i16.nxv32i16(<vscale x 32 x i16>, i64)
				declare <4 x i16> @llvm.vector.extract.v4i16.nxv32i16(<vscale x 32 x i16>, i64)
				declare <2 x i16> @llvm.vector.extract.v2i16.nxv32i16(<vscale x 32 x i16>, i64)
				declare <4 x i1> @llvm.vector.extract.v4i1.nxv32i1(<vscale x 32 x i1>, i64)
				declare <4 x i1> @llvm.vector.extract.v4i1.v32i1(<32 x i1>, i64)
				declare <4 x i3> @llvm.vector.extract.v4i3.nxv32i3(<vscale x 32 x i3>, i64)

This is an archive of the discontinued LLVM Phabricator instance.

[CodeGen] Support extracting fixed-length vectors from illegal scalable vectors
ClosedPublic
