This is an archive of the discontinued LLVM Phabricator instance.


Differential D111633

[SelectionDAG] Fix getVectorSubVecPointer for scalable subvectors.
Closed · Public

Authored by sdesmalen on Oct 12 2021, 4:59 AM.

Details

Reviewers

CarolineConcatto
paulwalker-arm
bsmith

Commits

rGbe6c8dc765c3: [SelectionDAG] Fix getVectorSubVecPointer for scalable subvectors.

Summary

When inserting a scalable subvector into a scalable vector through
the stack, the index to store to needs to be scaled by vscale.
Before this patch, that didn't yet happen, so it would generate the
wrong offset, thus storing a subvector to the incorrect address
and overwriting the wrong lanes.

For example, given the insert:

nxv8f16 insert_subvector(nxv8f16 %vec, nxv2f16 %subvec, i64 2)

The offset was not scaled by vscale:

orr     x8, x8, #0x4
st1h    { z0.h }, p0, [sp]
st1h    { z1.d }, p1, [x8]
ld1h    { z0.h }, p0/z, [sp]

With this patch, the store address is scaled by vscale:

mov x8, sp
st1h { z0.h }, p0, [sp]
st1h { z1.d }, p1, [x8, #1, mul vl]
ld1h { z0.h }, p0/z, [sp]
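
The offset arithmetic described in the summary can be sketched in plain C++ as follows. This is only an illustrative model, not LLVM code: `subVecByteOffset` and `unscaledByteOffset` are hypothetical helpers, and `vscale` is passed in as an ordinary integer, whereas in SelectionDAG it is a runtime value. For the nxv8f16/nxv2f16 example (index 2, 2-byte elements) the model gives 4 * vscale bytes, where the pre-patch code used a constant 4.

```cpp
#include <cstdint>

// Hypothetical model (not an LLVM API): byte offset of the subvector slot
// inside the stack temporary that holds the full vector.
//   index          - index operand of the insert/extract, in elements
//   eltSizeBytes   - size of one vector element in bytes
//   vscale         - runtime vscale value, assumed >= 1
//   scalableSubVec - true if the subvector type is scalable
std::uint64_t subVecByteOffset(std::uint64_t index, std::uint64_t eltSizeBytes,
                               std::uint64_t vscale, bool scalableSubVec) {
  // For a scalable subvector the index is expressed in units of the minimum
  // element count, so it must be multiplied by vscale first.
  std::uint64_t eltIndex = scalableSubVec ? index * vscale : index;
  return eltIndex * eltSizeBytes;
}

// Model of the behaviour described as wrong in the summary: the index is
// never multiplied by vscale, so the offset is right only when vscale == 1.
std::uint64_t unscaledByteOffset(std::uint64_t index,
                                 std::uint64_t eltSizeBytes) {
  return index * eltSizeBytes;
}
```

With `index = 2` and `eltSizeBytes = 2`, the two functions agree only for `vscale == 1`; for any larger vscale the unscaled offset lands inside the wrong group of lanes.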

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

sdesmalen created this revision. Oct 12 2021, 4:59 AM

Herald added subscribers: ecnelises, hiraditya. Oct 12 2021, 4:59 AM

sdesmalen requested review of this revision. Oct 12 2021, 4:59 AM

Herald added a project: Restricted Project. Oct 12 2021, 4:59 AM

Herald added a subscriber: llvm-commits.

sdesmalen added reviewers: CarolineConcatto, paulwalker-arm, bsmith. Oct 12 2021, 4:59 AM

CarolineConcatto added inline comments. Oct 12 2021, 5:18 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7876	Can you do something like this: `Index = DAG.getVScale(DL, IdxVT, APInt(IdxVT.getSizeInBits(), Index.getConstantOperandVal(0)));`? Instead of multiplying the index by a scalable vector of size 1?

david-arm added a subscriber: david-arm. Oct 12 2021, 5:20 AM

david-arm added inline comments.

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7873	Hi @sdesmalen, it's just a thought, but while you're in this area is it also worth clamping the index for scalable vectors too? The comment above is incorrect, because we do explicitly clamp the index in other places for scalable vectors.

sdesmalen added inline comments. Oct 12 2021, 5:23 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7873	I don't think any clamping is required, because when both the subvector and the vector being inserted into are scalable, we know at compile time whether the vector index will exceed the size of the input vector.

david-arm added inline comments. Oct 12 2021, 5:28 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7873	Oh ok - I wonder why we do this for fixed-length vectors? I was sort of expecting the problem to be the same for both inserting fixed into fixed and inserting scalable into scalable? I was specifically worried about what we did in practice for this case: `call <vscale x 8 x half> @llvm.experimental.vector.insert.nxv8f16.nxv2f16(<vscale x 8 x half> %vec, <vscale x 2 x half> %in, i64 10)` because if vscale=1 then we're inserting beyond the end of the vector.

sdesmalen added inline comments. Oct 12 2021, 5:43 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7873	This exists for inserting/extracting fixed from scalable, where we don't know at compile time if the fixed offset exceeds the scalable vector. The example you give here: `call <vscale x 8 x half> @llvm.experimental.vector.insert.nxv8f16.nxv2f16(<vscale x 8 x half> %vec, <vscale x 2 x half> %in, i64 10)` is always out of bounds, because `vscale * 10 > vscale * (8 - 2)` for any vscale.

david-arm added inline comments. Oct 12 2021, 5:58 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7873	OK, I still wasn't sure what you meant here by checking at compile time, but I tried out a test manually and I see that the Verifier emits an error for indices that are too large. So that's fine then!
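
The argument in this thread, that a scalable-into-scalable insert can be bounds-checked without knowing vscale while a fixed-into-scalable insert cannot, can be sketched in plain C++. Both functions below are hypothetical illustrations and not part of LLVM or its Verifier; element counts are the minimum counts from the types (8 for nxv8f16, 2 for nxv2f16).

```cpp
#include <cstdint>

// Hypothetical check for inserting a scalable subvector into a scalable
// vector. Index and both element counts are all multiplied by the same
// vscale, so it cancels: idx*vscale + sub*vscale <= vec*vscale reduces to
// idx + sub <= vec, which is decidable at compile time.
bool scalableInsertInBounds(std::uint64_t idx, std::uint64_t minSubElts,
                            std::uint64_t minVecElts) {
  return minSubElts <= minVecElts && idx <= minVecElts - minSubElts;
}

// Hypothetical check for inserting a fixed-length subvector into a scalable
// vector. Only the vector's element count scales with vscale, so the answer
// depends on a value that is not known until run time.
bool fixedInsertInBounds(std::uint64_t idx, std::uint64_t subElts,
                         std::uint64_t minVecElts, std::uint64_t vscale) {
  std::uint64_t totalElts = minVecElts * vscale;
  return subElts <= totalElts && idx <= totalElts - subElts;
}
```

For the call quoted above, `scalableInsertInBounds(10, 2, 8)` is false for every vscale, whereas an index of 2 with a `<2 x i64>` subvector in an nxv2i64 vector is out of bounds at vscale 1 but in bounds at vscale 2, which is why only that case needs a run-time clamp under this reasoning.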

Harbormaster completed remote builds in B128334: Diff 378975. Oct 12 2021, 6:10 AM

sdesmalen added inline comments. Oct 12 2021, 7:59 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7876	I'm not sure if you can always know for sure that Index is a constant value so that's why I used an explicit multiply with vscale.

paulwalker-arm added inline comments. Oct 12 2021, 8:47 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7876	To me the rationale for not using `getConstantOperandAPInt` is at odds with the rationale for not clamping the index. None of what you say is wrong, but there's nothing to say this function has to be used in conjunction with `ISD::INSERT_SUBVECTOR`, and the function's description does say: "If \p Idx plus the size of \p SubVecVT is out of bounds the returned pointer is unspecified, but the value returned will be such that the entire subvector would be within the vector bounds." So either this function is only ever used in combination with `EXTRACT_SUBVECTOR/INSERT_SUBVECTOR`, in which case we can assume `Index` to be a constant (perhaps even changing the prototype to force this?), or this is a generic helper function and thus `Index` can be anything and must be clamped to honour the function's description.

Simplified getVectorSubVecPointer to always clamp dynamically.
For the combinations in the table below, this revision changed clampDynamicVectorIndex as follows:
- It accepts cases A, B, C, D, E and F (C and F were added).
- It now also asserts that A and F must have valid indices (i.e. they are already validated by their original operation, e.g. EXTRACT/INSERT_SUBVECTOR or EXTRACT/INSERT_VECTOR_ELT).

	  |  index  |  subvector  |  vector
	  |----------------------------------
	A |  const        fixed         fixed
	B |  const        fixed      scalable
	C |  const     scalable      scalable
	D |    var        fixed         fixed
	E |    var        fixed      scalable
	F |    var     scalable      scalable
	
	Note that the following combinations are invalid, because we don't support
	extracting a scalable vector from a fixed-width vector:
	X |  const     scalable         fixed
	X |    var     scalable         fixed

Harbormaster completed remote builds in B128873: Diff 379723. Oct 14 2021, 9:31 AM

paulwalker-arm added inline comments. Oct 18 2021, 10:15 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7824–7826	Given you're not making assumptions as to where `Idx` is coming from, I don't think an assert is safe enough. Sure, an asserts build will exit here, but a release build could leak/corrupt data, which is the problem the user is trying to prevent by calling this function. Instead I think the assert should be replaced by always clamping `Idx` (see the `MaxIndex` calculation below). If the source of `Idx` is indeed an `EXTRACT_SUBVECTOR/INSERT_SUBVECTOR` then the invalid index should be caught by an assert in `getNode` rather than getting this far.

Always clamp instead of asserting.

sdesmalen added inline comments. Oct 19 2021, 3:20 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7824–7826	Good point about the generated code possibly doing the wrong thing for non-assert builds. I've changed it to always clamp, and removed the assert. From looking at the uses of `getVectorSubVecPointer`, it's not really worth passing in some bool to tell whether the value is coming from an insert/extract subvector in order to assert. Where this function is called for insert/extract subvectors, the nodes must already have been checked for correctness in other places.
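
As a rough illustration of what "clamping" means here, the scalable-vector branch of `clampDynamicVectorIndex` shown in the Diff 378975 listing (a fixed-length subvector inside a scalable vector) can be modelled with ordinary integers. This is a sketch under assumptions, not the committed code: `clampIndexScalableVec` and `usubsat` are hypothetical names, `vscale` is passed as a plain parameter, and `numSubElts` is assumed to be at least 1.

```cpp
#include <algorithm>
#include <cstdint>

// Saturating unsigned subtraction, standing in for ISD::USUBSAT.
std::uint64_t usubsat(std::uint64_t a, std::uint64_t b) {
  return a > b ? a - b : 0;
}

// Hypothetical scalar model of the scalable-vector branch in the listing.
//   idx           - requested start index, in elements
//   idxIsConstant - whether idx is known at compile time
//   minElts       - minimum element count of the scalable vector
//   numSubElts    - element count of the fixed-length subvector (>= 1)
//   vscale        - runtime vscale value, assumed >= 1
std::uint64_t clampIndexScalableVec(std::uint64_t idx, bool idxIsConstant,
                                    std::uint64_t minElts,
                                    std::uint64_t numSubElts,
                                    std::uint64_t vscale) {
  // A constant index that fits even when vscale == 1 needs no clamp.
  if (idxIsConstant && idx + (numSubElts - 1) < minElts)
    return idx;

  // Total element count at run time.
  std::uint64_t totalElts = vscale * minElts;

  // A plain subtraction cannot underflow when numSubElts <= minElts;
  // otherwise the subtraction must saturate at zero.
  std::uint64_t maxIdx = numSubElts <= minElts
                             ? totalElts - numSubElts
                             : usubsat(totalElts, numSubElts);

  // Unsigned minimum, standing in for ISD::UMIN.
  return std::min(idx, maxIdx);
}
```

In this model an out-of-range index is silently pulled back so that the whole subvector stays inside the vector, which matches the function description quoted earlier in the review: the returned pointer is unspecified for a bad index, but the access never leaves the vector's storage.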

Harbormaster completed remote builds in B129507: Diff 380631. Oct 19 2021, 4:39 AM

paulwalker-arm accepted this revision. Oct 19 2021, 9:52 AM

paulwalker-arm added inline comments.

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7875	The assert text is no longer correct.

This revision is now accepted and ready to land. Oct 19 2021, 9:52 AM

Closed by commit rGbe6c8dc765c3: [SelectionDAG] Fix getVectorSubVecPointer for scalable subvectors. (authored by sdesmalen). Oct 20 2021, 6:05 AM

This revision was automatically updated to reflect the committed changes.

sdesmalen added a commit: rGbe6c8dc765c3: [SelectionDAG] Fix getVectorSubVecPointer for scalable subvectors.

sdesmalen marked an inline comment as done. Oct 20 2021, 6:08 AM

sdesmalen added inline comments.

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7875	Good spot, I've changed it! Thanks.

Revision Contents

Path	Size
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp	9 lines
llvm/test/CodeGen/AArch64/sve-insert-vector.ll	20 lines

Diff 378975

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp

@@ ... @@ static SDValue clampDynamicVectorIndex(SelectionDAG &DAG, SDValue Idx,
   EVT IdxVT = Idx.getValueType();
   unsigned NElts = VecVT.getVectorMinNumElements();
   if (VecVT.isScalableVector()) {
     // If this is a constant index and we know the value plus the number of the
     // elements in the subvector minus one is less than the minimum number of
     // elements then it's safe to return Idx.
     if (auto *IdxCst = dyn_cast<ConstantSDNode>(Idx))
       if (IdxCst->getZExtValue() + (NumSubElts - 1) < NElts)
         return Idx;
     SDValue VS =
         DAG.getVScale(dl, IdxVT, APInt(IdxVT.getFixedSizeInBits(), NElts));
     unsigned SubOpcode = NumSubElts <= NElts ? ISD::SUB : ISD::USUBSAT;
     SDValue Sub = DAG.getNode(SubOpcode, dl, IdxVT, VS,
                               DAG.getConstant(NumSubElts, dl, IdxVT));
     return DAG.getNode(ISD::UMIN, dl, IdxVT, Idx, Sub);
   }
   if (isPowerOf2_32(NElts) && NumSubElts == 1) {
     APInt Imm = APInt::getLowBitsSet(IdxVT.getSizeInBits(), Log2_32(NElts));
@@ ... @@ SDValue TargetLowering::getVectorSubVecPointer(SelectionDAG &DAG,
   EVT EltVT = VecVT.getVectorElementType();
 
   // Calculate the element offset and add it to the pointer.
   unsigned EltSize = EltVT.getFixedSizeInBits() / 8; // FIXME: should be ABI size.
   assert(EltSize * 8 == EltVT.getFixedSizeInBits() &&
          "Converting bits to bytes lost precision");
 
+  EVT IdxVT = Index.getValueType();
+
   // Scalable vectors don't need clamping as these are checked at compile time
   if (SubVecVT.isFixedLengthVector()) {
     assert(SubVecVT.getVectorElementType() == EltVT &&
            "Sub-vector must be a fixed vector with matching element type");
     Index = clampDynamicVectorIndex(DAG, Index, VecVT, dl,
                                     SubVecVT.getVectorNumElements());
-  }
+  } else
+    Index =
+        DAG.getNode(ISD::MUL, dl, IdxVT, Index,
+                    DAG.getVScale(dl, IdxVT, APInt(IdxVT.getSizeInBits(), 1)));
 
-  EVT IdxVT = Index.getValueType();
-
   Index = DAG.getNode(ISD::MUL, dl, IdxVT, Index,
                       DAG.getConstant(EltSize, dl, IdxVT));
   return DAG.getMemBasePlusOffset(VecPtr, Index, dl);
 }
 
 //===----------------------------------------------------------------------===//
 // Implementation of Emulated TLS Model

llvm/test/CodeGen/AArch64/sve-insert-vector.ll

@@ ... @@
 ; CHECK-NEXT: uzp1 z1.s, z2.s, z1.s
 ; CHECK-NEXT: uunpkhi z0.s, z0.h
 ; CHECK-NEXT: uzp1 z0.h, z1.h, z0.h
 ; CHECK-NEXT: ret
   %r = call <vscale x 8 x i16> @llvm.experimental.vector.insert.nxv8i16.nxv2i16(<vscale x 8 x i16> %vec, <vscale x 2 x i16> %in, i64 2)
   ret <vscale x 8 x i16> %r
 }
 
+; Test that the index is scaled by vscale if the subvector is scalable.
+define <vscale x 8 x half> @insert_nxv8f16_nxv2f16(<vscale x 8 x half> %vec, <vscale x 2 x half> %in) nounwind {
+; CHECK-LABEL: insert_nxv8f16_nxv2f16:
+; CHECK: // %bb.0:
+; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
+; CHECK-NEXT: addvl sp, sp, #-1
+; CHECK-NEXT: ptrue p0.h
+; CHECK-NEXT: ptrue p1.d
+; CHECK-NEXT: mov x8, sp
+; CHECK-NEXT: st1h { z0.h }, p0, [sp]
+; CHECK-NEXT: st1h { z1.d }, p1, [x8, #1, mul vl]
+; CHECK-NEXT: ld1h { z0.h }, p0/z, [sp]
+; CHECK-NEXT: addvl sp, sp, #1
+; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
+; CHECK-NEXT: ret
+  %r = call <vscale x 8 x half> @llvm.experimental.vector.insert.nxv8f16.nxv2f16(<vscale x 8 x half> %vec, <vscale x 2 x half> %in, i64 2)
+  ret <vscale x 8 x half> %r
+}
+
 ; Fixed length clamping
 
 define <vscale x 2 x i64> @insert_fixed_v2i64_nxv2i64(<vscale x 2 x i64> %vec, <2 x i64> %subvec) nounwind #0 {
 ; CHECK-LABEL: insert_fixed_v2i64_nxv2i64:
 ; CHECK: // %bb.0:
 ; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
 ; CHECK-NEXT: addvl sp, sp, #-1
@@ ... @@
 declare <vscale x 2 x i64> @llvm.experimental.vector.insert.nxv2i64.v4i64(<vscale x 2 x i64>, <4 x i64>, i64)
 
 declare <vscale x 16 x i64> @llvm.experimental.vector.insert.nxv8i64.nxv16i64(<vscale x 16 x i64>, <vscale x 8 x i64>, i64)
 declare <vscale x 16 x i64> @llvm.experimental.vector.insert.v2i64.nxv16i64(<vscale x 16 x i64>, <2 x i64>, i64)
 declare <vscale x 4 x i32> @llvm.experimental.vector.insert.nxv4i32.nxv1i32(<vscale x 4 x i32>, <vscale x 1 x i32>, i64)
 declare <vscale x 6 x i16> @llvm.experimental.vector.insert.nxv6i16.nxv1i16(<vscale x 6 x i16>, <vscale x 1 x i16>, i64)
 
 declare <vscale x 8 x i16> @llvm.experimental.vector.insert.nxv8i16.nxv2i16(<vscale x 8 x i16>, <vscale x 2 x i16>, i64)
+
+declare <vscale x 8 x half> @llvm.experimental.vector.insert.nxv8f16.nxv2f16(<vscale x 8 x half>, <vscale x 2 x half>, i64)