This is an archive of the discontinued LLVM Phabricator instance.

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
6654–6659	I'm not that happy with how this mask is generated. Ideally we would just `vmv.v.i v0, 0x55` directly, but I can't seem to do it in a way that keeps SelectionDAG happy. Namely, it doesn't allow bitcasting `<vscale x n x i8>` -> `<vscale x n x i1>`.

Harbormaster completed remote builds in B213863: Diff 497629.Feb 15 2023, 5:28 AM

The herald failures should be fixed now that https://reviews.llvm.org/D141924#inline-1391060 has been updated

These should be implementable with clever use of vwaddu and vwmaccu, avoiding the vrgather.

craig.topper added inline comments.Feb 15 2023, 9:02 AM

llvm/test/CodeGen/RISCV/rvv/vector-deinterleave-fixed.ll
2	I don't think this test even uses your patch and shows the strategy for how your patch can be improved. Only i64 and double test case use vrgather.
9	Missing CHECK lines

For interleave, a couple options to consider:

1. Concatenate the vectors into an LMUL+1 vector, then gather using {0, VLMAX, 1, VLMAX+1, ...}. Forming that index vector takes some care for the scalable case, but I think we can do id()/2 + {masked} VLMAX where mask = {0,1,0,1}. The mask can in turn be computed as (id()&1) == 1.
2. Using a segment store to the stack followed by a whole vector reload.
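
The first option above (concatenate, then gather) can be sketched in plain scalar C++ that models the gather lane by lane. This only illustrates the index arithmetic and is not LLVM code: `interleaveIndices` and `interleaveViaGather` are hypothetical names, and `vlmax` stands for the element count of one source vector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Build the gather indices {0, VLMAX, 1, VLMAX+1, ...} for a concatenated
// vector of 2 * vlmax elements: idx[i] = i / 2, plus vlmax on odd lanes.
std::vector<std::size_t> interleaveIndices(std::size_t vlmax) {
  std::vector<std::size_t> idx(2 * vlmax);
  for (std::size_t i = 0; i < idx.size(); ++i) {
    bool odd = (i & 1) == 1; // mask = {0, 1, 0, 1, ...}
    idx[i] = i / 2 + (odd ? vlmax : 0);
  }
  return idx;
}

// Model of concat + gather: out[i] = concat(a, b)[idx[i]].
std::vector<int> interleaveViaGather(const std::vector<int> &a,
                                     const std::vector<int> &b) {
  assert(a.size() == b.size() && "sources must have the same length");
  std::vector<int> concat(a);
  concat.insert(concat.end(), b.begin(), b.end());
  std::vector<int> out;
  for (std::size_t i : interleaveIndices(a.size()))
    out.push_back(concat[i]);
  return out;
}
```

In the scalable case `vlmax` is not a compile-time constant, which is why the index vector has to be computed at run time rather than loaded from a constant pool.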

p.s. This was written before @craig.topper's comment, and I haven't read his in detail.

Here's an excerpt from the code we already have for interleave on fixed vectors. Need to see why it doesn't trigger here.

// Detect an interleave shuffle and lower to                                   
// (vmaccu.vx (vwaddu.vx lohalf(V1), lohalf(V2)), lohalf(V2), (2^eltbits - 1)) 
bool SwapSources;                                                              
if (isInterleaveShuffle(Mask, VT, SwapSources, Subtarget)) {
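
The comment in the excerpt relies on the identity (a + b) + b * (2^n - 1) = a + b * 2^n: each widened lane ends up with `a` in its low half and `b` in its high half, which is the interleaved pair once the wide lane is viewed as two narrow lanes. A minimal scalar sketch for 8-bit elements, assuming low-half-first lane order; `wideningInterleave` is a hypothetical name, not the LLVM helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

std::vector<std::uint8_t> wideningInterleave(const std::vector<std::uint8_t> &a,
                                             const std::vector<std::uint8_t> &b) {
  assert(a.size() == b.size() && "sources must have the same length");
  std::vector<std::uint8_t> out;
  for (std::size_t i = 0; i < a.size(); ++i) {
    // Widening unsigned add (the vwaddu step): a + b in 16 bits.
    std::uint16_t wide = static_cast<std::uint16_t>(a[i] + b[i]);
    // Widening multiply-accumulate (the vwmaccu step): += b * (2^8 - 1).
    wide = static_cast<std::uint16_t>(wide + b[i] * 0xFFu);
    // Reinterpret the 16-bit lane as two 8-bit lanes, low half first.
    out.push_back(static_cast<std::uint8_t>(wide & 0xFF));
    out.push_back(static_cast<std::uint8_t>(wide >> 8));
  }
  return out;
}
```

The result always fits because a + b * 2^8 is at most 2^16 - 1. The trick needs an element type twice as wide as the source, which is why it is limited to SEW < ELEN.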

In D144092#4129342, @craig.topper wrote:

These should be implementable with clever use of vwaddu and vwmaccu, avoiding the vrgather.

That's a much better approach. But it won't work for i64 though right? In https://reviews.llvm.org/D144143 it falls back to a vrgather with a constant index vector

; RV32-V128-LABEL: unary_interleave_v4i64:
; RV32-V128:       # %bb.0:
; RV32-V128-NEXT:    lui a0, %hi(.LCPI19_0)
; RV32-V128-NEXT:    addi a0, a0, %lo(.LCPI19_0)
; RV32-V128-NEXT:    vsetivli zero, 4, e64, m2, ta, ma
; RV32-V128-NEXT:    vle16.v v12, (a0)
; RV32-V128-NEXT:    vrgatherei16.vv v10, v8, v12
; RV32-V128-NEXT:    vmv.v.v v8, v10
; RV32-V128-NEXT:    ret

We can't load the indices for a scalable vector because it'll be of unknown size.

Rework lowering to use narrowing shifts in the deinterleave case where possible, and use a single gather that doesn't require viota for interleaving
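
The narrowing-shift idea for deinterleaving can be illustrated with scalar C++: view each pair of adjacent n-bit elements as one 2n-bit element, then a narrowing shift right by 0 yields the even elements and a shift by n yields the odd ones. This is a sketch for 8-bit elements, assuming low-half-first lane order; `deinterleaveViaNarrowingShift` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

std::pair<Bytes, Bytes> deinterleaveViaNarrowingShift(const Bytes &v) {
  assert(v.size() % 2 == 0 && "need an even number of elements");
  Bytes even, odd;
  for (std::size_t i = 0; i < v.size(); i += 2) {
    // "Bitcast" two adjacent 8-bit lanes into one 16-bit lane.
    std::uint16_t wide = static_cast<std::uint16_t>(v[i] | (v[i + 1] << 8));
    // Narrowing shift right by 0 keeps the low half (even element).
    even.push_back(static_cast<std::uint8_t>(wide >> 0));
    // Narrowing shift right by the element width keeps the high half (odd).
    odd.push_back(static_cast<std::uint8_t>(wide >> 8));
  }
  return {even, odd};
}
```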

Fix missing check lines

luke mentioned this in D144143: [RISCV] Improve isInterleaveShuffle to handle interleaving the high half and low half of the same source..Feb 16 2023, 7:25 AM

luke edited the summary of this revision.Feb 16 2023, 7:26 AM

luke marked 2 inline comments as done.

Harbormaster completed remote builds in B214149: Diff 498006.Feb 16 2023, 8:21 AM

Be smarter about generating masks by generating vmsne.vi vx, 0 directly, avoiding a redundant vadd.vi
When SEW < ELEN, interleave vectors with vwaddu.vv/vwmaccu.vx

luke mentioned this in D144091: [RISCV][NFC] Add VIOTA_VL node.Feb 16 2023, 11:04 AM

Harbormaster completed remote builds in B214208: Diff 498085.Feb 16 2023, 1:28 PM

luke added a child revision: D144175: [RISCV] Combine (store/load interleave,deinterleave) into vsseg2/vlseg2.Feb 17 2023, 4:09 AM

Remove convertToMask

Update tests

Harbormaster completed remote builds in B214394: Diff 498343.Feb 17 2023, 7:39 AM

At a high level, I'm unhappy with the amount of duplication between the scalable path with the new nodes and the fixed vector path. I think this is a great POC - it largely convinces me we *can* lower these reasonably - but we need to rethink this a bit for landing.

I'm wondering whether we should canonicalize fixed length shuffles for interleave and deinterleave to the new nodes, and then have the lowering as you roughly structured it. This would avoid some of the code duplication.

The other approach would be to have a set of utility functions, and call them appropriately from both places.

I will note I'm open to being convinced this can be done in tree, but in that case, I'd probably want to see a more minimal lowering with incremental improvement and sharing in follow up changes.

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
6626	This case looks to be missing from the fixed length lowering, we should find a way to pick it up there as well.
6700	This bit of code should be shared with the isInterleaveShuffle above. This will require a bit of restructuring.
llvm/lib/Target/RISCV/RISCVISelLowering.h
273	Unrelated, commit separately please.

In D144092#4135349, @reames wrote:

I'm wondering whether we should canonicalize fixed length shuffles for interleave and deinterleave to the new nodes, and then have the lowering as you roughly structured it. This would avoid some of the code duplication.

I agree, the thought crossed my mind as I was reimplementing all of this. It’s exactly the opposite of what SelectionDAGBuilder does in the first place though: it explicitly generates shuffle_vectors from the intrinsics for fixed length vectors.

I know it’s a question that’s been raised before but perhaps it’s worthwhile discussing with the AArch64 folks if shuffle_vectors should be canonically combined into vector_interleave across all targets.
Presumably a lot of work.
And I wonder if they have the same duplication problem too.

In D144092#4135927, @luke wrote:

In D144092#4135349, @reames wrote:

I'm wondering whether we should canonicalize fixed length shuffles for interleave and deinterleave to the new nodes, and then have the lowering as you roughly structured it. This would avoid some of the code duplication.

I agree, the thought crossed my mind as I was reimplementing all of this. It’s exactly the opposite of what SelectionDAGBuilder does in the first place though: it explicitly generates shuffle_vectors from the intrinsics for fixed length vectors.

I know it’s a question that’s been raised before but perhaps it’s worthwhile discussing with the AArch64 folks if shuffle_vectors should be canonically combined into vector_interleave across all targets.
Presumably a lot of work.
And I wonder if they have the same duplication problem too.

Can we extract the code generation from LowerVECTOR_SHUFFLE into a function that we call in two places? That should reduce the duplication.

Share code between fixed length and scalable vectors

luke edited parent revisions, added: D144386: [RISCV] Use a smaller VL when interleaving fixed vectors; removed: D144091: [RISCV][NFC] Add VIOTA_VL node.Feb 20 2023, 5:39 AM

luke mentioned this in D144387: [RISCV][NFC] Make a note of the operands for RISCVISD::VNSRL_VL.Feb 20 2023, 5:41 AM

Harbormaster completed remote builds in B214739: Diff 498810.Feb 20 2023, 6:55 AM

craig.topper added inline comments.Feb 20 2023, 5:31 PM

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
6572	`Subtarget` is unused
6731	Need to pass `Idx` instead of `UNDEF` to get a mask undisturbed vadd.vv.
llvm/test/CodeGen/RISCV/rvv/vector-interleave.ll
72	This is a mask agnostic vadd.vx. You need mask undisturbed.
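
To illustrate why the passthru matters here, the following scalar sketch models a masked add with an explicit passthru operand. It is an assumption-laden model, not the RVV or SelectionDAG semantics in full: `maskedAdd` is a hypothetical name, and "mask undisturbed" is modelled as inactive lanes keeping the passthru value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Masked add with an explicit passthru: active lanes get a[i] + b[i],
// inactive lanes keep passthru[i] (the "mask undisturbed" behaviour).
std::vector<std::uint16_t> maskedAdd(const std::vector<std::uint16_t> &a,
                                     const std::vector<std::uint16_t> &b,
                                     const std::vector<bool> &mask,
                                     const std::vector<std::uint16_t> &passthru) {
  assert(a.size() == b.size() && a.size() == mask.size() &&
         a.size() == passthru.size() && "all operands must have equal length");
  std::vector<std::uint16_t> out(a.size());
  for (std::size_t i = 0; i < a.size(); ++i)
    out[i] = mask[i] ? static_cast<std::uint16_t>(a[i] + b[i]) : passthru[i];
  return out;
}
```

With the half-index vector passed as both the first source and the passthru, the even lanes survive the add. With an undefined passthru (mask agnostic), those lanes are allowed to hold anything, so the even indices would not be guaranteed.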

luke mentioned this in rG7e2f2f0fc8f1: [RISCV][NFC] Make a note of the operands for RISCVISD::VNSRL_VL.Feb 21 2023, 1:45 AM

Fix i64 interleave case not using mask undisturbed

luke marked 4 inline comments as done.Feb 21 2023, 2:31 AM

luke added inline comments.Feb 21 2023, 2:37 AM

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
6626	I think this is handled in `lowerVECTOR_SHUFFLEAsVNSRL`

Harbormaster completed remote builds in B214963: Diff 499089.Feb 21 2023, 3:47 AM

reames added inline comments.Feb 21 2023, 9:17 AM

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
6587	WideN.getNode()->getNumValues() should just be NumVals right?
6619	Please use getDefaultVLOps here.
6622	I strongly request we drop the optimized case from the initial patch and come back to these in follow up changes. I want to focus on having a correct base lowering first.
6652	The index type here doesn't look right. Say we have two vectors of i8. On a VLEN>2048 machine, we need the index type to be larger than i8.
6690	I think this is the same as WideVT above, but with different naming and computation. Can you standardize on one or the other please?
6696	Please drop the optimized case from the initial patch.
6706	See changeVectorElementType
6725	I think the comments are backwards here. I believe the value you're actually computing is: // 0,0,1,1,2,2,.. etc. This could simply be a "which order do we write vector lanes in" confusion, but this ordering doesn't match the deinterleave comment above either.
6728	Same here
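
The index-type concern on line 6652 can be checked with a little arithmetic. The sketch below assumes the standard RVV relation VLMAX = LMUL * VLEN / SEW and integer LMUL only; `vlmax` and `indexFits` are hypothetical helper names, not LLVM functions.

```cpp
#include <cassert>
#include <cstdint>

// VLMAX for a register group: LMUL * VLEN / SEW (integer LMUL only here).
std::uint64_t vlmax(std::uint64_t vlenBits, std::uint64_t sewBits,
                    std::uint64_t lmul) {
  assert(sewBits != 0 && vlenBits % sewBits == 0 && "SEW must divide VLEN");
  return lmul * vlenBits / sewBits;
}

// True if every index 0 .. VLMAX-1 is representable in idxBits bits.
bool indexFits(std::uint64_t vlenBits, std::uint64_t sewBits,
               std::uint64_t lmul, unsigned idxBits) {
  assert(idxBits >= 1 && idxBits < 64 && "index width out of range");
  return vlmax(vlenBits, sewBits, lmul) <= (std::uint64_t{1} << idxBits);
}
```

Under these assumptions, 8-bit indices at LMUL=1 stop being sufficient once VLEN exceeds 2048, which matches the threshold mentioned in the comment.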

reames added inline comments.Feb 21 2023, 9:58 AM

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
6720	See newly added computeVLMax.

reames added inline comments.Feb 21 2023, 10:08 AM

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
6652	Realized the case I raised is unreachable until the fast path above is removed (as I suggested).

reames added inline comments.Feb 21 2023, 10:23 AM

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
6622	Staring at this code and the analogous code in fixed length vector lowering, I'm going to drop this request. Instead, I'm going to explore a pre-change to make sharing that logic possible.
6696	Same here.

luke added inline comments.Feb 21 2023, 3:52 PM

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
6652	Come to think of it, this could just be replaced with `i16` and `VRGATHEREI16` instead, saving some space.

luke added inline comments.Feb 21 2023, 4:12 PM

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
6720	It looks like `computeVLMAX` computes the maximum VLMAX statically, not the actual VLMAX on the current hardware. Although there's a bunch of other places where VLMAX is computed. Maybe we can rename `computeVLMAX` to `computeMaxVLMAX`, and then add a helper function `getVLMAX`

luke added inline comments.Feb 21 2023, 4:14 PM

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
6725	I've been staring at the spec too long, they use backwards notation. I agree, keeping it in LLVM order makes more sense

craig.topper added inline comments.Feb 21 2023, 4:14 PM

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
6720	It's using ISD::VSCALE which will be expanded to something like `csrr a0, vlenb; srli a0, 3`. We define "vscale" as vlen/64. vlenb is already in bytes so we divide by with a shift.

craig.topper added inline comments.Feb 21 2023, 4:16 PM

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
6720	Oops that should have said "We define "vscale" as vlen/64. vlenb is already in bytes so we divide by another 8 with a shift right by 3."
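
The arithmetic in these two comments can be written out as a small sketch. It assumes only what is stated above (vscale = VLEN/64, and vlenb holds VLEN in bytes); `vscaleFromVlenb` and `runtimeElementCount` are hypothetical names and `minElts` stands for the known-minimum element count of a scalable type, e.g. 4 for `<vscale x 4 x i32>`. This is not the LLVM `computeVLMax` implementation.

```cpp
#include <cassert>
#include <cstdint>

// vlenb holds VLEN in bytes; vscale is defined as VLEN / 64, so
// vscale = (vlenb * 8) / 64 = vlenb >> 3.
std::uint64_t vscaleFromVlenb(std::uint64_t vlenb) {
  assert(vlenb % 8 == 0 && "VLEN must be a multiple of 64 bits");
  return vlenb >> 3;
}

// Element count of a <vscale x minElts x ty> vector at run time.
std::uint64_t runtimeElementCount(std::uint64_t vlenb, std::uint64_t minElts) {
  return vscaleFromVlenb(vlenb) * minElts;
}
```

For example, VLEN=128 gives vlenb=16 and vscale=2, so the value depends on the hardware the code runs on rather than on a static upper bound.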

luke added inline comments.Feb 21 2023, 4:39 PM

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
6652	Never mind, this causes an extra `vsetvli`

Address some (but not all) review comments

luke marked 2 inline comments as done.Feb 21 2023, 4:52 PM

luke added inline comments.

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
6720	My bad, I didn't realise how new it actually was and hadn't pulled in 9168c98553ac9a1f8e8b87006f9b1b3f23955beb, I thought you were referring to `RISCVTargetLowering::computeVLMAX`

Use computeVLMax

luke marked 2 inline comments as done.Feb 21 2023, 5:01 PM

Use computeVLMax

LGTM with two test comments addressed.

llvm/test/CodeGen/RISCV/rvv/vector-interleave-fixed.ll
40	Missing check lines here, probably due to conflict with autogen.
117	Same problem here.

This revision is now accepted and ready to land.Feb 21 2023, 6:05 PM

Harbormaster completed remote builds in B215132: Diff 499327.Feb 21 2023, 6:28 PM

luke mentioned this in D144532: [RISCV] Reorganize deinterleave lowering for reuse [nfc].Feb 22 2023, 3:05 AM

Fix check lines

luke marked 2 inline comments as done.Feb 22 2023, 9:39 AM

craig.topper added inline comments.Feb 22 2023, 9:43 AM

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
6589	`operator->` works on SDValue. You don't need to call getNode.

luke added a parent revision: D144532: [RISCV] Reorganize deinterleave lowering for reuse [nfc].Feb 22 2023, 9:53 AM

@reames getDeinterleaveViaVNSRL needs some massaging to be reused here with the scalable vectors, I'll do that in a follow up patch to keep this diff clean

Use -> overload

luke marked 8 inline comments as done.Feb 22 2023, 10:06 AM

Remove accidental extra commit

luke added a child revision: D144584: [RISCV][NFC] Reuse getDeinterleaveViaVNSRL to lower deinterleave intrinsics.Feb 22 2023, 11:32 AM

Harbormaster completed remote builds in B215309: Diff 499571.Feb 22 2023, 1:17 PM

Reverse ping - anything preventing this from landing?

In D144092#4147491, @reames wrote:

Reverse ping - anything preventing this from landing?

Was intending to land it with https://reviews.llvm.org/D144584

Rebase

This revision was landed with ongoing or failed builds.Feb 23 2023, 8:23 AM

Closed by commit rG8d15e7275fe1: [RISCV] Lower interleave and deinterleave intrinsics (authored by luke).

This revision was automatically updated to reflect the committed changes.

luke added a commit: rG8d15e7275fe1: [RISCV] Lower interleave and deinterleave intrinsics.

Harbormaster completed remote builds in B215526: Diff 499863.Feb 23 2023, 9:23 AM

Revision Contents

Path	Size
llvm/lib/Target/RISCV/RISCVISelLowering.h	2 lines
llvm/lib/Target/RISCV/RISCVISelLowering.cpp	190 lines
llvm/test/CodeGen/RISCV/rvv/vector-deinterleave-fixed.ll	405 lines
llvm/test/CodeGen/RISCV/rvv/vector-deinterleave.ll	182 lines
llvm/test/CodeGen/RISCV/rvv/vector-interleave-fixed.ll	437 lines
llvm/test/CodeGen/RISCV/rvv/vector-interleave.ll	187 lines

Diff 499864

llvm/lib/Target/RISCV/RISCVISelLowering.h

@@ ... @@ enum NodeType : unsigned {
   VWADDU_VL,
   VWSUB_VL,
   VWSUBU_VL,
   VWADD_W_VL,
   VWADDU_W_VL,
   VWSUB_W_VL,
   VWSUBU_W_VL,

   // Narrowing logical shift right.
   // Operands are (source, shift, passthru, mask, vl)
   VNSRL_VL,

   // Vector compare producing a mask. Fourth operand is input mask. Fifth
   // operand is VL.
   SETCC_VL,

   // Vector select with an additional VL operand. This operation is unmasked.
@@ ... @@ private:
   SDValue LowerINTRINSIC_VOID(SDValue Op, SelectionDAG &DAG) const;
   SDValue lowerVPREDUCE(SDValue Op, SelectionDAG &DAG) const;
   SDValue lowerVECREDUCE(SDValue Op, SelectionDAG &DAG) const;
   SDValue lowerVectorMaskVecReduction(SDValue Op, SelectionDAG &DAG,
                                       bool IsVP) const;
   SDValue lowerFPVECREDUCE(SDValue Op, SelectionDAG &DAG) const;
   SDValue lowerINSERT_SUBVECTOR(SDValue Op, SelectionDAG &DAG) const;
   SDValue lowerEXTRACT_SUBVECTOR(SDValue Op, SelectionDAG &DAG) const;
+  SDValue lowerVECTOR_DEINTERLEAVE(SDValue Op, SelectionDAG &DAG) const;
+  SDValue lowerVECTOR_INTERLEAVE(SDValue Op, SelectionDAG &DAG) const;
   SDValue lowerSTEP_VECTOR(SDValue Op, SelectionDAG &DAG) const;
   SDValue lowerVECTOR_REVERSE(SDValue Op, SelectionDAG &DAG) const;
   SDValue lowerVECTOR_SPLICE(SDValue Op, SelectionDAG &DAG) const;
   SDValue lowerABS(SDValue Op, SelectionDAG &DAG) const;
   SDValue lowerMaskedLoad(SDValue Op, SelectionDAG &DAG) const;
   SDValue lowerMaskedStore(SDValue Op, SelectionDAG &DAG) const;
   SDValue lowerFixedLengthVectorFCOPYSIGNToRVV(SDValue Op,
                                                SelectionDAG &DAG) const;

llvm/lib/Target/RISCV/RISCVISelLowering.cpp

@@ ... @@ for (MVT VT : BoolVecVTs) {
         setTruncStoreAction(OtherVT, VT, Expand);
         setLoadExtAction({ISD::EXTLOAD, ISD::SEXTLOAD, ISD::ZEXTLOAD}, OtherVT,
                          VT, Expand);
       }

       setOperationAction({ISD::VP_FP_TO_SINT, ISD::VP_FP_TO_UINT,
                           ISD::VP_TRUNCATE, ISD::VP_SETCC},
                          VT, Custom);

+      setOperationAction(ISD::VECTOR_DEINTERLEAVE, VT, Custom);
+      setOperationAction(ISD::VECTOR_INTERLEAVE, VT, Custom);

       setOperationAction(ISD::VECTOR_REVERSE, VT, Custom);

       setOperationPromotedToType(
           ISD::VECTOR_SPLICE, VT,
           MVT::getVectorVT(MVT::i8, VT.getVectorElementCount()));
     }

     for (MVT VT : IntVecVTs) {
@@ ... @@ for (MVT VT : IntVecVTs) {
       setOperationAction({ISD::STEP_VECTOR, ISD::VECTOR_REVERSE}, VT, Custom);

       for (MVT OtherVT : MVT::integer_scalable_vector_valuetypes()) {
         setTruncStoreAction(VT, OtherVT, Expand);
         setLoadExtAction({ISD::EXTLOAD, ISD::SEXTLOAD, ISD::ZEXTLOAD}, OtherVT,
                          VT, Expand);
       }

+      setOperationAction(ISD::VECTOR_DEINTERLEAVE, VT, Custom);
+      setOperationAction(ISD::VECTOR_INTERLEAVE, VT, Custom);

       // Splice
       setOperationAction(ISD::VECTOR_SPLICE, VT, Custom);

       // Lower CTLZ_ZERO_UNDEF and CTTZ_ZERO_UNDEF if element of VT in the range
       // of f32.
       EVT FloatVT = MVT::getVectorVT(MVT::f32, VT.getVectorElementCount());
       if (isTypeLegal(FloatVT)) {
         setOperationAction(
@@ ... @@ const auto SetCommonVFPActions = [&](MVT VT) {

       setOperationAction(ISD::SELECT, VT, Custom);
       setOperationAction(ISD::SELECT_CC, VT, Expand);

       setOperationAction(
           {ISD::CONCAT_VECTORS, ISD::INSERT_SUBVECTOR, ISD::EXTRACT_SUBVECTOR},
           VT, Custom);

+      setOperationAction(ISD::VECTOR_DEINTERLEAVE, VT, Custom);
+      setOperationAction(ISD::VECTOR_INTERLEAVE, VT, Custom);

       setOperationAction({ISD::VECTOR_REVERSE, ISD::VECTOR_SPLICE}, VT, Custom);

       setOperationAction(FloatingPointVPOps, VT, Custom);
     };

     // Sets common extload/truncstore actions on RVV floating-point vector
     // types.
     const auto SetCommonVFPExtLoadTruncStoreActions =
@@ ... @@ SDValue RISCVTargetLowering::LowerOperation(SDValue Op,
   case ISD::VP_REDUCE_XOR:
     if (Op.getOperand(1).getValueType().getVectorElementType() == MVT::i1)
       return lowerVectorMaskVecReduction(Op, DAG, /*IsVP*/ true);
     return lowerVPREDUCE(Op, DAG);
   case ISD::INSERT_SUBVECTOR:
     return lowerINSERT_SUBVECTOR(Op, DAG);
   case ISD::EXTRACT_SUBVECTOR:
     return lowerEXTRACT_SUBVECTOR(Op, DAG);
+  case ISD::VECTOR_DEINTERLEAVE:
+    return lowerVECTOR_DEINTERLEAVE(Op, DAG);
+  case ISD::VECTOR_INTERLEAVE:
+    return lowerVECTOR_INTERLEAVE(Op, DAG);
   case ISD::STEP_VECTOR:
     return lowerSTEP_VECTOR(Op, DAG);
   case ISD::VECTOR_REVERSE:
     return lowerVECTOR_REVERSE(Op, DAG);
   case ISD::VECTOR_SPLICE:
     return lowerVECTOR_SPLICE(Op, DAG);
   case ISD::BUILD_VECTOR:
     return lowerBUILD_VECTOR(Op, DAG, Subtarget);
@@ ... @@ SDValue RISCVTargetLowering::lowerEXTRACT_SUBVECTOR(SDValue Op,
   Slidedown = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, SubVecVT, Slidedown,
                           DAG.getConstant(0, DL, XLenVT));

   // We might have bitcast from a mask type: cast back to the original type if
   // required.
   return DAG.getBitcast(Op.getSimpleValueType(), Slidedown);
 }

+// Widen a vector's operands to i8, then truncate its results back to the
+// original type, typically i1. All operand and result types must be the same.
+static SDValue widenVectorOpsToi8(SDValue N, SDLoc &DL, SelectionDAG &DAG) {
+  MVT VT = N.getSimpleValueType();
+  MVT WideVT = VT.changeVectorElementType(MVT::i8);
+  SmallVector<SDValue, 4> WideOps;
+  for (SDValue Op : N->ops()) {
+    assert(Op.getSimpleValueType() == VT &&
+           "Operands and result must be same type");
+    WideOps.push_back(DAG.getNode(ISD::ZERO_EXTEND, DL, WideVT, Op));
+  }
+
+  unsigned NumVals = N->getNumValues();
+
+  SDVTList VTs = DAG.getVTList(SmallVector<EVT, 4>(
+      NumVals, N.getValueType().changeVectorElementType(MVT::i8)));
+  SDValue WideN = DAG.getNode(N.getOpcode(), DL, VTs, WideOps);
+  SmallVector<SDValue, 4> TruncVals;
+  for (unsigned I = 0; I < NumVals; I++) {
+    TruncVals.push_back(DAG.getNode(ISD::TRUNCATE, DL,
+                                    N->getSimpleValueType(I),
+                                    SDValue(WideN.getNode(), I)));
+  }
+
+  if (TruncVals.size() > 1)
+    return DAG.getMergeValues(TruncVals, DL);
+  return TruncVals.front();
+}

+SDValue RISCVTargetLowering::lowerVECTOR_DEINTERLEAVE(SDValue Op,
+                                                      SelectionDAG &DAG) const {
+  SDLoc DL(Op);
+  MVT VecVT = Op.getSimpleValueType();
+  MVT XLenVT = Subtarget.getXLenVT();
+
+  assert(VecVT.isScalableVector() &&
+         "vector_interleave on non-scalable vector!");
+
+  // 1 bit element vectors need to be widened to e8
+  if (VecVT.getVectorElementType() == MVT::i1)
+    return widenVectorOpsToi8(Op, DL, DAG);
+
+  // Concatenate the two vectors as one vector to deinterleave
+  MVT ConcatVT =
+      MVT::getVectorVT(VecVT.getVectorElementType(),
+                       VecVT.getVectorElementCount().multiplyCoefficientBy(2));
+  SDValue Concat = DAG.getNode(ISD::CONCAT_VECTORS, DL, ConcatVT,
+                               Op.getOperand(0), Op.getOperand(1));
+
+  // We want to operate on all lanes, so get the mask and VL and mask for it
+  auto [Mask, VL] = getDefaultScalableVLOps(ConcatVT, DL, DAG, Subtarget);
+  SDValue Passthru = DAG.getUNDEF(ConcatVT);
+
+  // If the element type is smaller than ELEN, then we can deinterleave
+  // through vnsrl.wi
+  if (VecVT.getScalarSizeInBits() < Subtarget.getELEN()) {
+    // Bitcast the concatenated vector from <n x m x ty> -> <n x m / 2 x ty * 2>
+    // This is also casts FPs to ints
+    MVT WideVT = MVT::getVectorVT(
+        MVT::getIntegerVT(ConcatVT.getScalarSizeInBits() * 2),
+        ConcatVT.getVectorElementCount().divideCoefficientBy(2));
+    SDValue Wide = DAG.getBitcast(WideVT, Concat);
+
+    MVT NarrowVT = VecVT.changeVectorElementTypeToInteger();
+    SDValue Passthru = DAG.getUNDEF(VecVT);
+
+    SDValue Even = DAG.getNode(
+        RISCVISD::VNSRL_VL, DL, NarrowVT, Wide,
+        DAG.getSplatVector(NarrowVT, DL, DAG.getConstant(0, DL, XLenVT)),
+        Passthru, Mask, VL);
+    SDValue Odd = DAG.getNode(
+        RISCVISD::VNSRL_VL, DL, NarrowVT, Wide,
+        DAG.getSplatVector(
+            NarrowVT, DL,
+            DAG.getConstant(VecVT.getScalarSizeInBits(), DL, XLenVT)),
+        Passthru, Mask, VL);
+
+    // Bitcast the results back in case it was casted from an FP vector
+    return DAG.getMergeValues(
+        {DAG.getBitcast(VecVT, Even), DAG.getBitcast(VecVT, Odd)}, DL);
+  }
+
+  // For the indices, use the same SEW to avoid an extra vsetvli
+  MVT IdxVT = ConcatVT.changeVectorElementTypeToInteger();
+  // Create a vector of even indices {0, 2, 4, ...}
+  SDValue EvenIdx =
+      DAG.getStepVector(DL, IdxVT, APInt(IdxVT.getScalarSizeInBits(), 2));
+  // Create a vector of odd indices {1, 3, 5, ... }
+  SDValue OddIdx =
+      DAG.getNode(ISD::ADD, DL, IdxVT, EvenIdx, DAG.getConstant(1, DL, IdxVT));
+
+  // Gather the even and odd elements into two separate vectors
+  SDValue EvenWide = DAG.getNode(RISCVISD::VRGATHER_VV_VL, DL, ConcatVT,
+                                 Concat, EvenIdx, Passthru, Mask, VL);
+  SDValue OddWide = DAG.getNode(RISCVISD::VRGATHER_VV_VL, DL, ConcatVT,
+                                Concat, OddIdx, Passthru, Mask, VL);
+
+  // Extract the result half of the gather for even and odd
+  SDValue Even = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, VecVT, EvenWide,
+                             DAG.getConstant(0, DL, XLenVT));
+  SDValue Odd = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, VecVT, OddWide,
+                            DAG.getConstant(0, DL, XLenVT));
+
+  return DAG.getMergeValues({Even, Odd}, DL);
+}

		SDValue RISCVTargetLowering::lowerVECTOR_INTERLEAVE(SDValue Op,
		SelectionDAG &DAG) const {
		SDLoc DL(Op);
		MVT VecVT = Op.getSimpleValueType();

		assert(VecVT.isScalableVector() &&
		"vector_interleave on non-scalable vector!");

		// i1 vectors need to be widened to i8
		if (VecVT.getVectorElementType() == MVT::i1)
		return widenVectorOpsToi8(Op, DL, DAG);

		MVT XLenVT = Subtarget.getXLenVT();
		SDValue VL = DAG.getRegister(RISCV::X0, XLenVT);

		SDValue Interleaved;
		reamesUnsubmitted Done Reply Inline Actions I think this is the same as WideVT above, but with different naming and computation. Can you standardize on one or the other please? reames: I think this is the same as WideVT above, but with different naming and computation. Can you…

		// If the element type is smaller than ELEN, then we can interleave with
		// vwaddu.vv and vwmaccu.vx
		if (VecVT.getScalarSizeInBits() < Subtarget.getELEN()) {
		Interleaved = getWideningInterleave(Op.getOperand(0), Op.getOperand(1), DL,
		DAG, Subtarget);
		reamesUnsubmitted Done Reply Inline Actions Please drop the optimized case from the initial patch. reames: Please drop the optimized case from the initial patch.
		reamesUnsubmitted Done Reply Inline Actions Same here. reames: Same here.
		} else {
		// Otherwise, fallback to using vrgathere16.vv
		MVT ConcatVT =
		MVT::getVectorVT(VecVT.getVectorElementType(),
		reamesUnsubmitted Done Reply Inline Actions This bit of code should be shared with the isInterleaveShuffle above. This will require a bit of restructuring. reames: This bit of code should be shared with the isInterleaveShuffle above. This will require a bit…
		VecVT.getVectorElementCount().multiplyCoefficientBy(2));
		SDValue Concat = DAG.getNode(ISD::CONCAT_VECTORS, DL, ConcatVT,
		Op.getOperand(0), Op.getOperand(1));

		MVT IdxVT = ConcatVT.changeVectorElementType(MVT::i16);

		reamesUnsubmitted Done Reply Inline Actions See changeVectorElementType reames: See changeVectorElementType
		// 0 1 2 3 4 5 6 7 ...
		SDValue StepVec = DAG.getStepVector(DL, IdxVT);

		// 1 1 1 1 1 1 1 1 ...
		SDValue Ones = DAG.getSplatVector(IdxVT, DL, DAG.getConstant(1, DL, XLenVT));

		// 1 0 1 0 1 0 1 0 ...
		SDValue OddMask = DAG.getNode(ISD::AND, DL, IdxVT, StepVec, Ones);
		OddMask = DAG.getSetCC(
		DL, IdxVT.changeVectorElementType(MVT::i1), OddMask,
		DAG.getSplatVector(IdxVT, DL, DAG.getConstant(0, DL, XLenVT)),
		ISD::CondCode::SETNE);

		SDValue VLMax = DAG.getSplatVector(IdxVT, DL, computeVLMax(VecVT, DL, DAG));
		reames: See newly added computeVLMax.
		luke (author): It looks like `computeVLMAX` computes the maximum VLMAX statically, not the actual VLMAX on the current hardware. Although there's a bunch of other places where VLMAX is computed. Maybe we can rename `computeVLMAX` to `computeMaxVLMAX`, and then add a helper function `getVLMAX`.
		craig.topper: It's using ISD::VSCALE which will be expanded to something like `csrr a0, vlenb; srli a0, 3`. We define "vscale" as vlen/64. vlenb is already in bytes so we divide by with a shift.
		craig.topper: Oops that should have said "We define "vscale" as vlen/64. vlenb is already in bytes so we divide by another 8 with a shift right by 3."
		luke (author): My bad, I didn't realise how new it actually was and hadn't pulled in 9168c98553ac9a1f8e8b87006f9b1b3f23955beb, I thought you were referring to `RISCVTargetLowering::computeVLMAX`.
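To make the arithmetic in this thread concrete, here is a minimal sketch of the relationship between `vlenb`, vscale and the runtime element count. It is not code from the patch: `vscaleFromVlenb` and `runtimeElementCount` are hypothetical names, and the sketch assumes the runtime element count of a scalable type is vscale times its minimum element count.

```cpp
#include <cassert>
#include <cstdint>

// vlenb holds VLEN in bytes, and vscale is defined as VLEN / 64,
// so vscale = (vlenb * 8) / 64 = vlenb >> 3.
constexpr uint64_t vscaleFromVlenb(uint64_t vlenb) { return vlenb >> 3; }

// Assumed model: a type <vscale x minElts x T> holds vscale * minElts
// elements at run time (hypothetical helper, for illustration only).
constexpr uint64_t runtimeElementCount(uint64_t vlenb, uint64_t minElts) {
  return vscaleFromVlenb(vlenb) * minElts;
}
```

For example, with VLEN = 128 the `vlenb` CSR reads 16, so vscale is 2 and a `<vscale x 4 x i32>` value holds 8 elements. The value depends on the `vlenb` read at run time rather than on a static upper bound.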

		// Build up the index vector for interleaving the concatenated vector
		// 0 0 1 1 2 2 3 3 ...
		SDValue Idx = DAG.getNode(ISD::SRL, DL, IdxVT, StepVec, Ones);
		// 0 n 1 n+1 2 n+2 3 n+3 ...
		reames: I think the comments are backwards here. I believe the value you're actually computing is: // 0,0,1,1,2,2,.. etc. This could simply be a "which order do we write vector lanes in" confusion, but this order doesn't match the deinterleave comment above either.
		luke (author): I've been staring at the spec too long, they use backwards notation. I agree, keeping it in LLVM order makes more sense.
		Idx =
		DAG.getNode(RISCVISD::ADD_VL, DL, IdxVT, Idx, VLMax, Idx, OddMask, VL);

		reames: Same here
		// Then perform the interleave
		// v[0] v[n] v[1] v[n+1] v[2] v[n+2] v[3] v[n+3] ...
		Interleaved = DAG.getNode(RISCVISD::VRGATHEREI16_VV_VL, DL, ConcatVT,
		craig.topper: Need to pass `Idx` instead of `UNDEF` to get a mask undisturbed vadd.vv.
		Concat, Idx, DAG.getUNDEF(ConcatVT), OddMask, VL);
		}

		// Extract the two halves from the interleaved result
		SDValue Lo = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, VecVT, Interleaved,
		DAG.getVectorIdxConstant(0, DL));
		SDValue Hi = DAG.getNode(
		ISD::EXTRACT_SUBVECTOR, DL, VecVT, Interleaved,
		DAG.getVectorIdxConstant(VecVT.getVectorMinNumElements(), DL));

		return DAG.getMergeValues({Lo, Hi}, DL);
		}
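The index vector built in the vrgatherei16 fallback above can be modelled on plain arrays. The following is a minimal sketch, not code from the patch: `interleaveIndices` and `gather` are hypothetical helpers, `n` stands in for the VLMAX of one source operand, and lanes are written in LLVM order (lane 0 first).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar model of the index vector: lane i holds (i >> 1),
// and odd lanes additionally get n added, giving 0, n, 1, n+1, 2, n+2, ...
std::vector<uint16_t> interleaveIndices(uint16_t n) {
  std::vector<uint16_t> idx(2 * static_cast<std::size_t>(n));
  for (std::size_t i = 0; i < idx.size(); ++i) {
    uint16_t half = static_cast<uint16_t>(i >> 1);  // 0 0 1 1 2 2 ...
    bool odd = (i & 1) != 0;                        // 0 1 0 1 0 1 ...
    // Masked add: even lanes keep `half` unchanged, odd lanes get half + n.
    idx[i] = odd ? static_cast<uint16_t>(half + n) : half;
  }
  return idx;
}

// Hypothetical model of a gather over the concatenated sources:
// out[i] = src[idx[i]].
std::vector<int> gather(const std::vector<int>& src,
                        const std::vector<uint16_t>& idx) {
  std::vector<int> out(idx.size());
  for (std::size_t i = 0; i < idx.size(); ++i) out[i] = src[idx[i]];
  return out;
}
```

With `n = 4` and the concatenation `{a0, a1, a2, a3, b0, b1, b2, b3}` as `src`, the gather yields `{a0, b0, a1, b1, a2, b2, a3, b3}`. The even lanes only come out right if the masked add leaves them equal to `i >> 1`, which is why the merge operand of the add matters.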

		// Lower step_vector to the vid instruction. Any non-identity step value must
		// be accounted for by manual expansion.
		SDValue RISCVTargetLowering::lowerSTEP_VECTOR(SDValue Op,
		SelectionDAG &DAG) const {
		SDLoc DL(Op);
		MVT VT = Op.getSimpleValueType();
		assert(VT.isScalableVector() && "Expected scalable vector");
		MVT XLenVT = Subtarget.getXLenVT();

llvm/test/CodeGen/RISCV/rvv/vector-deinterleave-fixed.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc < %s -mtriple=riscv32 -mattr=+v,+zfh,+experimental-zvfh | FileCheck --check-prefixes=CHECK,RV32 %s
				craig.topper: I don't think this test even uses your patch and shows the strategy for how your patch can be improved. Only i64 and double test case use vrgather.
				; RUN: llc < %s -mtriple=riscv64 -mattr=+v,+zfh,+experimental-zvfh | FileCheck --check-prefixes=CHECK,RV64 %s

				; Integers

				define {<16 x i1>, <16 x i1>} @vector_deinterleave_v16i1_v32i1(<32 x i1> %vec) {
				; RV32-LABEL: vector_deinterleave_v16i1_v32i1:
				; RV32: # %bb.0:
				craig.topper: Missing CHECK lines
				; RV32-NEXT: addi sp, sp, -32
				; RV32-NEXT: .cfi_def_cfa_offset 32
				; RV32-NEXT: vsetivli zero, 16, e8, m1, ta, ma
				; RV32-NEXT: vfirst.m a0, v0
				; RV32-NEXT: seqz a0, a0
				; RV32-NEXT: sb a0, 16(sp)
				; RV32-NEXT: vsetivli zero, 0, e16, mf4, ta, ma
				; RV32-NEXT: vmv.x.s a0, v0
				; RV32-NEXT: slli a1, a0, 17
				; RV32-NEXT: srli a1, a1, 31
				; RV32-NEXT: sb a1, 23(sp)
				; RV32-NEXT: slli a1, a0, 19
				; RV32-NEXT: srli a1, a1, 31
				; RV32-NEXT: sb a1, 22(sp)
				; RV32-NEXT: slli a1, a0, 21
				; RV32-NEXT: srli a1, a1, 31
				; RV32-NEXT: sb a1, 21(sp)
				; RV32-NEXT: slli a1, a0, 23
				; RV32-NEXT: srli a1, a1, 31
				; RV32-NEXT: sb a1, 20(sp)
				; RV32-NEXT: slli a1, a0, 25
				; RV32-NEXT: srli a1, a1, 31
				; RV32-NEXT: sb a1, 19(sp)
				; RV32-NEXT: slli a1, a0, 27
				; RV32-NEXT: srli a1, a1, 31
				; RV32-NEXT: sb a1, 18(sp)
				; RV32-NEXT: slli a1, a0, 29
				; RV32-NEXT: srli a1, a1, 31
				; RV32-NEXT: sb a1, 17(sp)
				; RV32-NEXT: vsetivli zero, 2, e8, mf4, ta, ma
				; RV32-NEXT: vslidedown.vi v8, v0, 2
				; RV32-NEXT: vsetivli zero, 16, e8, m1, ta, ma
				; RV32-NEXT: vfirst.m a1, v8
				; RV32-NEXT: seqz a1, a1
				; RV32-NEXT: sb a1, 24(sp)
				; RV32-NEXT: vsetivli zero, 0, e16, mf4, ta, ma
				; RV32-NEXT: vmv.x.s a1, v8
				; RV32-NEXT: slli a2, a1, 17
				; RV32-NEXT: srli a2, a2, 31
				; RV32-NEXT: sb a2, 31(sp)
				; RV32-NEXT: slli a2, a1, 19
				; RV32-NEXT: srli a2, a2, 31
				; RV32-NEXT: sb a2, 30(sp)
				; RV32-NEXT: slli a2, a1, 21
				; RV32-NEXT: srli a2, a2, 31
				; RV32-NEXT: sb a2, 29(sp)
				; RV32-NEXT: slli a2, a1, 23
				; RV32-NEXT: srli a2, a2, 31
				; RV32-NEXT: sb a2, 28(sp)
				; RV32-NEXT: slli a2, a1, 25
				; RV32-NEXT: srli a2, a2, 31
				; RV32-NEXT: sb a2, 27(sp)
				; RV32-NEXT: slli a2, a1, 27
				; RV32-NEXT: srli a2, a2, 31
				; RV32-NEXT: sb a2, 26(sp)
				; RV32-NEXT: slli a2, a1, 29
				; RV32-NEXT: srli a2, a2, 31
				; RV32-NEXT: sb a2, 25(sp)
				; RV32-NEXT: slli a2, a0, 16
				; RV32-NEXT: srli a2, a2, 31
				; RV32-NEXT: sb a2, 7(sp)
				; RV32-NEXT: slli a2, a0, 18
				; RV32-NEXT: srli a2, a2, 31
				; RV32-NEXT: sb a2, 6(sp)
				; RV32-NEXT: slli a2, a0, 20
				; RV32-NEXT: srli a2, a2, 31
				; RV32-NEXT: sb a2, 5(sp)
				; RV32-NEXT: slli a2, a0, 22
				; RV32-NEXT: srli a2, a2, 31
				; RV32-NEXT: sb a2, 4(sp)
				; RV32-NEXT: slli a2, a0, 24
				; RV32-NEXT: srli a2, a2, 31
				; RV32-NEXT: sb a2, 3(sp)
				; RV32-NEXT: slli a2, a0, 26
				; RV32-NEXT: srli a2, a2, 31
				; RV32-NEXT: sb a2, 2(sp)
				; RV32-NEXT: slli a2, a0, 28
				; RV32-NEXT: srli a2, a2, 31
				; RV32-NEXT: sb a2, 1(sp)
				; RV32-NEXT: slli a0, a0, 30
				; RV32-NEXT: srli a0, a0, 31
				; RV32-NEXT: sb a0, 0(sp)
				; RV32-NEXT: slli a0, a1, 16
				; RV32-NEXT: srli a0, a0, 31
				; RV32-NEXT: sb a0, 15(sp)
				; RV32-NEXT: slli a0, a1, 18
				; RV32-NEXT: srli a0, a0, 31
				; RV32-NEXT: sb a0, 14(sp)
				; RV32-NEXT: slli a0, a1, 20
				; RV32-NEXT: srli a0, a0, 31
				; RV32-NEXT: sb a0, 13(sp)
				; RV32-NEXT: slli a0, a1, 22
				; RV32-NEXT: srli a0, a0, 31
				; RV32-NEXT: sb a0, 12(sp)
				; RV32-NEXT: slli a0, a1, 24
				; RV32-NEXT: srli a0, a0, 31
				; RV32-NEXT: sb a0, 11(sp)
				; RV32-NEXT: slli a0, a1, 26
				; RV32-NEXT: srli a0, a0, 31
				; RV32-NEXT: sb a0, 10(sp)
				; RV32-NEXT: slli a0, a1, 28
				; RV32-NEXT: srli a0, a0, 31
				; RV32-NEXT: sb a0, 9(sp)
				; RV32-NEXT: slli a1, a1, 30
				; RV32-NEXT: srli a1, a1, 31
				; RV32-NEXT: sb a1, 8(sp)
				; RV32-NEXT: addi a0, sp, 16
				; RV32-NEXT: vsetivli zero, 16, e8, m1, ta, ma
				; RV32-NEXT: vle8.v v8, (a0)
				; RV32-NEXT: mv a0, sp
				; RV32-NEXT: vle8.v v9, (a0)
				; RV32-NEXT: vand.vi v8, v8, 1
				; RV32-NEXT: vmsne.vi v0, v8, 0
				; RV32-NEXT: vand.vi v8, v9, 1
				; RV32-NEXT: vmsne.vi v8, v8, 0
				; RV32-NEXT: addi sp, sp, 32
				; RV32-NEXT: ret
				;
				; RV64-LABEL: vector_deinterleave_v16i1_v32i1:
				; RV64: # %bb.0:
				; RV64-NEXT: addi sp, sp, -32
				; RV64-NEXT: .cfi_def_cfa_offset 32
				; RV64-NEXT: vsetivli zero, 16, e8, m1, ta, ma
				; RV64-NEXT: vfirst.m a0, v0
				; RV64-NEXT: seqz a0, a0
				; RV64-NEXT: sb a0, 16(sp)
				; RV64-NEXT: vsetivli zero, 0, e16, mf4, ta, ma
				; RV64-NEXT: vmv.x.s a0, v0
				; RV64-NEXT: slli a1, a0, 49
				; RV64-NEXT: srli a1, a1, 63
				; RV64-NEXT: sb a1, 23(sp)
				; RV64-NEXT: slli a1, a0, 51
				; RV64-NEXT: srli a1, a1, 63
				; RV64-NEXT: sb a1, 22(sp)
				; RV64-NEXT: slli a1, a0, 53
				; RV64-NEXT: srli a1, a1, 63
				; RV64-NEXT: sb a1, 21(sp)
				; RV64-NEXT: slli a1, a0, 55
				; RV64-NEXT: srli a1, a1, 63
				; RV64-NEXT: sb a1, 20(sp)
				; RV64-NEXT: slli a1, a0, 57
				; RV64-NEXT: srli a1, a1, 63
				; RV64-NEXT: sb a1, 19(sp)
				; RV64-NEXT: slli a1, a0, 59
				; RV64-NEXT: srli a1, a1, 63
				; RV64-NEXT: sb a1, 18(sp)
				; RV64-NEXT: slli a1, a0, 61
				; RV64-NEXT: srli a1, a1, 63
				; RV64-NEXT: sb a1, 17(sp)
				; RV64-NEXT: vsetivli zero, 2, e8, mf4, ta, ma
				; RV64-NEXT: vslidedown.vi v8, v0, 2
				; RV64-NEXT: vsetivli zero, 16, e8, m1, ta, ma
				; RV64-NEXT: vfirst.m a1, v8
				; RV64-NEXT: seqz a1, a1
				; RV64-NEXT: sb a1, 24(sp)
				; RV64-NEXT: vsetivli zero, 0, e16, mf4, ta, ma
				; RV64-NEXT: vmv.x.s a1, v8
				; RV64-NEXT: slli a2, a1, 49
				; RV64-NEXT: srli a2, a2, 63
				; RV64-NEXT: sb a2, 31(sp)
				; RV64-NEXT: slli a2, a1, 51
				; RV64-NEXT: srli a2, a2, 63
				; RV64-NEXT: sb a2, 30(sp)
				; RV64-NEXT: slli a2, a1, 53
				; RV64-NEXT: srli a2, a2, 63
				; RV64-NEXT: sb a2, 29(sp)
				; RV64-NEXT: slli a2, a1, 55
				; RV64-NEXT: srli a2, a2, 63
				; RV64-NEXT: sb a2, 28(sp)
				; RV64-NEXT: slli a2, a1, 57
				; RV64-NEXT: srli a2, a2, 63
				; RV64-NEXT: sb a2, 27(sp)
				; RV64-NEXT: slli a2, a1, 59
				; RV64-NEXT: srli a2, a2, 63
				; RV64-NEXT: sb a2, 26(sp)
				; RV64-NEXT: slli a2, a1, 61
				; RV64-NEXT: srli a2, a2, 63
				; RV64-NEXT: sb a2, 25(sp)
				; RV64-NEXT: slli a2, a0, 48
				; RV64-NEXT: srli a2, a2, 63
				; RV64-NEXT: sb a2, 7(sp)
				; RV64-NEXT: slli a2, a0, 50
				; RV64-NEXT: srli a2, a2, 63
				; RV64-NEXT: sb a2, 6(sp)
				; RV64-NEXT: slli a2, a0, 52
				; RV64-NEXT: srli a2, a2, 63
				; RV64-NEXT: sb a2, 5(sp)
				; RV64-NEXT: slli a2, a0, 54
				; RV64-NEXT: srli a2, a2, 63
				; RV64-NEXT: sb a2, 4(sp)
				; RV64-NEXT: slli a2, a0, 56
				; RV64-NEXT: srli a2, a2, 63
				; RV64-NEXT: sb a2, 3(sp)
				; RV64-NEXT: slli a2, a0, 58
				; RV64-NEXT: srli a2, a2, 63
				; RV64-NEXT: sb a2, 2(sp)
				; RV64-NEXT: slli a2, a0, 60
				; RV64-NEXT: srli a2, a2, 63
				; RV64-NEXT: sb a2, 1(sp)
				; RV64-NEXT: slli a0, a0, 62
				; RV64-NEXT: srli a0, a0, 63
				; RV64-NEXT: sb a0, 0(sp)
				; RV64-NEXT: slli a0, a1, 48
				; RV64-NEXT: srli a0, a0, 63
				; RV64-NEXT: sb a0, 15(sp)
				; RV64-NEXT: slli a0, a1, 50
				; RV64-NEXT: srli a0, a0, 63
				; RV64-NEXT: sb a0, 14(sp)
				; RV64-NEXT: slli a0, a1, 52
				; RV64-NEXT: srli a0, a0, 63
				; RV64-NEXT: sb a0, 13(sp)
				; RV64-NEXT: slli a0, a1, 54
				; RV64-NEXT: srli a0, a0, 63
				; RV64-NEXT: sb a0, 12(sp)
				; RV64-NEXT: slli a0, a1, 56
				; RV64-NEXT: srli a0, a0, 63
				; RV64-NEXT: sb a0, 11(sp)
				; RV64-NEXT: slli a0, a1, 58
				; RV64-NEXT: srli a0, a0, 63
				; RV64-NEXT: sb a0, 10(sp)
				; RV64-NEXT: slli a0, a1, 60
				; RV64-NEXT: srli a0, a0, 63
				; RV64-NEXT: sb a0, 9(sp)
				; RV64-NEXT: slli a1, a1, 62
				; RV64-NEXT: srli a1, a1, 63
				; RV64-NEXT: sb a1, 8(sp)
				; RV64-NEXT: addi a0, sp, 16
				; RV64-NEXT: vsetivli zero, 16, e8, m1, ta, ma
				; RV64-NEXT: vle8.v v8, (a0)
				; RV64-NEXT: mv a0, sp
				; RV64-NEXT: vle8.v v9, (a0)
				; RV64-NEXT: vand.vi v8, v8, 1
				; RV64-NEXT: vmsne.vi v0, v8, 0
				; RV64-NEXT: vand.vi v8, v9, 1
				; RV64-NEXT: vmsne.vi v8, v8, 0
				; RV64-NEXT: addi sp, sp, 32
				; RV64-NEXT: ret
				%retval = call {<16 x i1>, <16 x i1>} @llvm.experimental.vector.deinterleave2.v32i1(<32 x i1> %vec)
				ret {<16 x i1>, <16 x i1>} %retval
				}

				define {<16 x i8>, <16 x i8>} @vector_deinterleave_v16i8_v32i8(<32 x i8> %vec) {
				; CHECK-LABEL: vector_deinterleave_v16i8_v32i8:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vsetivli zero, 16, e8, m1, ta, ma
				; CHECK-NEXT: vnsrl.wi v10, v8, 0
				; CHECK-NEXT: vnsrl.wi v11, v8, 8
				; CHECK-NEXT: vmv.v.v v8, v10
				; CHECK-NEXT: vmv.v.v v9, v11
				; CHECK-NEXT: ret
				%retval = call {<16 x i8>, <16 x i8>} @llvm.experimental.vector.deinterleave2.v32i8(<32 x i8> %vec)
				ret {<16 x i8>, <16 x i8>} %retval
				}

				define {<8 x i16>, <8 x i16>} @vector_deinterleave_v8i16_v16i16(<16 x i16> %vec) {
				; CHECK-LABEL: vector_deinterleave_v8i16_v16i16:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vsetivli zero, 8, e16, m1, ta, ma
				; CHECK-NEXT: vnsrl.wi v10, v8, 0
				; CHECK-NEXT: vnsrl.wi v11, v8, 16
				; CHECK-NEXT: vmv.v.v v8, v10
				; CHECK-NEXT: vmv.v.v v9, v11
				; CHECK-NEXT: ret
				%retval = call {<8 x i16>, <8 x i16>} @llvm.experimental.vector.deinterleave2.v16i16(<16 x i16> %vec)
				ret {<8 x i16>, <8 x i16>} %retval
				}

				define {<4 x i32>, <4 x i32>} @vector_deinterleave_v4i32_vv8i32(<8 x i32> %vec) {
				; CHECK-LABEL: vector_deinterleave_v4i32_vv8i32:
				; CHECK: # %bb.0:
				; CHECK-NEXT: li a0, 32
				; CHECK-NEXT: vsetivli zero, 4, e32, m1, ta, ma
				; CHECK-NEXT: vnsrl.wx v10, v8, a0
				; CHECK-NEXT: vnsrl.wi v11, v8, 0
				; CHECK-NEXT: vmv.v.v v8, v11
				; CHECK-NEXT: vmv.v.v v9, v10
				; CHECK-NEXT: ret
				%retval = call {<4 x i32>, <4 x i32>} @llvm.experimental.vector.deinterleave2.v8i32(<8 x i32> %vec)
				ret {<4 x i32>, <4 x i32>} %retval
				}

				define {<2 x i64>, <2 x i64>} @vector_deinterleave_v2i64_v4i64(<4 x i64> %vec) {
				; CHECK-LABEL: vector_deinterleave_v2i64_v4i64:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vsetivli zero, 2, e64, m2, ta, ma
				; CHECK-NEXT: vslidedown.vi v12, v8, 2
				; CHECK-NEXT: li a0, 2
				; CHECK-NEXT: vmv.s.x v0, a0
				; CHECK-NEXT: vsetivli zero, 2, e64, m1, ta, mu
				; CHECK-NEXT: vrgather.vi v10, v8, 0
				; CHECK-NEXT: vrgather.vi v10, v12, 0, v0.t
				; CHECK-NEXT: vrgather.vi v11, v8, 1
				; CHECK-NEXT: vrgather.vi v11, v12, 1, v0.t
				; CHECK-NEXT: vmv.v.v v8, v10
				; CHECK-NEXT: vmv.v.v v9, v11
				; CHECK-NEXT: ret
				%retval = call {<2 x i64>, <2 x i64>} @llvm.experimental.vector.deinterleave2.v4i64(<4 x i64> %vec)
				ret {<2 x i64>, <2 x i64>} %retval
				}

				declare {<16 x i1>, <16 x i1>} @llvm.experimental.vector.deinterleave2.v32i1(<32 x i1>)
				declare {<16 x i8>, <16 x i8>} @llvm.experimental.vector.deinterleave2.v32i8(<32 x i8>)
				declare {<8 x i16>, <8 x i16>} @llvm.experimental.vector.deinterleave2.v16i16(<16 x i16>)
				declare {<4 x i32>, <4 x i32>} @llvm.experimental.vector.deinterleave2.v8i32(<8 x i32>)
				declare {<2 x i64>, <2 x i64>} @llvm.experimental.vector.deinterleave2.v4i64(<4 x i64>)

				; Floats

				define {<2 x half>, <2 x half>} @vector_deinterleave_v2f16_v4f16(<4 x half> %vec) {
				; CHECK-LABEL: vector_deinterleave_v2f16_v4f16:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vsetivli zero, 2, e16, mf4, ta, ma
				; CHECK-NEXT: vnsrl.wi v10, v8, 0
				; CHECK-NEXT: vnsrl.wi v9, v8, 16
				; CHECK-NEXT: vmv1r.v v8, v10
				; CHECK-NEXT: ret
				%retval = call {<2 x half>, <2 x half>} @llvm.experimental.vector.deinterleave2.v4f16(<4 x half> %vec)
				ret {<2 x half>, <2 x half>} %retval
				}

				define {<4 x half>, <4 x half>} @vector_deinterleave_v4f16_v8f16(<8 x half> %vec) {
				; CHECK-LABEL: vector_deinterleave_v4f16_v8f16:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vsetivli zero, 4, e16, mf2, ta, ma
				; CHECK-NEXT: vnsrl.wi v10, v8, 0
				; CHECK-NEXT: vnsrl.wi v9, v8, 16
				; CHECK-NEXT: vmv1r.v v8, v10
				; CHECK-NEXT: ret
				%retval = call {<4 x half>, <4 x half>} @llvm.experimental.vector.deinterleave2.v8f16(<8 x half> %vec)
				ret {<4 x half>, <4 x half>} %retval
				}

				define {<2 x float>, <2 x float>} @vector_deinterleave_v2f32_v4f32(<4 x float> %vec) {
				; CHECK-LABEL: vector_deinterleave_v2f32_v4f32:
				; CHECK: # %bb.0:
				; CHECK-NEXT: li a0, 32
				; CHECK-NEXT: vsetivli zero, 2, e32, mf2, ta, ma
				; CHECK-NEXT: vnsrl.wx v9, v8, a0
				; CHECK-NEXT: vnsrl.wi v8, v8, 0
				; CHECK-NEXT: ret
				%retval = call {<2 x float>, <2 x float>} @llvm.experimental.vector.deinterleave2.v4f32(<4 x float> %vec)
				ret {<2 x float>, <2 x float>} %retval
				}

				define {<8 x half>, <8 x half>} @vector_deinterleave_v8f16_v16f16(<16 x half> %vec) {
				; CHECK-LABEL: vector_deinterleave_v8f16_v16f16:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vsetivli zero, 8, e16, m1, ta, ma
				; CHECK-NEXT: vnsrl.wi v10, v8, 0
				; CHECK-NEXT: vnsrl.wi v11, v8, 16
				; CHECK-NEXT: vmv.v.v v8, v10
				; CHECK-NEXT: vmv.v.v v9, v11
				; CHECK-NEXT: ret
				%retval = call {<8 x half>, <8 x half>} @llvm.experimental.vector.deinterleave2.v16f16(<16 x half> %vec)
				ret {<8 x half>, <8 x half>} %retval
				}

				define {<4 x float>, <4 x float>} @vector_deinterleave_v4f32_v8f32(<8 x float> %vec) {
				; CHECK-LABEL: vector_deinterleave_v4f32_v8f32:
				; CHECK: # %bb.0:
				; CHECK-NEXT: li a0, 32
				; CHECK-NEXT: vsetivli zero, 4, e32, m1, ta, ma
				; CHECK-NEXT: vnsrl.wx v10, v8, a0
				; CHECK-NEXT: vnsrl.wi v11, v8, 0
				; CHECK-NEXT: vmv.v.v v8, v11
				; CHECK-NEXT: vmv.v.v v9, v10
				; CHECK-NEXT: ret
				%retval = call {<4 x float>, <4 x float>} @llvm.experimental.vector.deinterleave2.v8f32(<8 x float> %vec)
				ret {<4 x float>, <4 x float>} %retval
				}

				define {<2 x double>, <2 x double>} @vector_deinterleave_v2f64_v4f64(<4 x double> %vec) {
				; CHECK-LABEL: vector_deinterleave_v2f64_v4f64:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vsetivli zero, 2, e64, m2, ta, ma
				; CHECK-NEXT: vslidedown.vi v12, v8, 2
				; CHECK-NEXT: li a0, 2
				; CHECK-NEXT: vmv.s.x v0, a0
				; CHECK-NEXT: vsetivli zero, 2, e64, m1, ta, mu
				; CHECK-NEXT: vrgather.vi v10, v8, 0
				; CHECK-NEXT: vrgather.vi v10, v12, 0, v0.t
				; CHECK-NEXT: vrgather.vi v11, v8, 1
				; CHECK-NEXT: vrgather.vi v11, v12, 1, v0.t
				; CHECK-NEXT: vmv.v.v v8, v10
				; CHECK-NEXT: vmv.v.v v9, v11
				; CHECK-NEXT: ret
				%retval = call {<2 x double>, <2 x double>} @llvm.experimental.vector.deinterleave2.v4f64(<4 x double> %vec)
				ret {<2 x double>, <2 x double>} %retval
				}

				declare {<2 x half>,<2 x half>} @llvm.experimental.vector.deinterleave2.v4f16(<4 x half>)
				declare {<4 x half>, <4 x half>} @llvm.experimental.vector.deinterleave2.v8f16(<8 x half>)
				declare {<2 x float>, <2 x float>} @llvm.experimental.vector.deinterleave2.v4f32(<4 x float>)
				declare {<8 x half>, <8 x half>} @llvm.experimental.vector.deinterleave2.v16f16(<16 x half>)
				declare {<4 x float>, <4 x float>} @llvm.experimental.vector.deinterleave2.v8f32(<8 x float>)
				declare {<2 x double>, <2 x double>} @llvm.experimental.vector.deinterleave2.v4f64(<4 x double>)
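The 8-, 16- and 32-bit CHECK lines in this file deinterleave with a pair of `vnsrl` narrowing shifts rather than a `vrgather`. As a rough illustration of why that works, here is a scalar sketch for the i8 case. It is not code from the patch: `narrowingShiftRight` and `deinterleave2` are hypothetical helpers, and the sketch assumes adjacent byte pairs are reinterpreted as little-endian 16-bit elements.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical scalar model of a narrowing shift right (i16 -> i8):
// shift each wide element right, then truncate to the low 8 bits.
std::vector<uint8_t> narrowingShiftRight(const std::vector<uint16_t>& wide,
                                         unsigned shift) {
  std::vector<uint8_t> out;
  out.reserve(wide.size());
  for (uint16_t w : wide) out.push_back(static_cast<uint8_t>(w >> shift));
  return out;
}

// Deinterleave by viewing byte pairs as 16-bit elements: a shift of 0 keeps
// the even-indexed bytes, a shift of 8 keeps the odd-indexed bytes.
std::pair<std::vector<uint8_t>, std::vector<uint8_t>>
deinterleave2(const std::vector<uint8_t>& v) {
  std::vector<uint16_t> wide(v.size() / 2);
  for (std::size_t i = 0; i < wide.size(); ++i)
    wide[i] = static_cast<uint16_t>(v[2 * i] | (v[2 * i + 1] << 8));
  return {narrowingShiftRight(wide, 0), narrowingShiftRight(wide, 8)};
}
```

The same idea needs a source element twice as wide as the result, so it is unavailable once the element is already 64 bits wide, which matches the i64 and double cases above being the only ones that use `vrgather`.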

llvm/test/CodeGen/RISCV/rvv/vector-deinterleave.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc < %s -mtriple=riscv32 -mattr=+v,+zfh,+experimental-zvfh | FileCheck %s
				; RUN: llc < %s -mtriple=riscv64 -mattr=+v,+zfh,+experimental-zvfh | FileCheck %s

				; Integers

				define {<vscale x 16 x i1>, <vscale x 16 x i1>} @vector_deinterleave_nxv16i1_nxv32i1(<vscale x 32 x i1> %vec) {
				; CHECK-LABEL: vector_deinterleave_nxv16i1_nxv32i1:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vmv1r.v v8, v0
				; CHECK-NEXT: csrr a0, vlenb
				; CHECK-NEXT: srli a0, a0, 2
				; CHECK-NEXT: vsetvli a1, zero, e8, mf2, ta, ma
				; CHECK-NEXT: vslidedown.vx v0, v0, a0
				; CHECK-NEXT: vsetvli a0, zero, e8, m2, ta, ma
				; CHECK-NEXT: vmv.v.i v10, 0
				; CHECK-NEXT: vmerge.vim v14, v10, 1, v0
				; CHECK-NEXT: vmv1r.v v0, v8
				; CHECK-NEXT: vmerge.vim v12, v10, 1, v0
				; CHECK-NEXT: vnsrl.wi v8, v12, 0
				; CHECK-NEXT: vand.vi v8, v8, 1
				; CHECK-NEXT: vmsne.vi v0, v8, 0
				; CHECK-NEXT: vnsrl.wi v8, v12, 8
				; CHECK-NEXT: vand.vi v10, v8, 1
				; CHECK-NEXT: vmsne.vi v8, v10, 0
				; CHECK-NEXT: ret
				%retval = call {<vscale x 16 x i1>, <vscale x 16 x i1>} @llvm.experimental.vector.deinterleave2.nxv32i1(<vscale x 32 x i1> %vec)
				ret {<vscale x 16 x i1>, <vscale x 16 x i1>} %retval
				}

				define {<vscale x 16 x i8>, <vscale x 16 x i8>} @vector_deinterleave_nxv16i8_nxv32i8(<vscale x 32 x i8> %vec) {
				; CHECK-LABEL: vector_deinterleave_nxv16i8_nxv32i8:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vsetvli a0, zero, e8, m2, ta, ma
				; CHECK-NEXT: vnsrl.wi v12, v8, 0
				; CHECK-NEXT: vnsrl.wi v14, v8, 8
				; CHECK-NEXT: vmv.v.v v8, v12
				; CHECK-NEXT: vmv.v.v v10, v14
				; CHECK-NEXT: ret
				%retval = call {<vscale x 16 x i8>, <vscale x 16 x i8>} @llvm.experimental.vector.deinterleave2.nxv32i8(<vscale x 32 x i8> %vec)
				ret {<vscale x 16 x i8>, <vscale x 16 x i8>} %retval
				}

				define {<vscale x 8 x i16>, <vscale x 8 x i16>} @vector_deinterleave_nxv8i16_nxv16i16(<vscale x 16 x i16> %vec) {
				; CHECK-LABEL: vector_deinterleave_nxv8i16_nxv16i16:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vsetvli a0, zero, e16, m2, ta, ma
				; CHECK-NEXT: vnsrl.wi v12, v8, 0
				; CHECK-NEXT: vnsrl.wi v14, v8, 16
				; CHECK-NEXT: vmv.v.v v8, v12
				; CHECK-NEXT: vmv.v.v v10, v14
				; CHECK-NEXT: ret
				%retval = call {<vscale x 8 x i16>, <vscale x 8 x i16>} @llvm.experimental.vector.deinterleave2.nxv16i16(<vscale x 16 x i16> %vec)
				ret {<vscale x 8 x i16>, <vscale x 8 x i16>} %retval
				}

				define {<vscale x 4 x i32>, <vscale x 4 x i32>} @vector_deinterleave_nxv4i32_nxvv8i32(<vscale x 8 x i32> %vec) {
				; CHECK-LABEL: vector_deinterleave_nxv4i32_nxvv8i32:
				; CHECK: # %bb.0:
				; CHECK-NEXT: li a0, 32
				; CHECK-NEXT: vsetvli a1, zero, e32, m2, ta, ma
				; CHECK-NEXT: vnsrl.wx v12, v8, a0
				; CHECK-NEXT: vnsrl.wi v14, v8, 0
				; CHECK-NEXT: vmv.v.v v8, v14
				; CHECK-NEXT: vmv.v.v v10, v12
				; CHECK-NEXT: ret
				%retval = call {<vscale x 4 x i32>, <vscale x 4 x i32>} @llvm.experimental.vector.deinterleave2.nxv8i32(<vscale x 8 x i32> %vec)
				ret {<vscale x 4 x i32>, <vscale x 4 x i32>} %retval
				}

				define {<vscale x 2 x i64>, <vscale x 2 x i64>} @vector_deinterleave_nxv2i64_nxv4i64(<vscale x 4 x i64> %vec) {
				; CHECK-LABEL: vector_deinterleave_nxv2i64_nxv4i64:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vsetvli a0, zero, e64, m4, ta, ma
				; CHECK-NEXT: vid.v v12
				; CHECK-NEXT: vadd.vv v16, v12, v12
				; CHECK-NEXT: vrgather.vv v12, v8, v16
				; CHECK-NEXT: vadd.vi v16, v16, 1
				; CHECK-NEXT: vrgather.vv v20, v8, v16
				; CHECK-NEXT: vmv2r.v v8, v12
				; CHECK-NEXT: vmv2r.v v10, v20
				; CHECK-NEXT: ret
				%retval = call {<vscale x 2 x i64>, <vscale x 2 x i64>} @llvm.experimental.vector.deinterleave2.nxv4i64(<vscale x 4 x i64> %vec)
				ret {<vscale x 2 x i64>, <vscale x 2 x i64>} %retval
				}

				declare {<vscale x 16 x i1>, <vscale x 16 x i1>} @llvm.experimental.vector.deinterleave2.nxv32i1(<vscale x 32 x i1>)
				declare {<vscale x 16 x i8>, <vscale x 16 x i8>} @llvm.experimental.vector.deinterleave2.nxv32i8(<vscale x 32 x i8>)
				declare {<vscale x 8 x i16>, <vscale x 8 x i16>} @llvm.experimental.vector.deinterleave2.nxv16i16(<vscale x 16 x i16>)
				declare {<vscale x 4 x i32>, <vscale x 4 x i32>} @llvm.experimental.vector.deinterleave2.nxv8i32(<vscale x 8 x i32>)
				declare {<vscale x 2 x i64>, <vscale x 2 x i64>} @llvm.experimental.vector.deinterleave2.nxv4i64(<vscale x 4 x i64>)

				; Floats

				define {<vscale x 2 x half>, <vscale x 2 x half>} @vector_deinterleave_nxv2f16_nxv4f16(<vscale x 4 x half> %vec) {
				; CHECK-LABEL: vector_deinterleave_nxv2f16_nxv4f16:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vsetvli a0, zero, e16, mf2, ta, ma
				; CHECK-NEXT: vnsrl.wi v10, v8, 0
				; CHECK-NEXT: vnsrl.wi v9, v8, 16
				; CHECK-NEXT: vmv1r.v v8, v10
				; CHECK-NEXT: ret
				%retval = call {<vscale x 2 x half>, <vscale x 2 x half>} @llvm.experimental.vector.deinterleave2.nxv4f16(<vscale x 4 x half> %vec)
				ret {<vscale x 2 x half>, <vscale x 2 x half>} %retval
				}

				define {<vscale x 4 x half>, <vscale x 4 x half>} @vector_deinterleave_nxv4f16_nxv8f16(<vscale x 8 x half> %vec) {
				; CHECK-LABEL: vector_deinterleave_nxv4f16_nxv8f16:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vsetvli a0, zero, e16, m1, ta, ma
				; CHECK-NEXT: vnsrl.wi v10, v8, 0
				; CHECK-NEXT: vnsrl.wi v11, v8, 16
				; CHECK-NEXT: vmv.v.v v8, v10
				; CHECK-NEXT: vmv.v.v v9, v11
				; CHECK-NEXT: ret
				%retval = call {<vscale x 4 x half>, <vscale x 4 x half>} @llvm.experimental.vector.deinterleave2.nxv8f16(<vscale x 8 x half> %vec)
				ret {<vscale x 4 x half>, <vscale x 4 x half>} %retval
				}

				define {<vscale x 2 x float>, <vscale x 2 x float>} @vector_deinterleave_nxv2f32_nxv4f32(<vscale x 4 x float> %vec) {
				; CHECK-LABEL: vector_deinterleave_nxv2f32_nxv4f32:
				; CHECK: # %bb.0:
				; CHECK-NEXT: li a0, 32
				; CHECK-NEXT: vsetvli a1, zero, e32, m1, ta, ma
				; CHECK-NEXT: vnsrl.wx v10, v8, a0
				; CHECK-NEXT: vnsrl.wi v11, v8, 0
				; CHECK-NEXT: vmv.v.v v8, v11
				; CHECK-NEXT: vmv.v.v v9, v10
				; CHECK-NEXT: ret
				%retval = call {<vscale x 2 x float>, <vscale x 2 x float>} @llvm.experimental.vector.deinterleave2.nxv4f32(<vscale x 4 x float> %vec)
				ret {<vscale x 2 x float>, <vscale x 2 x float>} %retval
				}

				define {<vscale x 8 x half>, <vscale x 8 x half>} @vector_deinterleave_nxv8f16_nxv16f16(<vscale x 16 x half> %vec) {
				; CHECK-LABEL: vector_deinterleave_nxv8f16_nxv16f16:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vsetvli a0, zero, e16, m2, ta, ma
				; CHECK-NEXT: vnsrl.wi v12, v8, 0
				; CHECK-NEXT: vnsrl.wi v14, v8, 16
				; CHECK-NEXT: vmv.v.v v8, v12
				; CHECK-NEXT: vmv.v.v v10, v14
				; CHECK-NEXT: ret
				%retval = call {<vscale x 8 x half>, <vscale x 8 x half>} @llvm.experimental.vector.deinterleave2.nxv16f16(<vscale x 16 x half> %vec)
				ret {<vscale x 8 x half>, <vscale x 8 x half>} %retval
				}

				define {<vscale x 4 x float>, <vscale x 4 x float>} @vector_deinterleave_nxv4f32_nxv8f32(<vscale x 8 x float> %vec) {
				; CHECK-LABEL: vector_deinterleave_nxv4f32_nxv8f32:
				; CHECK: # %bb.0:
				; CHECK-NEXT: li a0, 32
				; CHECK-NEXT: vsetvli a1, zero, e32, m2, ta, ma
				; CHECK-NEXT: vnsrl.wx v12, v8, a0
				; CHECK-NEXT: vnsrl.wi v14, v8, 0
				; CHECK-NEXT: vmv.v.v v8, v14
				; CHECK-NEXT: vmv.v.v v10, v12
				; CHECK-NEXT: ret
				%retval = call {<vscale x 4 x float>, <vscale x 4 x float>} @llvm.experimental.vector.deinterleave2.nxv8f32(<vscale x 8 x float> %vec)
				ret {<vscale x 4 x float>, <vscale x 4 x float>} %retval
				}

				define {<vscale x 2 x double>, <vscale x 2 x double>} @vector_deinterleave_nxv2f64_nxv4f64(<vscale x 4 x double> %vec) {
				; CHECK-LABEL: vector_deinterleave_nxv2f64_nxv4f64:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vsetvli a0, zero, e64, m4, ta, ma
				; CHECK-NEXT: vid.v v12
				; CHECK-NEXT: vadd.vv v16, v12, v12
				; CHECK-NEXT: vrgather.vv v12, v8, v16
				; CHECK-NEXT: vadd.vi v16, v16, 1
				; CHECK-NEXT: vrgather.vv v20, v8, v16
				; CHECK-NEXT: vmv2r.v v8, v12
				; CHECK-NEXT: vmv2r.v v10, v20
				; CHECK-NEXT: ret
				%retval = call {<vscale x 2 x double>, <vscale x 2 x double>} @llvm.experimental.vector.deinterleave2.nxv4f64(<vscale x 4 x double> %vec)
				ret {<vscale x 2 x double>, <vscale x 2 x double>} %retval
				}

				declare {<vscale x 2 x half>,<vscale x 2 x half>} @llvm.experimental.vector.deinterleave2.nxv4f16(<vscale x 4 x half>)
				declare {<vscale x 4 x half>, <vscale x 4 x half>} @llvm.experimental.vector.deinterleave2.nxv8f16(<vscale x 8 x half>)
				declare {<vscale x 2 x float>, <vscale x 2 x float>} @llvm.experimental.vector.deinterleave2.nxv4f32(<vscale x 4 x float>)
				declare {<vscale x 8 x half>, <vscale x 8 x half>} @llvm.experimental.vector.deinterleave2.nxv16f16(<vscale x 16 x half>)
				declare {<vscale x 4 x float>, <vscale x 4 x float>} @llvm.experimental.vector.deinterleave2.nxv8f32(<vscale x 8 x float>)
				declare {<vscale x 2 x double>, <vscale x 2 x double>} @llvm.experimental.vector.deinterleave2.nxv4f64(<vscale x 4 x double>)

llvm/test/CodeGen/RISCV/rvv/vector-interleave-fixed.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc < %s -mtriple=riscv32 -mattr=+v,+zfh,+experimental-zvfh | FileCheck -check-prefixes=CHECK,RV32 %s
				; RUN: llc < %s -mtriple=riscv64 -mattr=+v,+zfh,+experimental-zvfh | FileCheck -check-prefixes=CHECK,RV64 %s

				; Integers

				define <32 x i1> @vector_interleave_v32i1_v16i1(<16 x i1> %a, <16 x i1> %b) {
				; RV32-LABEL: vector_interleave_v32i1_v16i1:
				; RV32: # %bb.0:
				; RV32-NEXT: addi sp, sp, -64
				; RV32-NEXT: .cfi_def_cfa_offset 64
				; RV32-NEXT: sw ra, 60(sp) # 4-byte Folded Spill
				; RV32-NEXT: sw s0, 56(sp) # 4-byte Folded Spill
				; RV32-NEXT: .cfi_offset ra, -4
				; RV32-NEXT: .cfi_offset s0, -8
				; RV32-NEXT: addi s0, sp, 64
				; RV32-NEXT: .cfi_def_cfa s0, 0
				; RV32-NEXT: andi sp, sp, -32
				; RV32-NEXT: vsetivli zero, 16, e8, m1, ta, ma
				; RV32-NEXT: vfirst.m a0, v8
				; RV32-NEXT: seqz a0, a0
				; RV32-NEXT: sb a0, 1(sp)
				; RV32-NEXT: vfirst.m a0, v0
				; RV32-NEXT: seqz a0, a0
				; RV32-NEXT: sb a0, 0(sp)
				; RV32-NEXT: vsetivli zero, 0, e16, mf4, ta, ma
				; RV32-NEXT: vmv.x.s a0, v8
				; RV32-NEXT: slli a1, a0, 16
				; RV32-NEXT: srli a1, a1, 31
				; RV32-NEXT: sb a1, 31(sp)
				; RV32-NEXT: vmv.x.s a1, v0
				; RV32-NEXT: slli a2, a1, 16
				; RV32-NEXT: srli a2, a2, 31
				; RV32-NEXT: sb a2, 30(sp)
				; RV32-NEXT: slli a2, a0, 17
				; RV32-NEXT: srli a2, a2, 31
				; RV32-NEXT: sb a2, 29(sp)
				; RV32-NEXT: slli a2, a1, 17
				; RV32-NEXT: srli a2, a2, 31
				; RV32-NEXT: sb a2, 28(sp)
				reames: Missing check lines here, probably due to conflict with autogen.
				; RV32-NEXT: slli a2, a0, 18
				; RV32-NEXT: srli a2, a2, 31
				; RV32-NEXT: sb a2, 27(sp)
				; RV32-NEXT: slli a2, a1, 18
				; RV32-NEXT: srli a2, a2, 31
				; RV32-NEXT: sb a2, 26(sp)
				; RV32-NEXT: slli a2, a0, 19
				; RV32-NEXT: srli a2, a2, 31
				; RV32-NEXT: sb a2, 25(sp)
				; RV32-NEXT: slli a2, a1, 19
				; RV32-NEXT: srli a2, a2, 31
				; RV32-NEXT: sb a2, 24(sp)
				; RV32-NEXT: slli a2, a0, 20
				; RV32-NEXT: srli a2, a2, 31
				; RV32-NEXT: sb a2, 23(sp)
				; RV32-NEXT: slli a2, a1, 20
				; RV32-NEXT: srli a2, a2, 31
				; RV32-NEXT: sb a2, 22(sp)
				; RV32-NEXT: slli a2, a0, 21
				; RV32-NEXT: srli a2, a2, 31
				; RV32-NEXT: sb a2, 21(sp)
				; RV32-NEXT: slli a2, a1, 21
				; RV32-NEXT: srli a2, a2, 31
				; RV32-NEXT: sb a2, 20(sp)
				; RV32-NEXT: slli a2, a0, 22
				; RV32-NEXT: srli a2, a2, 31
				; RV32-NEXT: sb a2, 19(sp)
				; RV32-NEXT: slli a2, a1, 22
				; RV32-NEXT: srli a2, a2, 31
				; RV32-NEXT: sb a2, 18(sp)
				; RV32-NEXT: slli a2, a0, 23
				; RV32-NEXT: srli a2, a2, 31
				; RV32-NEXT: sb a2, 17(sp)
				; RV32-NEXT: slli a2, a1, 23
				; RV32-NEXT: srli a2, a2, 31
				; RV32-NEXT: sb a2, 16(sp)
				; RV32-NEXT: slli a2, a0, 24
				; RV32-NEXT: srli a2, a2, 31
				; RV32-NEXT: sb a2, 15(sp)
				; RV32-NEXT: slli a2, a1, 24
				; RV32-NEXT: srli a2, a2, 31
				; RV32-NEXT: sb a2, 14(sp)
				; RV32-NEXT: slli a2, a0, 25
				; RV32-NEXT: srli a2, a2, 31
				; RV32-NEXT: sb a2, 13(sp)
				; RV32-NEXT: slli a2, a1, 25
				; RV32-NEXT: srli a2, a2, 31
				; RV32-NEXT: sb a2, 12(sp)
				; RV32-NEXT: slli a2, a0, 26
				; RV32-NEXT: srli a2, a2, 31
				; RV32-NEXT: sb a2, 11(sp)
				; RV32-NEXT: slli a2, a1, 26
				; RV32-NEXT: srli a2, a2, 31
				; RV32-NEXT: sb a2, 10(sp)
				; RV32-NEXT: slli a2, a0, 27
				; RV32-NEXT: srli a2, a2, 31
				; RV32-NEXT: sb a2, 9(sp)
				; RV32-NEXT: slli a2, a1, 27
				; RV32-NEXT: srli a2, a2, 31
				; RV32-NEXT: sb a2, 8(sp)
				; RV32-NEXT: slli a2, a0, 28
				; RV32-NEXT: srli a2, a2, 31
				; RV32-NEXT: sb a2, 7(sp)
				; RV32-NEXT: slli a2, a1, 28
				; RV32-NEXT: srli a2, a2, 31
				; RV32-NEXT: sb a2, 6(sp)
				; RV32-NEXT: slli a2, a0, 29
				; RV32-NEXT: srli a2, a2, 31
				; RV32-NEXT: sb a2, 5(sp)
				; RV32-NEXT: slli a2, a1, 29
				; RV32-NEXT: srli a2, a2, 31
				; RV32-NEXT: sb a2, 4(sp)
				; RV32-NEXT: slli a0, a0, 30
				; RV32-NEXT: srli a0, a0, 31
				; RV32-NEXT: sb a0, 3(sp)
				; RV32-NEXT: slli a1, a1, 30
				; RV32-NEXT: srli a1, a1, 31
				reames (inline comment): Same problem here.
				; RV32-NEXT: sb a1, 2(sp)
				; RV32-NEXT: li a0, 32
				; RV32-NEXT: mv a1, sp
				; RV32-NEXT: vsetvli zero, a0, e8, m2, ta, ma
				; RV32-NEXT: vle8.v v8, (a1)
				; RV32-NEXT: vand.vi v8, v8, 1
				; RV32-NEXT: vmsne.vi v0, v8, 0
				; RV32-NEXT: addi sp, s0, -64
				; RV32-NEXT: lw ra, 60(sp) # 4-byte Folded Reload
				; RV32-NEXT: lw s0, 56(sp) # 4-byte Folded Reload
				; RV32-NEXT: addi sp, sp, 64
				; RV32-NEXT: ret
				;
				; RV64-LABEL: vector_interleave_v32i1_v16i1:
				; RV64: # %bb.0:
				; RV64-NEXT: addi sp, sp, -64
				; RV64-NEXT: .cfi_def_cfa_offset 64
				; RV64-NEXT: sd ra, 56(sp) # 8-byte Folded Spill
				; RV64-NEXT: sd s0, 48(sp) # 8-byte Folded Spill
				; RV64-NEXT: .cfi_offset ra, -8
				; RV64-NEXT: .cfi_offset s0, -16
				; RV64-NEXT: addi s0, sp, 64
				; RV64-NEXT: .cfi_def_cfa s0, 0
				; RV64-NEXT: andi sp, sp, -32
				; RV64-NEXT: vsetivli zero, 16, e8, m1, ta, ma
				; RV64-NEXT: vfirst.m a0, v8
				; RV64-NEXT: seqz a0, a0
				; RV64-NEXT: sb a0, 1(sp)
				; RV64-NEXT: vfirst.m a0, v0
				; RV64-NEXT: seqz a0, a0
				; RV64-NEXT: sb a0, 0(sp)
				; RV64-NEXT: vsetivli zero, 0, e16, mf4, ta, ma
				; RV64-NEXT: vmv.x.s a0, v8
				; RV64-NEXT: slli a1, a0, 48
				; RV64-NEXT: srli a1, a1, 63
				; RV64-NEXT: sb a1, 31(sp)
				; RV64-NEXT: vmv.x.s a1, v0
				; RV64-NEXT: slli a2, a1, 48
				; RV64-NEXT: srli a2, a2, 63
				; RV64-NEXT: sb a2, 30(sp)
				; RV64-NEXT: slli a2, a0, 49
				; RV64-NEXT: srli a2, a2, 63
				; RV64-NEXT: sb a2, 29(sp)
				; RV64-NEXT: slli a2, a1, 49
				; RV64-NEXT: srli a2, a2, 63
				; RV64-NEXT: sb a2, 28(sp)
				; RV64-NEXT: slli a2, a0, 50
				; RV64-NEXT: srli a2, a2, 63
				; RV64-NEXT: sb a2, 27(sp)
				; RV64-NEXT: slli a2, a1, 50
				; RV64-NEXT: srli a2, a2, 63
				; RV64-NEXT: sb a2, 26(sp)
				; RV64-NEXT: slli a2, a0, 51
				; RV64-NEXT: srli a2, a2, 63
				; RV64-NEXT: sb a2, 25(sp)
				; RV64-NEXT: slli a2, a1, 51
				; RV64-NEXT: srli a2, a2, 63
				; RV64-NEXT: sb a2, 24(sp)
				; RV64-NEXT: slli a2, a0, 52
				; RV64-NEXT: srli a2, a2, 63
				; RV64-NEXT: sb a2, 23(sp)
				; RV64-NEXT: slli a2, a1, 52
				; RV64-NEXT: srli a2, a2, 63
				; RV64-NEXT: sb a2, 22(sp)
				; RV64-NEXT: slli a2, a0, 53
				; RV64-NEXT: srli a2, a2, 63
				; RV64-NEXT: sb a2, 21(sp)
				; RV64-NEXT: slli a2, a1, 53
				; RV64-NEXT: srli a2, a2, 63
				; RV64-NEXT: sb a2, 20(sp)
				; RV64-NEXT: slli a2, a0, 54
				; RV64-NEXT: srli a2, a2, 63
				; RV64-NEXT: sb a2, 19(sp)
				; RV64-NEXT: slli a2, a1, 54
				; RV64-NEXT: srli a2, a2, 63
				; RV64-NEXT: sb a2, 18(sp)
				; RV64-NEXT: slli a2, a0, 55
				; RV64-NEXT: srli a2, a2, 63
				; RV64-NEXT: sb a2, 17(sp)
				; RV64-NEXT: slli a2, a1, 55
				; RV64-NEXT: srli a2, a2, 63
				; RV64-NEXT: sb a2, 16(sp)
				; RV64-NEXT: slli a2, a0, 56
				; RV64-NEXT: srli a2, a2, 63
				; RV64-NEXT: sb a2, 15(sp)
				; RV64-NEXT: slli a2, a1, 56
				; RV64-NEXT: srli a2, a2, 63
				; RV64-NEXT: sb a2, 14(sp)
				; RV64-NEXT: slli a2, a0, 57
				; RV64-NEXT: srli a2, a2, 63
				; RV64-NEXT: sb a2, 13(sp)
				; RV64-NEXT: slli a2, a1, 57
				; RV64-NEXT: srli a2, a2, 63
				; RV64-NEXT: sb a2, 12(sp)
				; RV64-NEXT: slli a2, a0, 58
				; RV64-NEXT: srli a2, a2, 63
				; RV64-NEXT: sb a2, 11(sp)
				; RV64-NEXT: slli a2, a1, 58
				; RV64-NEXT: srli a2, a2, 63
				; RV64-NEXT: sb a2, 10(sp)
				; RV64-NEXT: slli a2, a0, 59
				; RV64-NEXT: srli a2, a2, 63
				; RV64-NEXT: sb a2, 9(sp)
				; RV64-NEXT: slli a2, a1, 59
				; RV64-NEXT: srli a2, a2, 63
				; RV64-NEXT: sb a2, 8(sp)
				; RV64-NEXT: slli a2, a0, 60
				; RV64-NEXT: srli a2, a2, 63
				; RV64-NEXT: sb a2, 7(sp)
				; RV64-NEXT: slli a2, a1, 60
				; RV64-NEXT: srli a2, a2, 63
				; RV64-NEXT: sb a2, 6(sp)
				; RV64-NEXT: slli a2, a0, 61
				; RV64-NEXT: srli a2, a2, 63
				; RV64-NEXT: sb a2, 5(sp)
				; RV64-NEXT: slli a2, a1, 61
				; RV64-NEXT: srli a2, a2, 63
				; RV64-NEXT: sb a2, 4(sp)
				; RV64-NEXT: slli a0, a0, 62
				; RV64-NEXT: srli a0, a0, 63
				; RV64-NEXT: sb a0, 3(sp)
				; RV64-NEXT: slli a1, a1, 62
				; RV64-NEXT: srli a1, a1, 63
				; RV64-NEXT: sb a1, 2(sp)
				; RV64-NEXT: li a0, 32
				; RV64-NEXT: mv a1, sp
				; RV64-NEXT: vsetvli zero, a0, e8, m2, ta, ma
				; RV64-NEXT: vle8.v v8, (a1)
				; RV64-NEXT: vand.vi v8, v8, 1
				; RV64-NEXT: vmsne.vi v0, v8, 0
				; RV64-NEXT: addi sp, s0, -64
				; RV64-NEXT: ld ra, 56(sp) # 8-byte Folded Reload
				; RV64-NEXT: ld s0, 48(sp) # 8-byte Folded Reload
				; RV64-NEXT: addi sp, sp, 64
				; RV64-NEXT: ret
				%res = call <32 x i1> @llvm.experimental.vector.interleave2.v32i1(<16 x i1> %a, <16 x i1> %b)
				ret <32 x i1> %res
				}

				define <16 x i16> @vector_interleave_v16i16_v8i16(<8 x i16> %a, <8 x i16> %b) {
				; CHECK-LABEL: vector_interleave_v16i16_v8i16:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vsetivli zero, 8, e16, m1, ta, ma
				; CHECK-NEXT: vwaddu.vv v10, v8, v9
				; CHECK-NEXT: li a0, -1
				; CHECK-NEXT: vwmaccu.vx v10, a0, v9
				; CHECK-NEXT: vmv2r.v v8, v10
				; CHECK-NEXT: ret
				%res = call <16 x i16> @llvm.experimental.vector.interleave2.v16i16(<8 x i16> %a, <8 x i16> %b)
				ret <16 x i16> %res
				}

				define <8 x i32> @vector_interleave_v8i32_v4i32(<4 x i32> %a, <4 x i32> %b) {
				; CHECK-LABEL: vector_interleave_v8i32_v4i32:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vsetivli zero, 4, e32, m1, ta, ma
				; CHECK-NEXT: vwaddu.vv v10, v8, v9
				; CHECK-NEXT: li a0, -1
				; CHECK-NEXT: vwmaccu.vx v10, a0, v9
				; CHECK-NEXT: vmv2r.v v8, v10
				; CHECK-NEXT: ret
				%res = call <8 x i32> @llvm.experimental.vector.interleave2.v8i32(<4 x i32> %a, <4 x i32> %b)
				ret <8 x i32> %res
				}

				define <4 x i64> @vector_interleave_v4i64_v2i64(<2 x i64> %a, <2 x i64> %b) {
				; RV32-LABEL: vector_interleave_v4i64_v2i64:
				; RV32: # %bb.0:
				; RV32-NEXT: vmv1r.v v10, v9
				; RV32-NEXT: # kill: def $v8 killed $v8 def $v8m2
				; RV32-NEXT: vsetivli zero, 8, e32, m2, ta, ma
				; RV32-NEXT: vmv.v.i v12, 0
				; RV32-NEXT: vsetivli zero, 2, e64, m2, tu, ma
				; RV32-NEXT: vslideup.vi v12, v8, 0
				; RV32-NEXT: vsetivli zero, 4, e64, m2, tu, ma
				; RV32-NEXT: vslideup.vi v12, v10, 2
				; RV32-NEXT: lui a0, %hi(.LCPI3_0)
				; RV32-NEXT: addi a0, a0, %lo(.LCPI3_0)
				; RV32-NEXT: vsetvli zero, zero, e64, m2, ta, ma
				; RV32-NEXT: vle16.v v10, (a0)
				; RV32-NEXT: vrgatherei16.vv v8, v12, v10
				; RV32-NEXT: ret
				;
				; RV64-LABEL: vector_interleave_v4i64_v2i64:
				; RV64: # %bb.0:
				; RV64-NEXT: vmv1r.v v10, v9
				; RV64-NEXT: # kill: def $v8 killed $v8 def $v8m2
				; RV64-NEXT: vsetivli zero, 4, e64, m2, ta, ma
				; RV64-NEXT: vmv.v.i v12, 0
				; RV64-NEXT: vsetivli zero, 2, e64, m2, tu, ma
				; RV64-NEXT: vslideup.vi v12, v8, 0
				; RV64-NEXT: vsetivli zero, 4, e64, m2, tu, ma
				; RV64-NEXT: vslideup.vi v12, v10, 2
				; RV64-NEXT: lui a0, %hi(.LCPI3_0)
				; RV64-NEXT: addi a0, a0, %lo(.LCPI3_0)
				; RV64-NEXT: vsetvli zero, zero, e64, m2, ta, ma
				; RV64-NEXT: vle64.v v10, (a0)
				; RV64-NEXT: vrgather.vv v8, v12, v10
				; RV64-NEXT: ret
				%res = call <4 x i64> @llvm.experimental.vector.interleave2.v4i64(<2 x i64> %a, <2 x i64> %b)
				ret <4 x i64> %res
				}

				declare <32 x i1> @llvm.experimental.vector.interleave2.v32i1(<16 x i1>, <16 x i1>)
				declare <16 x i16> @llvm.experimental.vector.interleave2.v16i16(<8 x i16>, <8 x i16>)
				declare <8 x i32> @llvm.experimental.vector.interleave2.v8i32(<4 x i32>, <4 x i32>)
				declare <4 x i64> @llvm.experimental.vector.interleave2.v4i64(<2 x i64>, <2 x i64>)

				; Floats

				define <4 x half> @vector_interleave_v4f16_v2f16(<2 x half> %a, <2 x half> %b) {
				; CHECK-LABEL: vector_interleave_v4f16_v2f16:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vsetivli zero, 2, e16, mf4, ta, ma
				; CHECK-NEXT: vwaddu.vv v10, v8, v9
				; CHECK-NEXT: li a0, -1
				; CHECK-NEXT: vwmaccu.vx v10, a0, v9
				; CHECK-NEXT: vmv1r.v v8, v10
				; CHECK-NEXT: ret
				%res = call <4 x half> @llvm.experimental.vector.interleave2.v4f16(<2 x half> %a, <2 x half> %b)
				ret <4 x half> %res
				}

				define <8 x half> @vector_interleave_v8f16_v4f16(<4 x half> %a, <4 x half> %b) {
				; CHECK-LABEL: vector_interleave_v8f16_v4f16:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vsetivli zero, 4, e16, mf2, ta, ma
				; CHECK-NEXT: vwaddu.vv v10, v8, v9
				; CHECK-NEXT: li a0, -1
				; CHECK-NEXT: vwmaccu.vx v10, a0, v9
				; CHECK-NEXT: vmv1r.v v8, v10
				; CHECK-NEXT: ret
				%res = call <8 x half> @llvm.experimental.vector.interleave2.v8f16(<4 x half> %a, <4 x half> %b)
				ret <8 x half> %res
				}

				define <4 x float> @vector_interleave_v4f32_v2f32(<2 x float> %a, <2 x float> %b) {
				; CHECK-LABEL: vector_interleave_v4f32_v2f32:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vsetivli zero, 2, e32, mf2, ta, ma
				; CHECK-NEXT: vwaddu.vv v10, v8, v9
				; CHECK-NEXT: li a0, -1
				; CHECK-NEXT: vwmaccu.vx v10, a0, v9
				; CHECK-NEXT: vmv1r.v v8, v10
				; CHECK-NEXT: ret
				%res = call <4 x float> @llvm.experimental.vector.interleave2.v4f32(<2 x float> %a, <2 x float> %b)
				ret <4 x float> %res
				}

				define <16 x half> @vector_interleave_v16f16_v8f16(<8 x half> %a, <8 x half> %b) {
				; CHECK-LABEL: vector_interleave_v16f16_v8f16:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vsetivli zero, 8, e16, m1, ta, ma
				; CHECK-NEXT: vwaddu.vv v10, v8, v9
				; CHECK-NEXT: li a0, -1
				; CHECK-NEXT: vwmaccu.vx v10, a0, v9
				; CHECK-NEXT: vmv2r.v v8, v10
				; CHECK-NEXT: ret
				%res = call <16 x half> @llvm.experimental.vector.interleave2.v16f16(<8 x half> %a, <8 x half> %b)
				ret <16 x half> %res
				}

				define <8 x float> @vector_interleave_v8f32_v4f32(<4 x float> %a, <4 x float> %b) {
				; CHECK-LABEL: vector_interleave_v8f32_v4f32:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vsetivli zero, 4, e32, m1, ta, ma
				; CHECK-NEXT: vwaddu.vv v10, v8, v9
				; CHECK-NEXT: li a0, -1
				; CHECK-NEXT: vwmaccu.vx v10, a0, v9
				; CHECK-NEXT: vmv2r.v v8, v10
				; CHECK-NEXT: ret
				%res = call <8 x float> @llvm.experimental.vector.interleave2.v8f32(<4 x float> %a, <4 x float> %b)
				ret <8 x float> %res
				}

				define <4 x double> @vector_interleave_v4f64_v2f64(<2 x double> %a, <2 x double> %b) {
				; RV32-LABEL: vector_interleave_v4f64_v2f64:
				; RV32: # %bb.0:
				; RV32-NEXT: vmv1r.v v10, v9
				; RV32-NEXT: # kill: def $v8 killed $v8 def $v8m2
				; RV32-NEXT: vsetivli zero, 4, e64, m2, ta, ma
				; RV32-NEXT: vmv.v.i v12, 0
				; RV32-NEXT: vsetivli zero, 2, e64, m2, tu, ma
				; RV32-NEXT: vslideup.vi v12, v8, 0
				; RV32-NEXT: vsetivli zero, 4, e64, m2, tu, ma
				; RV32-NEXT: vslideup.vi v12, v10, 2
				; RV32-NEXT: lui a0, %hi(.LCPI9_0)
				; RV32-NEXT: addi a0, a0, %lo(.LCPI9_0)
				; RV32-NEXT: vsetvli zero, zero, e64, m2, ta, ma
				; RV32-NEXT: vle16.v v10, (a0)
				; RV32-NEXT: vrgatherei16.vv v8, v12, v10
				; RV32-NEXT: ret
				;
				; RV64-LABEL: vector_interleave_v4f64_v2f64:
				; RV64: # %bb.0:
				; RV64-NEXT: vmv1r.v v10, v9
				; RV64-NEXT: # kill: def $v8 killed $v8 def $v8m2
				; RV64-NEXT: vsetivli zero, 4, e64, m2, ta, ma
				; RV64-NEXT: vmv.v.i v12, 0
				; RV64-NEXT: vsetivli zero, 2, e64, m2, tu, ma
				; RV64-NEXT: vslideup.vi v12, v8, 0
				; RV64-NEXT: vsetivli zero, 4, e64, m2, tu, ma
				; RV64-NEXT: vslideup.vi v12, v10, 2
				; RV64-NEXT: lui a0, %hi(.LCPI9_0)
				; RV64-NEXT: addi a0, a0, %lo(.LCPI9_0)
				; RV64-NEXT: vsetvli zero, zero, e64, m2, ta, ma
				; RV64-NEXT: vle64.v v10, (a0)
				; RV64-NEXT: vrgather.vv v8, v12, v10
				; RV64-NEXT: ret
				%res = call <4 x double> @llvm.experimental.vector.interleave2.v4f64(<2 x double> %a, <2 x double> %b)
				ret <4 x double> %res
				}


				declare <4 x half> @llvm.experimental.vector.interleave2.v4f16(<2 x half>, <2 x half>)
				declare <8 x half> @llvm.experimental.vector.interleave2.v8f16(<4 x half>, <4 x half>)
				declare <4 x float> @llvm.experimental.vector.interleave2.v4f32(<2 x float>, <2 x float>)
				declare <16 x half> @llvm.experimental.vector.interleave2.v16f16(<8 x half>, <8 x half>)
				declare <8 x float> @llvm.experimental.vector.interleave2.v8f32(<4 x float>, <4 x float>)
				declare <4 x double> @llvm.experimental.vector.interleave2.v4f64(<2 x double>, <2 x double>)
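
The 16-bit and 32-bit element tests above (integer and floating point) all lower to the same `vwaddu.vv` / `li a0, -1` / `vwmaccu.vx` sequence. A minimal sketch of why that arithmetic produces an interleave, written as a plain Python model rather than anything taken from the patch: the function name and the list-of-integers representation are assumptions made for illustration. The scalar `-1` is read as the unsigned value 2^SEW - 1, so each widened lane ends up holding `a + b * 2^SEW`, which is `a` in the low half and `b` in the high half.

```python
def interleave_via_widening(a, b, sew):
    """Model the vwaddu.vv + vwmaccu.vx interleave on unsigned SEW-bit lanes.

    Hypothetical helper for illustration only; not part of the patch.
    """
    narrow_mask = (1 << sew) - 1          # 2^SEW - 1, i.e. scalar -1 seen as unsigned SEW bits
    wide_mask = (1 << (2 * sew)) - 1      # each result lane is 2*SEW bits wide
    out = []
    for x, y in zip(a, b):
        acc = (x + y) & wide_mask                 # vwaddu.vv: zext(a) + zext(b)
        acc = (acc + narrow_mask * y) & wide_mask  # vwmaccu.vx: acc += (2^SEW - 1) * zext(b)
        # acc is now a + b * 2^SEW; reading the wide lane as two narrow
        # lanes (little-endian) yields a first, then b.
        out.append(acc & narrow_mask)
        out.append(acc >> sew)
    return out


print(interleave_via_widening([1, 2], [10, 20], 16))  # -> [1, 10, 2, 20]
```

The sum never overflows the wide lane, since `a + b * 2^SEW` is at most `2^(2*SEW) - 1`. The trick needs a lane twice as wide as the element, which is presumably why the i64 and double cases above fall back to `vrgather` instead.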

llvm/test/CodeGen/RISCV/rvv/vector-interleave.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc < %s -mtriple=riscv32 -mattr=+v,+zfh,+experimental-zvfh | FileCheck %s
				; RUN: llc < %s -mtriple=riscv64 -mattr=+v,+zfh,+experimental-zvfh | FileCheck %s

				; Integers

				define <vscale x 32 x i1> @vector_interleave_nxv32i1_nxv16i1(<vscale x 16 x i1> %a, <vscale x 16 x i1> %b) {
				; CHECK-LABEL: vector_interleave_nxv32i1_nxv16i1:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vmv1r.v v9, v0
				; CHECK-NEXT: vsetvli a0, zero, e8, m2, ta, ma
				; CHECK-NEXT: vmv.v.i v10, 0
				; CHECK-NEXT: vmv1r.v v0, v8
				; CHECK-NEXT: vmerge.vim v12, v10, 1, v0
				; CHECK-NEXT: vmv1r.v v0, v9
				; CHECK-NEXT: vmerge.vim v8, v10, 1, v0
				; CHECK-NEXT: vwaddu.vv v16, v8, v12
				; CHECK-NEXT: li a0, -1
				; CHECK-NEXT: vwmaccu.vx v16, a0, v12
				; CHECK-NEXT: vand.vi v8, v18, 1
				; CHECK-NEXT: vmsne.vi v10, v8, 0
				; CHECK-NEXT: vand.vi v8, v16, 1
				; CHECK-NEXT: vmsne.vi v0, v8, 0
				; CHECK-NEXT: csrr a0, vlenb
				; CHECK-NEXT: srli a0, a0, 2
				; CHECK-NEXT: add a1, a0, a0
				; CHECK-NEXT: vsetvli zero, a1, e8, mf2, tu, ma
				; CHECK-NEXT: vslideup.vx v0, v10, a0
				; CHECK-NEXT: ret
				%res = call <vscale x 32 x i1> @llvm.experimental.vector.interleave2.nxv32i1(<vscale x 16 x i1> %a, <vscale x 16 x i1> %b)
				ret <vscale x 32 x i1> %res
				}

				define <vscale x 16 x i16> @vector_interleave_nxv16i16_nxv8i16(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b) {
				; CHECK-LABEL: vector_interleave_nxv16i16_nxv8i16:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vsetvli a0, zero, e16, m2, ta, ma
				; CHECK-NEXT: vwaddu.vv v12, v8, v10
				; CHECK-NEXT: li a0, -1
				; CHECK-NEXT: vwmaccu.vx v12, a0, v10
				; CHECK-NEXT: vmv4r.v v8, v12
				; CHECK-NEXT: ret
				%res = call <vscale x 16 x i16> @llvm.experimental.vector.interleave2.nxv16i16(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b)
				ret <vscale x 16 x i16> %res
				}

				define <vscale x 8 x i32> @vector_interleave_nxv8i32_nxv4i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b) {
				; CHECK-LABEL: vector_interleave_nxv8i32_nxv4i32:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vsetvli a0, zero, e32, m2, ta, ma
				; CHECK-NEXT: vwaddu.vv v12, v8, v10
				; CHECK-NEXT: li a0, -1
				; CHECK-NEXT: vwmaccu.vx v12, a0, v10
				; CHECK-NEXT: vmv4r.v v8, v12
				; CHECK-NEXT: ret
				%res = call <vscale x 8 x i32> @llvm.experimental.vector.interleave2.nxv8i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b)
				ret <vscale x 8 x i32> %res
				}

				define <vscale x 4 x i64> @vector_interleave_nxv4i64_nxv2i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b) {
				; CHECK-LABEL: vector_interleave_nxv4i64_nxv2i64:
				; CHECK: # %bb.0:
				; CHECK-NEXT: # kill: def $v10m2 killed $v10m2 killed $v8m4 def $v8m4
				; CHECK-NEXT: csrr a0, vlenb
				; CHECK-NEXT: srli a0, a0, 2
				; CHECK-NEXT: vsetvli a1, zero, e16, m1, ta, mu
				; CHECK-NEXT: vid.v v12
				; CHECK-NEXT: vand.vi v13, v12, 1
				; CHECK-NEXT: vmsne.vi v0, v13, 0
				; CHECK-NEXT: vsrl.vi v16, v12, 1
				; CHECK-NEXT: vadd.vx v16, v16, a0, v0.t
				; CHECK-NEXT: vsetvli zero, zero, e64, m4, ta, ma
				craig.topper (inline comment): This is a mask agnostic vadd.vx. You need mask undisturbed.
				; CHECK-NEXT: # kill: def $v8m2 killed $v8m2 killed $v8m4 def $v8m4
				; CHECK-NEXT: vrgatherei16.vv v12, v8, v16, v0.t
				; CHECK-NEXT: vmv.v.v v8, v12
				; CHECK-NEXT: ret
				%res = call <vscale x 4 x i64> @llvm.experimental.vector.interleave2.nxv4i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b)
				ret <vscale x 4 x i64> %res
				}

				declare <vscale x 32 x i1> @llvm.experimental.vector.interleave2.nxv32i1(<vscale x 16 x i1>, <vscale x 16 x i1>)
				declare <vscale x 16 x i16> @llvm.experimental.vector.interleave2.nxv16i16(<vscale x 8 x i16>, <vscale x 8 x i16>)
				declare <vscale x 8 x i32> @llvm.experimental.vector.interleave2.nxv8i32(<vscale x 4 x i32>, <vscale x 4 x i32>)
				declare <vscale x 4 x i64> @llvm.experimental.vector.interleave2.nxv4i64(<vscale x 2 x i64>, <vscale x 2 x i64>)

				; Floats

				define <vscale x 4 x half> @vector_interleave_nxv4f16_nxv2f16(<vscale x 2 x half> %a, <vscale x 2 x half> %b) {
				; CHECK-LABEL: vector_interleave_nxv4f16_nxv2f16:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vsetvli a0, zero, e16, mf2, ta, ma
				; CHECK-NEXT: vwaddu.vv v10, v8, v9
				; CHECK-NEXT: li a0, -1
				; CHECK-NEXT: vwmaccu.vx v10, a0, v9
				; CHECK-NEXT: csrr a0, vlenb
				; CHECK-NEXT: srli a0, a0, 2
				; CHECK-NEXT: vsetvli a1, zero, e16, m1, ta, ma
				; CHECK-NEXT: vslidedown.vx v8, v10, a0
				; CHECK-NEXT: add a1, a0, a0
				; CHECK-NEXT: vsetvli zero, a1, e16, m1, tu, ma
				; CHECK-NEXT: vslideup.vx v10, v8, a0
				; CHECK-NEXT: vmv1r.v v8, v10
				; CHECK-NEXT: ret
				%res = call <vscale x 4 x half> @llvm.experimental.vector.interleave2.nxv4f16(<vscale x 2 x half> %a, <vscale x 2 x half> %b)
				ret <vscale x 4 x half> %res
				}

				define <vscale x 8 x half> @vector_interleave_nxv8f16_nxv4f16(<vscale x 4 x half> %a, <vscale x 4 x half> %b) {
				; CHECK-LABEL: vector_interleave_nxv8f16_nxv4f16:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vsetvli a0, zero, e16, m1, ta, ma
				; CHECK-NEXT: vwaddu.vv v10, v8, v9
				; CHECK-NEXT: li a0, -1
				; CHECK-NEXT: vwmaccu.vx v10, a0, v9
				; CHECK-NEXT: vmv2r.v v8, v10
				; CHECK-NEXT: ret
				%res = call <vscale x 8 x half> @llvm.experimental.vector.interleave2.nxv8f16(<vscale x 4 x half> %a, <vscale x 4 x half> %b)
				ret <vscale x 8 x half> %res
				}

				define <vscale x 4 x float> @vector_interleave_nxv4f32_nxv2f32(<vscale x 2 x float> %a, <vscale x 2 x float> %b) {
				; CHECK-LABEL: vector_interleave_nxv4f32_nxv2f32:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vsetvli a0, zero, e32, m1, ta, ma
				; CHECK-NEXT: vwaddu.vv v10, v8, v9
				; CHECK-NEXT: li a0, -1
				; CHECK-NEXT: vwmaccu.vx v10, a0, v9
				; CHECK-NEXT: vmv2r.v v8, v10
				; CHECK-NEXT: ret
				%res = call <vscale x 4 x float> @llvm.experimental.vector.interleave2.nxv4f32(<vscale x 2 x float> %a, <vscale x 2 x float> %b)
				ret <vscale x 4 x float> %res
				}

				define <vscale x 16 x half> @vector_interleave_nxv16f16_nxv8f16(<vscale x 8 x half> %a, <vscale x 8 x half> %b) {
				; CHECK-LABEL: vector_interleave_nxv16f16_nxv8f16:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vsetvli a0, zero, e16, m2, ta, ma
				; CHECK-NEXT: vwaddu.vv v12, v8, v10
				; CHECK-NEXT: li a0, -1
				; CHECK-NEXT: vwmaccu.vx v12, a0, v10
				; CHECK-NEXT: vmv4r.v v8, v12
				; CHECK-NEXT: ret
				%res = call <vscale x 16 x half> @llvm.experimental.vector.interleave2.nxv16f16(<vscale x 8 x half> %a, <vscale x 8 x half> %b)
				ret <vscale x 16 x half> %res
				}

				define <vscale x 8 x float> @vector_interleave_nxv8f32_nxv4f32(<vscale x 4 x float> %a, <vscale x 4 x float> %b) {
				; CHECK-LABEL: vector_interleave_nxv8f32_nxv4f32:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vsetvli a0, zero, e32, m2, ta, ma
				; CHECK-NEXT: vwaddu.vv v12, v8, v10
				; CHECK-NEXT: li a0, -1
				; CHECK-NEXT: vwmaccu.vx v12, a0, v10
				; CHECK-NEXT: vmv4r.v v8, v12
				; CHECK-NEXT: ret
				%res = call <vscale x 8 x float> @llvm.experimental.vector.interleave2.nxv8f32(<vscale x 4 x float> %a, <vscale x 4 x float> %b)
				ret <vscale x 8 x float> %res
				}

				define <vscale x 4 x double> @vector_interleave_nxv4f64_nxv2f64(<vscale x 2 x double> %a, <vscale x 2 x double> %b) {
				; CHECK-LABEL: vector_interleave_nxv4f64_nxv2f64:
				; CHECK: # %bb.0:
				; CHECK-NEXT: # kill: def $v10m2 killed $v10m2 killed $v8m4 def $v8m4
				; CHECK-NEXT: csrr a0, vlenb
				; CHECK-NEXT: srli a0, a0, 2
				; CHECK-NEXT: vsetvli a1, zero, e16, m1, ta, mu
				; CHECK-NEXT: vid.v v12
				; CHECK-NEXT: vand.vi v13, v12, 1
				; CHECK-NEXT: vmsne.vi v0, v13, 0
				; CHECK-NEXT: vsrl.vi v16, v12, 1
				; CHECK-NEXT: vadd.vx v16, v16, a0, v0.t
				; CHECK-NEXT: vsetvli zero, zero, e64, m4, ta, ma
				; CHECK-NEXT: # kill: def $v8m2 killed $v8m2 killed $v8m4 def $v8m4
				; CHECK-NEXT: vrgatherei16.vv v12, v8, v16, v0.t
				; CHECK-NEXT: vmv.v.v v8, v12
				; CHECK-NEXT: ret
				%res = call <vscale x 4 x double> @llvm.experimental.vector.interleave2.nxv4f64(<vscale x 2 x double> %a, <vscale x 2 x double> %b)
				ret <vscale x 4 x double> %res
				}


				declare <vscale x 4 x half> @llvm.experimental.vector.interleave2.nxv4f16(<vscale x 2 x half>, <vscale x 2 x half>)
				declare <vscale x 8 x half> @llvm.experimental.vector.interleave2.nxv8f16(<vscale x 4 x half>, <vscale x 4 x half>)
				declare <vscale x 4 x float> @llvm.experimental.vector.interleave2.nxv4f32(<vscale x 2 x float>, <vscale x 2 x float>)
				declare <vscale x 16 x half> @llvm.experimental.vector.interleave2.nxv16f16(<vscale x 8 x half>, <vscale x 8 x half>)
				declare <vscale x 8 x float> @llvm.experimental.vector.interleave2.nxv8f32(<vscale x 4 x float>, <vscale x 4 x float>)
				declare <vscale x 4 x double> @llvm.experimental.vector.interleave2.nxv4f64(<vscale x 2 x double>, <vscale x 2 x double>)
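
The scalable i64 and double tests above build the gather index with `vid.v`, `vand.vi`, `vmsne.vi`, `vsrl.vi` and a masked `vadd.vx`, giving `{0, n, 1, n+1, ...}` where `n` is the element count of each input. The following is a rough Python model of that index computation and of the mask-policy concern raised in the inline comment. All function names are hypothetical, the gather is modelled unmasked for simplicity, and the "mask agnostic" branch deliberately picks the all-ones outcome that the RVV specification permits for inactive lanes, to show the worst case.

```python
def masked_add_vx(dest, src, scalar, mask, policy, sew=16):
    """Model a masked vadd.vx on SEW-bit lanes (illustrative only).

    policy "mu": inactive lanes keep the old destination value.
    policy "ma": inactive lanes may keep the old value or become all ones;
                 this model always picks all ones.
    """
    all_ones = (1 << sew) - 1
    out = []
    for d, s, m in zip(dest, src, mask):
        if m:
            out.append((s + scalar) & all_ones)  # active lane: normal add
        elif policy == "mu":
            out.append(d)                        # undisturbed
        else:
            out.append(all_ones)                 # agnostic, worst case
    return out


def interleave_indices(n, policy="mu"):
    """Index vector {0, n, 1, n+1, ...} for gathering from concat(a, b)."""
    vid = list(range(2 * n))          # vid.v
    odd = [i & 1 for i in vid]        # vand.vi 1, then vmsne.vi 0
    half = [i >> 1 for i in vid]      # vsrl.vi 1
    # vadd.vx with v0.t: destination and source are the same register
    return masked_add_vx(half, half, n, odd, policy)


def gather(src, idx):
    """Unmasked vrgather model: out-of-range indices read as 0."""
    return [src[i] if i < len(src) else 0 for i in idx]


a, b = [10, 11, 12, 13], [20, 21, 22, 23]
print(gather(a + b, interleave_indices(4)))        # interleaved a and b
print(interleave_indices(2, policy="ma"))          # even lanes clobbered
```

With the undisturbed policy the even lanes keep `id >> 1`, which is what the gather needs. With the agnostic policy those lanes are not guaranteed to survive, so the index vector cannot be relied on.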

[RISCV] Lower interleave and deinterleave intrinsics (Closed, Public)

Details

Diff Detail

Event Timeline