This is an archive of the discontinued LLVM Phabricator instance.

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
6580–6585	I'm not that happy with how this mask is generated. Ideally we would just `vmv.v.i v0, 0x55` directly, but I can't seem to do it in a way that keeps SelectionDAG happy. Namely, it doesn't allow bitcasting `<vscale x n x i8>` -> `<vscale x n x i1>`.
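This is why a byte splat of 0x55 would be the ideal mask here. RVV stores mask element i in bit i of the mask register, so filling every byte with 0x55 (0b01010101) activates the even lanes, and 0xAA would select the odd ones. The sketch below is a plain C++ model of that layout, not LLVM code, and `maskFromByteSplat` is a hypothetical name. One caveat: `vmv.v.i` only encodes a 5-bit signed immediate (-16 to 15), so in practice 0x55 would have to come from a scalar register via `vmv.v.x`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: models the mask register after splatting the byte
// `splat` into every byte of v0. RVV keeps mask element i in bit i of the
// register, i.e. bit (i % 8) of byte (i / 8).
std::vector<bool> maskFromByteSplat(uint8_t splat, unsigned numLanes) {
  std::vector<bool> lanes(numLanes);
  for (unsigned i = 0; i < numLanes; ++i)
    lanes[i] = ((splat >> (i % 8)) & 1) != 0;
  return lanes;
}
```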

Harbormaster completed remote builds in B213863: Diff 497629. (Feb 15 2023, 5:28 AM)

The herald failures should be fixed now that https://reviews.llvm.org/D141924#inline-1391060 has been updated

These should be implementable with clever use of vwaddu and vwmaccu, which would avoid the vrgather.

craig.topper added inline comments. (Feb 15 2023, 9:02 AM)

llvm/test/CodeGen/RISCV/rvv/vector-deinterleave-fixed.ll
2	I don't think this test even uses your patch, and it shows the strategy for how your patch can be improved. Only the i64 and double test cases use vrgather.
9	Missing CHECK lines

For interleave, a couple of options to consider:

1. Concatenate the vectors into one vector of twice the LMUL, then gather using {0, VLMAX, 1, VLMAX+1, ...}. Forming that index vector takes some care for the scalable case, but I think we can do id()/2 +{masked} VLMAX where mask = {0,1,0,1,...}. The mask can in turn be computed as (id() & 1) == 1.
2. Use a segment store to the stack followed by a whole vector reload.
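Both options can be modelled in a few lines of plain C++. This is a sketch of the lane semantics only, not the LLVM lowering, and all helper names are hypothetical. VLMAX here stands for the element count of each input vector, so the concatenated vector has 2 * VLMAX elements.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Option 1: index vector {0, VLMAX, 1, VLMAX+1, ...}, built as id()/2 plus a
// masked add of VLMAX on the odd lanes.
std::vector<uint32_t> interleaveIndices(uint32_t vlmax) {
  std::vector<uint32_t> idx(2 * vlmax);
  for (uint32_t i = 0; i < 2 * vlmax; ++i) {
    idx[i] = i / 2;              // id() / 2
    bool odd = (i & 1) == 1;     // mask = (id() & 1) == 1
    if (odd)
      idx[i] += vlmax;           // masked add of VLMAX
  }
  return idx;
}

// Models vrgather.vv: dst[i] = src[idx[i]].
std::vector<int> gather(const std::vector<int> &src,
                        const std::vector<uint32_t> &idx) {
  std::vector<int> dst(idx.size());
  for (std::size_t i = 0; i < idx.size(); ++i)
    dst[i] = src[idx[i]];
  return dst;
}

// Option 2: a two-field segment store writes field 0 and field 1 of each
// element next to each other in memory; reloading that memory as one whole
// vector yields the interleaved result.
std::vector<int> segmentStore2(const std::vector<int> &a,
                               const std::vector<int> &b) {
  std::vector<int> mem(a.size() + b.size());
  for (std::size_t i = 0; i < a.size(); ++i) {
    mem[2 * i] = a[i];
    mem[2 * i + 1] = b[i];
  }
  return mem;
}
```

Both paths produce the same lane order; the gather needs the runtime-computed index vector, while the segment store trades it for a round trip through the stack.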

p.s. This was written before @craig.topper's comment, and I haven't read his in detail.

Here's an excerpt from the code we already have for interleave on fixed vectors. Need to see why it doesn't trigger here.

// Detect an interleave shuffle and lower to
// (vwmaccu.vx (vwaddu.vv lohalf(V1), lohalf(V2)), lohalf(V2), (2^eltbits - 1))
bool SwapSources;
if (isInterleaveShuffle(Mask, VT, SwapSources, Subtarget)) {
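The arithmetic behind the `vwaddu`/`vwmaccu` sequence in that comment, modelled for SEW=8 in plain C++. The helper names are hypothetical and this captures lane semantics only, with no RVV intrinsics. Because the trick needs an element type twice as wide as SEW, it only applies when SEW < ELEN, which is why i64 is the problem case raised next.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Models one e8 -> e16 lane of
//   w = vwaddu.vv(a, b)          // w = a + b, zero-extended
//   w = vwmaccu.vx(w, 255, b)    // w += (2^8 - 1) * b
// giving w = a + 256 * b: a in the low byte, b in the high byte.
// Computed in 32 bits; the result always fits in 16.
uint16_t widenInterleaveLane(uint8_t a, uint8_t b) {
  uint32_t w = uint32_t(a) + uint32_t(b);
  w += 255u * uint32_t(b);
  return static_cast<uint16_t>(w);
}

// RISC-V is little-endian, so reinterpreting the e16 result as e8 places the
// low byte (from a) before the high byte (from b) in each pair.
std::vector<uint8_t> interleaveBytes(const std::vector<uint8_t> &a,
                                     const std::vector<uint8_t> &b) {
  std::vector<uint8_t> out;
  for (std::size_t i = 0; i < a.size(); ++i) {
    uint16_t w = widenInterleaveLane(a[i], b[i]);
    out.push_back(static_cast<uint8_t>(w & 0xFF));
    out.push_back(static_cast<uint8_t>(w >> 8));
  }
  return out;
}
```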

In D144092#4129342, @craig.topper wrote:

These should be implementable with clever use of vwaddu and vwmaccu, which would avoid the vrgather.

That's a much better approach. But it won't work for i64, right? In https://reviews.llvm.org/D144143 it falls back to a vrgather with a constant index vector:

; RV32-V128-LABEL: unary_interleave_v4i64:
; RV32-V128:       # %bb.0:
; RV32-V128-NEXT:    lui a0, %hi(.LCPI19_0)
; RV32-V128-NEXT:    addi a0, a0, %lo(.LCPI19_0)
; RV32-V128-NEXT:    vsetivli zero, 4, e64, m2, ta, ma
; RV32-V128-NEXT:    vle16.v v12, (a0)
; RV32-V128-NEXT:    vrgatherei16.vv v10, v8, v12
; RV32-V128-NEXT:    vmv.v.v v8, v10
; RV32-V128-NEXT:    ret

We can't load the indices from a constant pool for a scalable vector, because the size of the index vector isn't known at compile time.

Rework lowering to use narrowing shifts in the deinterleave case where possible, and use a single gather that doesn't require viota for interleaving
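A plain C++ model of the narrowing-shift deinterleave for SEW=8, assuming the little-endian layout RISC-V uses. The helper name is hypothetical. Each even/odd pair is viewed as one element of twice the width, and the two `vnsrl.wi` shifts (by 0 and by SEW) pull out the low and high halves.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Models the deinterleave for SEW = 8: bitcast each pair of adjacent e8
// elements to one e16 element, then
//   even = vnsrl.wi(wide, 0)   // low half of each pair
//   odd  = vnsrl.wi(wide, 8)   // high half of each pair
std::pair<std::vector<uint8_t>, std::vector<uint8_t>>
deinterleaveViaNarrowingShift(const std::vector<uint8_t> &concat) {
  std::vector<uint8_t> even, odd;
  for (std::size_t i = 0; i + 1 < concat.size(); i += 2) {
    // Little-endian bitcast: element 2i becomes the low byte.
    uint16_t wide = static_cast<uint16_t>(concat[i] | (concat[i + 1] << 8));
    even.push_back(static_cast<uint8_t>(wide >> 0));
    odd.push_back(static_cast<uint8_t>(wide >> 8));
  }
  return {even, odd};
}
```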

Fix missing check lines

luke mentioned this in D144143: [RISCV] Improve isInterleaveShuffle to handle interleaving the high half and low half of the same source. (Feb 16 2023, 7:25 AM)

luke edited the summary of this revision. (Feb 16 2023, 7:26 AM)

luke marked 2 inline comments as done.

Harbormaster completed remote builds in B214149: Diff 498006. (Feb 16 2023, 8:21 AM)

Be smarter about generating masks by generating vmsne.vi vx, 0 directly, avoiding a redundant vadd.vi
When SEW < ELEN, interleave vectors with vwaddu.vv/vwmaccu.vx

luke mentioned this in D144091: [RISCV][NFC] Add VIOTA_VL node. (Feb 16 2023, 11:04 AM)

Harbormaster completed remote builds in B214208: Diff 498085. (Feb 16 2023, 1:28 PM)

luke added a child revision: D144175: [RISCV] Combine (store/load interleave,deinterleave) into vsseg2/vlseg2. (Feb 17 2023, 4:09 AM)

Remove convertToMask

Update tests

Harbormaster completed remote builds in B214394: Diff 498343. (Feb 17 2023, 7:39 AM)

At a high level, I'm unhappy with the amount of duplication between the scalable path with the new nodes and the fixed vector path. I think this is a great POC - it largely convinces me we *can* lower these reasonably - but we need to rethink this a bit for landing.

I'm wondering whether we should canonicalize fixed length shuffles for interleave and deinterleave to the new nodes, and then have the lowering as you roughly structured it. This would avoid some of the code duplication.

The other approach would be to have a set of utility functions, and call them appropriately from both places.

I will note I'm open to being convinced this can be done in tree, but in that case, I'd probably want to see a more minimal lowering with incremental improvement and sharing in follow-up changes.

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
6552	This case looks to be missing from the fixed length lowering, we should find a way to pick it up there as well.
6626	This bit of code should be shared with the isInterleaveShuffle above. This will require a bit of restructuring.
llvm/lib/Target/RISCV/RISCVISelLowering.h
273	Unrelated, commit separately please.

In D144092#4135349, @reames wrote:

I'm wondering whether we should canonicalize fixed length shuffles for interleave and deinterleave to the new nodes, and then have the lowering as you roughly structured it. This would avoid some of the code duplication.

I agree, the thought crossed my mind as I was reimplementing all of this. It’s exactly the opposite of what SelectionDAGBuilder does in the first place though: it explicitly generates shuffle_vectors from the intrinsics for fixed length vectors.

I know it’s a question that’s been raised before but perhaps it’s worthwhile discussing with the AArch64 folks if shuffle_vectors should be canonically combined into vector_interleave across all targets.
Presumably a lot of work.
And I wonder if they have the same duplication problem too.

In D144092#4135927, @luke wrote:

In D144092#4135349, @reames wrote:

I'm wondering whether we should canonicalize fixed length shuffles for interleave and deinterleave to the new nodes, and then have the lowering as you roughly structured it. This would avoid some of the code duplication.

I agree, the thought crossed my mind as I was reimplementing all of this. It’s exactly the opposite of what SelectionDAGBuilder does in the first place though: it explicitly generates shuffle_vectors from the intrinsics for fixed length vectors.

I know it’s a question that’s been raised before but perhaps it’s worthwhile discussing with the AArch64 folks if shuffle_vectors should be canonically combined into vector_interleave across all targets.
Presumably a lot of work.
And I wonder if they have the same duplication problem too.

Can we extract the code generation from LowerVECTOR_SHUFFLE into a function that we call in two places? That should reduce the duplication.

Share code between fixed length and scalable vectors

luke edited parent revisions, added: D144386: [RISCV] Use a smaller VL when interleaving fixed vectors; removed: D144091: [RISCV][NFC] Add VIOTA_VL node. (Feb 20 2023, 5:39 AM)

luke mentioned this in D144387: [RISCV][NFC] Make a note of the operands for RISCVISD::VNSRL_VL. (Feb 20 2023, 5:41 AM)

Harbormaster completed remote builds in B214739: Diff 498810. (Feb 20 2023, 6:55 AM)

craig.topper added inline comments. (Feb 20 2023, 5:31 PM)

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
6498	`Subtarget` is unused
6657	Need to pass `Idx` instead of `UNDEF` to get a mask-undisturbed vadd.vv.
llvm/test/CodeGen/RISCV/rvv/vector-interleave.ll
72	This is a mask-agnostic vadd.vx. You need it to be mask-undisturbed.
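Why the policy matters for the index vector: the odd lanes get VLMAX added, but the even lanes must keep their `id()/2` value, which only happens if they are left undisturbed and the passthru is `Idx` itself. Below is a plain C++ model with hypothetical names. The all-ones fill is one outcome the RVV spec permits for mask-agnostic inactive elements, used here to show how the even lanes can be lost.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class MaskPolicy { Undisturbed, AgnosticAllOnes };

// Models a masked vadd.vx: active lanes get src[i] + scalar. Inactive lanes
// either keep the passthru value (mask-undisturbed) or, as one legal
// mask-agnostic outcome, are overwritten with all ones.
std::vector<uint16_t> maskedAddVX(const std::vector<uint16_t> &passthru,
                                  const std::vector<uint16_t> &src,
                                  uint16_t scalar,
                                  const std::vector<bool> &mask,
                                  MaskPolicy policy) {
  std::vector<uint16_t> dst(src.size());
  for (std::size_t i = 0; i < src.size(); ++i) {
    if (mask[i])
      dst[i] = static_cast<uint16_t>(src[i] + scalar);
    else if (policy == MaskPolicy::Undisturbed)
      dst[i] = passthru[i];
    else
      dst[i] = 0xFFFF;
  }
  return dst;
}
```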

luke mentioned this in rG7e2f2f0fc8f1: [RISCV][NFC] Make a note of the operands for RISCVISD::VNSRL_VL. (Feb 21 2023, 1:45 AM)

Fix i64 interleave case not using mask undisturbed

luke marked 4 inline comments as done. (Feb 21 2023, 2:31 AM)

luke added inline comments. (Feb 21 2023, 2:37 AM)

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
6552	I think this is handled in `lowerVECTOR_SHUFFLEAsVNSRL`

Harbormaster completed remote builds in B214963: Diff 499089. (Feb 21 2023, 3:47 AM)

reames added inline comments. (Feb 21 2023, 9:17 AM)

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
6513	WideN.getNode()->getNumValues() should just be NumVals, right?
6545	Please use getDefaultVLOps here.
6548	I strongly request we drop the optimized case from the initial patch and come back to these in follow-up changes. I want to focus on having a correct base lowering first.
6578	The index type here doesn't look right. Say we have two vectors of i8. On a VLEN>2048 machine, we need the index type to be larger than i8.
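The arithmetic behind this concern: VLMAX = VLEN * LMUL / SEW, and a gather index has to reach every element of the source, so at e8 with LMUL=1 the element count exceeds 256 once VLEN > 2048. Below is a plain C++ sketch with hypothetical helper names, restricted to integral LMUL; a concatenated (wider-LMUL) source hits the limit even sooner. For reference, 16-bit indices (as used by `vrgatherei16`) still cover the largest case the spec allows, VLEN = 65536 at e8 with LMUL=8.

```cpp
#include <cassert>
#include <cstdint>

// VLMAX = VLEN * LMUL / SEW, restricted here to integral LMUL (1, 2, 4, 8).
constexpr uint64_t vlmax(uint64_t vlenBits, uint64_t lmul, uint64_t sewBits) {
  return vlenBits * lmul / sewBits;
}

// A gather index must address every element of its source, so the largest
// index, numElts - 1, has to fit in the index element type.
// Precondition: numElts >= 1 and idxBits < 64.
constexpr bool indicesFit(uint64_t numElts, unsigned idxBits) {
  return numElts - 1 <= (uint64_t{1} << idxBits) - 1;
}
```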
6616	I think this is the same as WideVT above, but with different naming and computation. Can you standardize on one or the other please?
6622	Please drop the optimized case from the initial patch.
6632	See changeVectorElementType
6651	I think the comments are backwards here. I believe the value you're actually computing is: // 0,0,1,1,2,2,.. etc. This could simply be a "which order do we write vector lanes in" confusion, but this order doesn't match the deinterleave comment above either.
6654	Same here

reames added inline comments. (Feb 21 2023, 9:58 AM)

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
6646	See newly added computeVLMax.

reames added inline comments. (Feb 21 2023, 10:08 AM)

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
6578	Realized the case I raised is unreachable until the fast path above is removed (as I suggested).

reames added inline comments. (Feb 21 2023, 10:23 AM)

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
6548	Staring at this code and the analogous code in fixed length vector lowering, I'm going to drop this request. Instead, I'm going to explore a pre-change to make sharing that logic possible.
6622	Same here.

luke added inline comments. (Feb 21 2023, 3:52 PM)

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
6578	Come to think of it, this could just be replaced with `i16` and `VRGATHEREI16` instead, saving some space.

luke added inline comments. (Feb 21 2023, 4:12 PM)

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
6646	It looks like `computeVLMAX` computes the maximum VLMAX statically, not the actual VLMAX on the current hardware, although there are a bunch of other places where VLMAX is computed. Maybe we can rename `computeVLMAX` to `computeMaxVLMAX`, and then add a helper function `getVLMAX`.

luke added inline comments. (Feb 21 2023, 4:14 PM)

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
6651	I've been staring at the spec too long; they use backwards notation. I agree, keeping it in LLVM order makes more sense.

craig.topper added inline comments. (Feb 21 2023, 4:14 PM)

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
6646	It's using ISD::VSCALE which will be expanded to something like `csrr a0, vlenb; srli a0, 3`. We define "vscale" as vlen/64. vlenb is already in bytes so we divide by with a shift.

craig.topper added inline comments. (Feb 21 2023, 4:16 PM)

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
6646	Oops that should have said "We define "vscale" as vlen/64. vlenb is already in bytes so we divide by another 8 with a shift right by 3."
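The same computation in plain C++, following the definition above (vscale = VLEN / 64, vlenb = VLEN / 8); the helper names are hypothetical. Multiplying vscale by a scalable type's known minimum element count gives the runtime element count, which is what the diff's `ISD::VSCALE` node with `getVectorMinNumElements()` computes for VLMAX.

```cpp
#include <cassert>
#include <cstdint>

// vscale is defined as VLEN / 64. vlenb (read by `csrr a0, vlenb`) holds
// VLEN / 8, so a further shift right by 3 gives vscale.
constexpr uint64_t vscaleFromVlenb(uint64_t vlenb) { return vlenb >> 3; }

// For a scalable type <vscale x N x ty>, the runtime element count is
// vscale * N.
constexpr uint64_t runtimeElementCount(uint64_t vlenb, uint64_t minNumElts) {
  return vscaleFromVlenb(vlenb) * minNumElts;
}
```

For example, with VLEN = 128 (vlenb = 16), `<vscale x 4 x i16>` (LMUL=1) holds 2 * 4 = 8 elements, matching VLEN / SEW = 128 / 16.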

luke added inline comments. (Feb 21 2023, 4:39 PM)

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
6578	Never mind, this causes an extra `vsetvli`

Address some (but not all) review comments

luke marked 2 inline comments as done. (Feb 21 2023, 4:52 PM)

luke added inline comments.

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
6646	My bad, I didn't realise how new it actually was and hadn't pulled in 9168c98553ac9a1f8e8b87006f9b1b3f23955beb, I thought you were referring to `RISCVTargetLowering::computeVLMAX`

Use computeVLMax

luke marked 2 inline comments as done. (Feb 21 2023, 5:01 PM)

Use computeVLMax

LGTM with two test comments addressed.

llvm/test/CodeGen/RISCV/rvv/vector-interleave-fixed.ll
40	Missing check lines here, probably due to conflict with autogen.
117	Same problem here.

This revision is now accepted and ready to land. (Feb 21 2023, 6:05 PM)

Harbormaster completed remote builds in B215132: Diff 499327. (Feb 21 2023, 6:28 PM)

luke mentioned this in D144532: [RISCV] Reorganize deinterleave lowering for reuse [nfc]. (Feb 22 2023, 3:05 AM)

Fix check lines

luke marked 2 inline comments as done. (Feb 22 2023, 9:39 AM)

craig.topper added inline comments. (Feb 22 2023, 9:43 AM)

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
6515	`operator->` works on SDValue. You don't need to call getNode.

luke added a parent revision: D144532: [RISCV] Reorganize deinterleave lowering for reuse [nfc]. (Feb 22 2023, 9:53 AM)

@reames getDeinterleaveViaVNSRL needs some massaging to be reused here with the scalable vectors; I'll do that in a follow-up patch to keep this diff clean.

Use -> overload

luke marked 8 inline comments as done. (Feb 22 2023, 10:06 AM)

Remove accidental extra commit

luke added a child revision: D144584: [RISCV][NFC] Reuse getDeinterleaveViaVNSRL to lower deinterleave intrinsics. (Feb 22 2023, 11:32 AM)

Harbormaster completed remote builds in B215309: Diff 499571. (Feb 22 2023, 1:17 PM)

Reverse ping - anything preventing this from landing?

In D144092#4147491, @reames wrote:

Reverse ping - anything preventing this from landing?

Was intending to land it with https://reviews.llvm.org/D144584

Rebase

This revision was landed with ongoing or failed builds. (Feb 23 2023, 8:23 AM)

Closed by commit rG8d15e7275fe1: [RISCV] Lower interleave and deinterleave intrinsics (authored by luke).

This revision was automatically updated to reflect the committed changes.

luke added a commit: rG8d15e7275fe1: [RISCV] Lower interleave and deinterleave intrinsics.

Harbormaster completed remote builds in B215526: Diff 499863. (Feb 23 2023, 9:23 AM)

Revision Contents

Path                                                        Size
llvm/lib/Target/RISCV/RISCVISelLowering.h                   4 lines
llvm/lib/Target/RISCV/RISCVISelLowering.cpp                 175 lines
llvm/test/CodeGen/RISCV/rvv/vector-deinterleave-fixed.ll    166 lines
llvm/test/CodeGen/RISCV/rvv/vector-deinterleave.ll          182 lines
llvm/test/CodeGen/RISCV/rvv/vector-interleave-fixed.ll      176 lines
llvm/test/CodeGen/RISCV/rvv/vector-interleave.ll            216 lines

Diff 498005

llvm/lib/Target/RISCV/RISCVISelLowering.h

@@ 264 unchanged lines not shown @@ enum NodeType : unsigned {
   VWADDU_VL,
   VWSUB_VL,
   VWSUBU_VL,
   VWADD_W_VL,
   VWADDU_W_VL,
   VWSUB_W_VL,
   VWSUBU_W_VL,

+  // Narrowing logical shift right.
+  // Operands are (source, shift, passthru, mask, vl)
   VNSRL_VL,

   // Vector compare producing a mask. Fourth operand is input mask. Fifth
   // operand is VL.
   SETCC_VL,

   // Vector select with an additional VL operand. This operation is unmasked.
   VSELECT_VL,
@@ 412 unchanged lines not shown @@ private:
   SDValue LowerINTRINSIC_VOID(SDValue Op, SelectionDAG &DAG) const;
   SDValue lowerVPREDUCE(SDValue Op, SelectionDAG &DAG) const;
   SDValue lowerVECREDUCE(SDValue Op, SelectionDAG &DAG) const;
   SDValue lowerVectorMaskVecReduction(SDValue Op, SelectionDAG &DAG,
                                       bool IsVP) const;
   SDValue lowerFPVECREDUCE(SDValue Op, SelectionDAG &DAG) const;
   SDValue lowerINSERT_SUBVECTOR(SDValue Op, SelectionDAG &DAG) const;
   SDValue lowerEXTRACT_SUBVECTOR(SDValue Op, SelectionDAG &DAG) const;
+  SDValue lowerVECTOR_DEINTERLEAVE(SDValue Op, SelectionDAG &DAG) const;
+  SDValue lowerVECTOR_INTERLEAVE(SDValue Op, SelectionDAG &DAG) const;
   SDValue lowerSTEP_VECTOR(SDValue Op, SelectionDAG &DAG) const;
   SDValue lowerVECTOR_REVERSE(SDValue Op, SelectionDAG &DAG) const;
   SDValue lowerVECTOR_SPLICE(SDValue Op, SelectionDAG &DAG) const;
   SDValue lowerABS(SDValue Op, SelectionDAG &DAG) const;
   SDValue lowerMaskedLoad(SDValue Op, SelectionDAG &DAG) const;
   SDValue lowerMaskedStore(SDValue Op, SelectionDAG &DAG) const;
   SDValue lowerFixedLengthVectorFCOPYSIGNToRVV(SDValue Op,
                                                SelectionDAG &DAG) const;
@@ 94 unchanged lines not shown @@

llvm/lib/Target/RISCV/RISCVISelLowering.cpp


@@ 594 unchanged lines not shown @@ for (MVT VT : BoolVecVTs) {
       setTruncStoreAction(OtherVT, VT, Expand);
       setLoadExtAction({ISD::EXTLOAD, ISD::SEXTLOAD, ISD::ZEXTLOAD}, OtherVT,
                        VT, Expand);
     }

     setOperationAction({ISD::VP_FP_TO_SINT, ISD::VP_FP_TO_UINT,
                         ISD::VP_TRUNCATE, ISD::VP_SETCC},
                        VT, Custom);

+    setOperationAction(ISD::VECTOR_DEINTERLEAVE, VT, Custom);
+    setOperationAction(ISD::VECTOR_INTERLEAVE, VT, Custom);
+
     setOperationAction(ISD::VECTOR_REVERSE, VT, Custom);

     setOperationPromotedToType(
         ISD::VECTOR_SPLICE, VT,
         MVT::getVectorVT(MVT::i8, VT.getVectorElementCount()));
   }

   for (MVT VT : IntVecVTs) {
@@ 75 unchanged lines not shown @@ for (MVT VT : IntVecVTs) {
     setOperationAction({ISD::STEP_VECTOR, ISD::VECTOR_REVERSE}, VT, Custom);

     for (MVT OtherVT : MVT::integer_scalable_vector_valuetypes()) {
       setTruncStoreAction(VT, OtherVT, Expand);
       setLoadExtAction({ISD::EXTLOAD, ISD::SEXTLOAD, ISD::ZEXTLOAD}, OtherVT,
                        VT, Expand);
     }

+    setOperationAction(ISD::VECTOR_DEINTERLEAVE, VT, Custom);
+    setOperationAction(ISD::VECTOR_INTERLEAVE, VT, Custom);
+
     // Splice
     setOperationAction(ISD::VECTOR_SPLICE, VT, Custom);

     // Lower CTLZ_ZERO_UNDEF and CTTZ_ZERO_UNDEF if element of VT in the range
     // of f32.
     EVT FloatVT = MVT::getVectorVT(MVT::f32, VT.getVectorElementCount());
     if (isTypeLegal(FloatVT)) {
       setOperationAction(
@@ 65 unchanged lines not shown @@ const auto SetCommonVFPActions = [&](MVT VT) {

     setOperationAction(ISD::SELECT, VT, Custom);
     setOperationAction(ISD::SELECT_CC, VT, Expand);

     setOperationAction(
         {ISD::CONCAT_VECTORS, ISD::INSERT_SUBVECTOR, ISD::EXTRACT_SUBVECTOR},
         VT, Custom);

+    setOperationAction(ISD::VECTOR_DEINTERLEAVE, VT, Custom);
+    setOperationAction(ISD::VECTOR_INTERLEAVE, VT, Custom);
+
     setOperationAction({ISD::VECTOR_REVERSE, ISD::VECTOR_SPLICE}, VT, Custom);

     setOperationAction(FloatingPointVPOps, VT, Custom);
   };

   // Sets common extload/truncstore actions on RVV floating-point vector
   // types.
   const auto SetCommonVFPExtLoadTruncStoreActions =
@@ 3,304 unchanged lines not shown @@ SDValue RISCVTargetLowering::LowerOperation(SDValue Op,
   case ISD::VP_REDUCE_XOR:
     if (Op.getOperand(1).getValueType().getVectorElementType() == MVT::i1)
       return lowerVectorMaskVecReduction(Op, DAG, /*IsVP*/ true);
     return lowerVPREDUCE(Op, DAG);
   case ISD::INSERT_SUBVECTOR:
     return lowerINSERT_SUBVECTOR(Op, DAG);
   case ISD::EXTRACT_SUBVECTOR:
     return lowerEXTRACT_SUBVECTOR(Op, DAG);
+  case ISD::VECTOR_DEINTERLEAVE:
+    return lowerVECTOR_DEINTERLEAVE(Op, DAG);
+  case ISD::VECTOR_INTERLEAVE:
+    return lowerVECTOR_INTERLEAVE(Op, DAG);
   case ISD::STEP_VECTOR:
     return lowerSTEP_VECTOR(Op, DAG);
   case ISD::VECTOR_REVERSE:
     return lowerVECTOR_REVERSE(Op, DAG);
   case ISD::VECTOR_SPLICE:
     return lowerVECTOR_SPLICE(Op, DAG);
   case ISD::BUILD_VECTOR:
     return lowerBUILD_VECTOR(Op, DAG, Subtarget);
▲ Show 20 Lines • Show All 2,370 Lines • ▼ Show 20 Lines	SDValue RISCVTargetLowering::lowerEXTRACT_SUBVECTOR(SDValue Op,
Slidedown = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, SubVecVT, Slidedown,		Slidedown = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, SubVecVT, Slidedown,
DAG.getConstant(0, DL, XLenVT));		DAG.getConstant(0, DL, XLenVT));

// We might have bitcast from a mask type: cast back to the original type if		// We might have bitcast from a mask type: cast back to the original type if
// required.		// required.
return DAG.getBitcast(Op.getSimpleValueType(), Slidedown);		return DAG.getBitcast(Op.getSimpleValueType(), Slidedown);
}		}

		// Widen a vector's operands to i8, then truncate its results back to the
		// original type, typically i8. All operand and result types must be the same.
		static SDValue wideVectorOpToi8(SDValue N, SDLoc &DL, SelectionDAG &DAG) {
		MVT VT = N.getSimpleValueType();
		craig.topperUnsubmitted Done Reply Inline Actions `Subtarget` is unused craig.topper: `Subtarget` is unused
		MVT WideVT = VT.changeVectorElementType(MVT::i8);
		SmallVector<SDValue, 4> WideOps;
		for (SDValue Op : N.getNode()->ops()) {
		assert(Op.getSimpleValueType() == VT &&
		"Operands and result must be same type");
		WideOps.push_back(DAG.getNode(ISD::ZERO_EXTEND, DL, WideVT, Op));
		}

		unsigned NumVals = N.getNode()->getNumValues();

		SDVTList VTs = DAG.getVTList(SmallVector<EVT, 4>(
		NumVals, N.getValueType().changeVectorElementType(MVT::i8)));
		SDValue WideN = DAG.getNode(N.getOpcode(), DL, VTs, WideOps);
		SmallVector<SDValue, 4> TruncVals;
		for (unsigned I = 0; I < WideN.getNode()->getNumValues(); I++) {
		reamesUnsubmitted Done Reply Inline Actions WideN.getNode()->getNumValues() should just be NumVals right? reames: WideN.getNode()->getNumValues() should just be NumVals right?
		TruncVals.push_back(
		DAG.getNode(ISD::TRUNCATE, DL, VT, SDValue(WideN.getNode(), I)));
		craig.topperUnsubmitted Done Reply Inline Actions `operator->` works on SDValue. You don't need to call getNode. craig.topper: `operator->` works on SDValue. You don't need to call getNode.
		}

		if (TruncVals.size() > 1)
		return DAG.getMergeValues(TruncVals, DL);
		return TruncVals.front();
		}

		SDValue RISCVTargetLowering::lowerVECTOR_DEINTERLEAVE(SDValue Op,
		SelectionDAG &DAG) const {
		SDLoc DL(Op);
		MVT VecVT = Op.getSimpleValueType();
		MVT XLenVT = Subtarget.getXLenVT();

		// 1 bit element vectors need to be widened to e8
		if (VecVT.getVectorElementType() == MVT::i1)
		return wideVectorOpToi8(Op, DL, DAG);

		// Reconstruct the concatenated array to deinterleave
		MVT WideVT = MVT::getScalableVectorVT(VecVT.getVectorElementType(),
		VecVT.getVectorMinNumElements() * 2);
		SDValue Wide = DAG.getNode(ISD::CONCAT_VECTORS, DL, WideVT, Op.getOperand(0),
		Op.getOperand(1));

		auto [Mask, VL] = getDefaultScalableVLOps(WideVT, DL, DAG, Subtarget);

		// If the element type is smaller than i64, then we can deinterleave
		// through vnsrl.wi
		if (VecVT.getScalarSizeInBits() < 64) {
		// Bitcast the concatenated vector from <n x m x ty> -> <n x m / 2 x ty * 2>
		// This is also casts FPs to ints
		reamesUnsubmitted Done Reply Inline Actions Please use getDefaultVLOps here. reames: Please use getDefaultVLOps here.
		MVT WideEltVT = MVT::getIntegerVT(WideVT.getScalarSizeInBits() * 2);
		WideVT = MVT::getVectorVT(
		WideEltVT, WideVT.getVectorElementCount().divideCoefficientBy(2));
		reamesUnsubmitted Done Reply Inline Actions I strongly request we drop the optimized case from the initial patch and come back to these in follow up changes. I want to focus on having a correct base lowering first. reames: I strongly request we drop the optimized case from the initial patch and come back to these in…
		reamesUnsubmitted Done Reply Inline Actions Staring at this code and the analogous code in fixed length vector lowering, I'm going to drop this request. Instead, I'm going to explore a pre-change to make sharing that logic possible. reames: Staring at this code and the analogous code in fixed length vector lowering, I'm going to drop…
		Wide = DAG.getBitcast(WideVT, Wide);

		MVT NarrowVT = VecVT.changeVectorElementTypeToInteger();
		auto [Mask, VL] = getDefaultScalableVLOps(NarrowVT, DL, DAG, Subtarget);
		reamesUnsubmitted Done Reply Inline Actions This case looks to be missing from the fixed length lowering, we should find a way to pick it up there as well. reames: This case looks to be missing from the fixed length lowering, we should find a way to pick it…
		lukeAuthorUnsubmitted Done Reply Inline Actions I think this is handled in `lowerVECTOR_SHUFFLEAsVNSRL` luke: I think this is handled in `lowerVECTOR_SHUFFLEAsVNSRL`

		SDValue Passthru = DAG.getUNDEF(VecVT);

		SDValue Even = DAG.getNode(
		RISCVISD::VNSRL_VL, DL, NarrowVT, Wide,
		DAG.getSplatVector(NarrowVT, DL, DAG.getConstant(0, DL, XLenVT)),
		Passthru, Mask, VL);
		SDValue Odd = DAG.getNode(
		RISCVISD::VNSRL_VL, DL, NarrowVT, Wide,
		DAG.getSplatVector(
		NarrowVT, DL,
		DAG.getConstant(VecVT.getScalarSizeInBits(), DL, XLenVT)),
		Passthru, Mask, VL);

		// Bitcast the results back in case it was casted from an FP vector
		return DAG.getMergeValues(
		{DAG.getBitcast(VecVT, Even), DAG.getBitcast(VecVT, Odd)}, DL);
		}

		MVT IdxVT = WideVT.changeVectorElementTypeToInteger();
		// Create a vector of even indices {0, 2, 4, ...}
		SDValue EvenIdx =
		DAG.getStepVector(DL, IdxVT, APInt(WideVT.getScalarSizeInBits(), 2));
		// Create a vector of odd indices {1, 3, 5, ... }
		SDValue OddIdx =
		DAG.getNode(ISD::ADD, DL, IdxVT, EvenIdx, DAG.getConstant(1, DL, IdxVT));
		reamesUnsubmitted Done Reply Inline Actions The index type here doesn't look right. Say we have two vectors of i8. On a VLEN>2048 machine, we need the index type to be larger than i8. reames: The index type here doesn't look right. Say we have two vectors of i8. On a VLEN>2048 machine…
		reamesUnsubmitted Done Reply Inline Actions Realized the case I raised is unreachable until the fast path above is removed (as I suggested). reames: Realized the case I raised is unreachable until the fast path above is removed (as I suggested).
		lukeAuthorUnsubmitted Done Reply Inline Actions Come think of it this could just be replaced with `i16` and `VRGATHEREI16` instead, saving some space. luke: Come think of it this could just be replaced with `i16` and `VRGATHEREI16` instead, saving some…
		lukeAuthorUnsubmitted Done Reply Inline Actions Never mind, this causes an extra `vsetvli` luke: Never mind, this causes an extra `vsetvli`

		// Gather the even and odd elements into two separate vectors
		SDValue EvenWide = DAG.getNode(RISCVISD::VRGATHER_VV_VL, DL, WideVT, Wide,
		EvenIdx, DAG.getUNDEF(WideVT), Mask, VL);
		SDValue OddWide = DAG.getNode(RISCVISD::VRGATHER_VV_VL, DL, WideVT, Wide,
		OddIdx, DAG.getUNDEF(WideVT), Mask, VL);

		lukeAuthorUnsubmitted Done Reply Inline Actions I'm not that happy with how this mask is generated. Ideally we would just `vmv.v.i v0, 0x55` directly, but I can't seem to do it in a way that keeps SelectionDAG happy. Namely, it doesn't allow bitcasting `<vscale x n x i8>` -> `<vscale x n i1>` luke: I'm not that happy with how this mask is generated. Ideally we would just `vmv.v.i v0, 0x55`…
		SDValue Even = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, VecVT, EvenWide,
		DAG.getConstant(0, DL, XLenVT));
		SDValue Odd = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, VecVT, OddWide,
		DAG.getConstant(0, DL, XLenVT));

		return DAG.getMergeValues({Even, Odd}, DL);
		}

		SDValue RISCVTargetLowering::lowerVECTOR_INTERLEAVE(SDValue Op,
		SelectionDAG &DAG) const {
		SDLoc DL(Op);
		MVT VecVT = Op.getSimpleValueType();

		// i1 vectors need to be widened to i8
		if (VecVT.getVectorElementType() == MVT::i1)
		return wideVectorOpToi8(Op, DL, DAG);

		MVT XLenVT = Subtarget.getXLenVT();
		auto [_, VL] = getDefaultScalableVLOps(VecVT, DL, DAG, Subtarget);

		MVT ConcatVT =
		MVT::getVectorVT(VecVT.getVectorElementType(),
		VecVT.getVectorElementCount().multiplyCoefficientBy(2));
		SDValue Concat = DAG.getNode(ISD::CONCAT_VECTORS, DL, ConcatVT,
		Op.getOperand(0), Op.getOperand(1));

		MVT IdxVT = MVT::getVectorVT(MVT::i16, ConcatVT.getVectorElementCount());

		SDValue StepVec = DAG.getStepVector(DL, IdxVT);

		SDValue One = DAG.getConstant(1, DL, XLenVT);
		reamesUnsubmitted Done Reply Inline Actions I think this is the same as WideVT above, but with different naming and computation. Can you standardize on one or the other please? reames: I think this is the same as WideVT above, but with different naming and computation. Can you…

		// ... 0 1 0 1 0 1 0 1
		SDValue OddMask = DAG.getNode(ISD::AND, DL, IdxVT, StepVec,
		DAG.getSplatVector(IdxVT, DL, One));
		// Convert it to a mask vector type (nxmxi16 -> nxmxi1)
		// vmsne.vi v0, oddmask, 0
		reamesUnsubmitted Done Reply Inline Actions Please drop the optimized case from the initial patch. reames: Please drop the optimized case from the initial patch.
		reamesUnsubmitted Done Reply Inline Actions Same here. reames: Same here.
		OddMask = DAG.getSetCC(
		DL, getMaskTypeFor(ConcatVT), OddMask,
		DAG.getSplatVector(IdxVT, DL, DAG.getConstant(0, DL, XLenVT)),
		ISD::CondCode::SETNE);
		reamesUnsubmitted Done Reply Inline Actions This bit of code should be shared with the isInterleaveShuffle above. This will require a bit of restructuring. reames: This bit of code should be shared with the isInterleaveShuffle above. This will require a bit…

		SDValue VLMAX =
		DAG.getNode(ISD::VSCALE, DL, XLenVT,
		getVLOp(VecVT.getVectorMinNumElements(), DL, DAG, Subtarget));

		// Build up the index vector for interleaving the concatenated array
		reamesUnsubmitted Done Reply Inline Actions See changeVectorElementType reames: See changeVectorElementType
		// ... 3 3 2 2 1 1 0 0
		SDValue Idx = DAG.getNode(ISD::SRL, DL, IdxVT, StepVec,
		DAG.getSplatVector(IdxVT, DL, One));
		// ... n+3 3 n+2 2 n+1 1 n 0
		Idx = DAG.getNode(RISCVISD::ADD_VL, DL, IdxVT, Idx,
		DAG.getSplatVector(IdxVT, DL, VLMAX), DAG.getUNDEF(IdxVT),
		OddMask, VL);

		// Perform the interleaving
		SDValue Interleaved =
		    DAG.getNode(RISCVISD::VRGATHEREI16_VV_VL, DL, ConcatVT, Concat, Idx,
		                DAG.getUNDEF(ConcatVT), OddMask, VL);

		// Extract the two halves from the concatenated result
		reames: See newly added computeVLMax.
		luke: It looks like `computeVLMAX` computes the maximum VLMAX statically, not the actual VLMAX on the current hardware. Although there are a bunch of other places where VLMAX is computed. Maybe we can rename `computeVLMAX` to `computeMaxVLMAX`, and then add a helper function `getVLMAX`.
		craig.topper: It's using ISD::VSCALE, which will be expanded to something like `csrr a0, vlenb; srli a0, 3`. We define "vscale" as vlen/64. vlenb is already in bytes so we divide by with a shift.
		craig.topper: Oops, that should have said "We define "vscale" as vlen/64. vlenb is already in bytes so we divide by another 8 with a shift right by 3."
		luke: My bad, I didn't realise how new it actually was and hadn't pulled in 9168c98553ac9a1f8e8b87006f9b1b3f23955beb; I thought you were referring to `RISCVTargetLowering::computeVLMAX`.
		SDValue Lo = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, VecVT, Interleaved,
		                         DAG.getVectorIdxConstant(0, DL));
		SDValue Hi = DAG.getNode(
		    ISD::EXTRACT_SUBVECTOR, DL, VecVT, Interleaved,
		    DAG.getVectorIdxConstant(VecVT.getVectorMinNumElements(), DL));
		reames: I think the comments are backwards here. I believe the value you're actually computing is: // 0,0,1,1,2,2,... etc. This could simply be a "which order do we write vector lanes in" confusion, but this order doesn't match the deinterleave comment above either.
		luke: I've been staring at the spec too long; they use backwards notation. I agree, keeping it in LLVM order makes more sense.

		return DAG.getMergeValues({Lo, Hi}, DL);
		}
		reames: Same here.
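For reference, the index vector built above follows the approach reames outlined earlier in the thread for the scalable case: id()/2 on every lane, plus VLMAX on the odd lanes selected by (id() & 1) != 0, followed by a gather from the concatenation of the two inputs. Below is a scalar C++ model of that technique, not the patch's code. `buildInterleaveIndices` and `interleave2` are hypothetical names, `n` stands in for the runtime VLMAX of one input, the masked add keeps id()/2 on the even lanes (the mask-undisturbed behaviour craig.topper asks for), and the gather is modelled as unmasked. Indices are 16-bit to mirror vrgatherei16, so the sketch assumes every index (at most 2*n - 1) fits in 16 bits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar model of interleave2 via "concatenate, then gather" (hypothetical
// helper names). n models the runtime VLMAX of one input vector.
std::vector<uint16_t> buildInterleaveIndices(uint16_t n) {
  assert(n <= 32768 && "indices must fit in 16 bits (vrgatherei16)");
  std::vector<uint16_t> idx(2 * static_cast<size_t>(n));
  for (size_t i = 0; i < idx.size(); ++i) {
    uint16_t step = static_cast<uint16_t>(i);          // vid.v
    bool odd = (step & 1) != 0;                        // vand.vi 1 + vmsne.vi 0
    uint16_t half = static_cast<uint16_t>(step >> 1);  // vsrl.vi 1
    // Masked add, mask undisturbed: odd lanes get half + n, even lanes
    // keep the passthru value, which is half itself.
    idx[i] = odd ? static_cast<uint16_t>(half + n) : half;
  }
  return idx;
}

// Concatenate a and b, then gather with the indices {0, n, 1, n+1, ...}.
template <typename T>
std::vector<T> interleave2(const std::vector<T> &a, const std::vector<T> &b) {
  assert(a.size() == b.size());
  std::vector<T> concat(a);  // concat_vectors(a, b)
  concat.insert(concat.end(), b.begin(), b.end());
  std::vector<uint16_t> idx =
      buildInterleaveIndices(static_cast<uint16_t>(a.size()));
  std::vector<T> out(idx.size());
  for (size_t i = 0; i < idx.size(); ++i)
    out[i] = concat[idx[i]];  // unmasked vrgatherei16.vv
  return out;
}
```

With a = {1, 2, 3, 4} and b = {10, 20, 30, 40}, the indices are {0, 4, 1, 5, 2, 6, 3, 7} and the result is {1, 10, 2, 20, 3, 30, 4, 40}.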

// Lower step_vector to the vid instruction. Any non-identity step value must
// be accounted for by manual expansion.
craig.topper: Need to pass `Idx` instead of `UNDEF` to get a mask undisturbed vadd.vv.
SDValue RISCVTargetLowering::lowerSTEP_VECTOR(SDValue Op,
                                              SelectionDAG &DAG) const {
  SDLoc DL(Op);
  MVT VT = Op.getSimpleValueType();
  assert(VT.isScalableVector() && "Expected scalable vector");
  MVT XLenVT = Subtarget.getXLenVT();
  auto [Mask, VL] = getDefaultScalableVLOps(VT, DL, DAG, Subtarget);
  SDValue StepVec = DAG.getNode(RISCVISD::VID_VL, DL, VT, Mask, VL);

llvm/test/CodeGen/RISCV/rvv/vector-deinterleave-fixed.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc < %s -mtriple=riscv32 -mattr=+v,+zfh,+experimental-zvfh | FileCheck %s
				craig.topper: I don't think this test even uses your patch, and it shows the strategy for how your patch can be improved. Only the i64 and double test cases use vrgather.
				; RUN: llc < %s -mtriple=riscv64 -mattr=+v,+zfh,+experimental-zvfh | FileCheck %s

				; Integers

				define {<16 x i1>, <16 x i1>} @vector_deinterleave_v16i1_v32i1(<32 x i1> %vec) {
				%retval = call {<16 x i1>, <16 x i1>} @llvm.experimental.vector.deinterleave2.v32i1(<32 x i1> %vec)
				ret {<16 x i1>, <16 x i1>} %retval
				craig.topper: Missing CHECK lines.
				}

				define {<16 x i8>, <16 x i8>} @vector_deinterleave_v16i8_v32i8(<32 x i8> %vec) {
				; CHECK-LABEL: vector_deinterleave_v16i8_v32i8:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vsetivli zero, 16, e8, m1, ta, ma
				; CHECK-NEXT: vnsrl.wi v10, v8, 0
				; CHECK-NEXT: vnsrl.wi v11, v8, 8
				; CHECK-NEXT: vmv.v.v v8, v10
				; CHECK-NEXT: vmv.v.v v9, v11
				; CHECK-NEXT: ret
				%retval = call {<16 x i8>, <16 x i8>} @llvm.experimental.vector.deinterleave2.v32i8(<32 x i8> %vec)
				ret {<16 x i8>, <16 x i8>} %retval
				}

				define {<8 x i16>, <8 x i16>} @vector_deinterleave_v8i16_v16i16(<16 x i16> %vec) {
				; CHECK-LABEL: vector_deinterleave_v8i16_v16i16:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vsetivli zero, 8, e16, m1, ta, ma
				; CHECK-NEXT: vnsrl.wi v10, v8, 0
				; CHECK-NEXT: vnsrl.wi v11, v8, 16
				; CHECK-NEXT: vmv.v.v v8, v10
				; CHECK-NEXT: vmv.v.v v9, v11
				; CHECK-NEXT: ret
				%retval = call {<8 x i16>, <8 x i16>} @llvm.experimental.vector.deinterleave2.v16i16(<16 x i16> %vec)
				ret {<8 x i16>, <8 x i16>} %retval
				}

				define {<4 x i32>, <4 x i32>} @vector_deinterleave_v4i32_vv8i32(<8 x i32> %vec) {
				; CHECK-LABEL: vector_deinterleave_v4i32_vv8i32:
				; CHECK: # %bb.0:
				; CHECK-NEXT: li a0, 32
				; CHECK-NEXT: vsetivli zero, 4, e32, m1, ta, ma
				; CHECK-NEXT: vnsrl.wx v10, v8, a0
				; CHECK-NEXT: vnsrl.wi v11, v8, 0
				; CHECK-NEXT: vmv.v.v v8, v11
				; CHECK-NEXT: vmv.v.v v9, v10
				; CHECK-NEXT: ret
				%retval = call {<4 x i32>, <4 x i32>} @llvm.experimental.vector.deinterleave2.v8i32(<8 x i32> %vec)
				ret {<4 x i32>, <4 x i32>} %retval
				}

				define {<2 x i64>, <2 x i64>} @vector_deinterleave_v2i64_v4i64(<4 x i64> %vec) {
				; CHECK-LABEL: vector_deinterleave_v2i64_v4i64:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vsetivli zero, 2, e64, m2, ta, ma
				; CHECK-NEXT: vslidedown.vi v12, v8, 2
				; CHECK-NEXT: li a0, 2
				; CHECK-NEXT: vmv.s.x v0, a0
				; CHECK-NEXT: vsetivli zero, 2, e64, m1, ta, mu
				; CHECK-NEXT: vrgather.vi v10, v8, 0
				; CHECK-NEXT: vrgather.vi v10, v12, 0, v0.t
				; CHECK-NEXT: vrgather.vi v11, v8, 1
				; CHECK-NEXT: vrgather.vi v11, v12, 1, v0.t
				; CHECK-NEXT: vmv.v.v v8, v10
				; CHECK-NEXT: vmv.v.v v9, v11
				; CHECK-NEXT: ret
				%retval = call {<2 x i64>, <2 x i64>} @llvm.experimental.vector.deinterleave2.v4i64(<4 x i64> %vec)
				ret {<2 x i64>, <2 x i64>} %retval
				}

				declare {<16 x i1>, <16 x i1>} @llvm.experimental.vector.deinterleave2.v32i1(<32 x i1>)
				declare {<16 x i8>, <16 x i8>} @llvm.experimental.vector.deinterleave2.v32i8(<32 x i8>)
				declare {<8 x i16>, <8 x i16>} @llvm.experimental.vector.deinterleave2.v16i16(<16 x i16>)
				declare {<4 x i32>, <4 x i32>} @llvm.experimental.vector.deinterleave2.v8i32(<8 x i32>)
				declare {<2 x i64>, <2 x i64>} @llvm.experimental.vector.deinterleave2.v4i64(<4 x i64>)

				; Floats

				define {<2 x half>, <2 x half>} @vector_deinterleave_v2f16_v4f16(<4 x half> %vec) {
				; CHECK-LABEL: vector_deinterleave_v2f16_v4f16:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vsetivli zero, 2, e16, mf4, ta, ma
				; CHECK-NEXT: vnsrl.wi v10, v8, 0
				; CHECK-NEXT: vnsrl.wi v9, v8, 16
				; CHECK-NEXT: vmv1r.v v8, v10
				; CHECK-NEXT: ret
				%retval = call {<2 x half>, <2 x half>} @llvm.experimental.vector.deinterleave2.v4f16(<4 x half> %vec)
				ret {<2 x half>, <2 x half>} %retval
				}

				define {<4 x half>, <4 x half>} @vector_deinterleave_v4f16_v8f16(<8 x half> %vec) {
				; CHECK-LABEL: vector_deinterleave_v4f16_v8f16:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vsetivli zero, 4, e16, mf2, ta, ma
				; CHECK-NEXT: vnsrl.wi v10, v8, 0
				; CHECK-NEXT: vnsrl.wi v9, v8, 16
				; CHECK-NEXT: vmv1r.v v8, v10
				; CHECK-NEXT: ret
				%retval = call {<4 x half>, <4 x half>} @llvm.experimental.vector.deinterleave2.v8f16(<8 x half> %vec)
				ret {<4 x half>, <4 x half>} %retval
				}

				define {<2 x float>, <2 x float>} @vector_deinterleave_v2f32_v4f32(<4 x float> %vec) {
				; CHECK-LABEL: vector_deinterleave_v2f32_v4f32:
				; CHECK: # %bb.0:
				; CHECK-NEXT: li a0, 32
				; CHECK-NEXT: vsetivli zero, 2, e32, mf2, ta, ma
				; CHECK-NEXT: vnsrl.wx v9, v8, a0
				; CHECK-NEXT: vnsrl.wi v8, v8, 0
				; CHECK-NEXT: ret
				%retval = call {<2 x float>, <2 x float>} @llvm.experimental.vector.deinterleave2.v4f32(<4 x float> %vec)
				ret {<2 x float>, <2 x float>} %retval
				}

				define {<8 x half>, <8 x half>} @vector_deinterleave_v8f16_v16f16(<16 x half> %vec) {
				; CHECK-LABEL: vector_deinterleave_v8f16_v16f16:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vsetivli zero, 8, e16, m1, ta, ma
				; CHECK-NEXT: vnsrl.wi v10, v8, 0
				; CHECK-NEXT: vnsrl.wi v11, v8, 16
				; CHECK-NEXT: vmv.v.v v8, v10
				; CHECK-NEXT: vmv.v.v v9, v11
				; CHECK-NEXT: ret
				%retval = call {<8 x half>, <8 x half>} @llvm.experimental.vector.deinterleave2.v16f16(<16 x half> %vec)
				ret {<8 x half>, <8 x half>} %retval
				}

				define {<4 x float>, <4 x float>} @vector_deinterleave_v4f32_v8f32(<8 x float> %vec) {
				; CHECK-LABEL: vector_deinterleave_v4f32_v8f32:
				; CHECK: # %bb.0:
				; CHECK-NEXT: li a0, 32
				; CHECK-NEXT: vsetivli zero, 4, e32, m1, ta, ma
				; CHECK-NEXT: vnsrl.wx v10, v8, a0
				; CHECK-NEXT: vnsrl.wi v11, v8, 0
				; CHECK-NEXT: vmv.v.v v8, v11
				; CHECK-NEXT: vmv.v.v v9, v10
				; CHECK-NEXT: ret
				%retval = call {<4 x float>, <4 x float>} @llvm.experimental.vector.deinterleave2.v8f32(<8 x float> %vec)
				ret {<4 x float>, <4 x float>} %retval
				}

				define {<2 x double>, <2 x double>} @vector_deinterleave_v2f64_v4f64(<4 x double> %vec) {
				; CHECK-LABEL: vector_deinterleave_v2f64_v4f64:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vsetivli zero, 2, e64, m2, ta, ma
				; CHECK-NEXT: vslidedown.vi v12, v8, 2
				; CHECK-NEXT: li a0, 2
				; CHECK-NEXT: vmv.s.x v0, a0
				; CHECK-NEXT: vsetivli zero, 2, e64, m1, ta, mu
				; CHECK-NEXT: vrgather.vi v10, v8, 0
				; CHECK-NEXT: vrgather.vi v10, v12, 0, v0.t
				; CHECK-NEXT: vrgather.vi v11, v8, 1
				; CHECK-NEXT: vrgather.vi v11, v12, 1, v0.t
				; CHECK-NEXT: vmv.v.v v8, v10
				; CHECK-NEXT: vmv.v.v v9, v11
				; CHECK-NEXT: ret
				%retval = call {<2 x double>, <2 x double>} @llvm.experimental.vector.deinterleave2.v4f64(<4 x double> %vec)
				ret {<2 x double>, <2 x double>} %retval
				}

				declare {<2 x half>,<2 x half>} @llvm.experimental.vector.deinterleave2.v4f16(<4 x half>)
				declare {<4 x half>, <4 x half>} @llvm.experimental.vector.deinterleave2.v8f16(<8 x half>)
				declare {<2 x float>, <2 x float>} @llvm.experimental.vector.deinterleave2.v4f32(<4 x float>)
				declare {<8 x half>, <8 x half>} @llvm.experimental.vector.deinterleave2.v16f16(<16 x half>)
				declare {<4 x float>, <4 x float>} @llvm.experimental.vector.deinterleave2.v8f32(<8 x float>)
				declare {<2 x double>, <2 x double>} @llvm.experimental.vector.deinterleave2.v4f64(<4 x double>)
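craig.topper's remark on this file is visible in the CHECK lines: the i8, i16, i32, f16 and f32 deinterleaves never reach vrgather. They are lowered to two narrowing shifts (vnsrl.wi or vnsrl.wx) with shift amounts 0 and SEW, which treat each pair of adjacent elements as one element of twice the width. The i32 and f32 cases materialize the shift amount with `li a0, 32` and use vnsrl.wx because vnsrl.wi only encodes a 5-bit shift amount. Below is a scalar C++ sketch of the trick for 8-bit elements; the struct and function names are hypothetical, and it assumes RISC-V's little-endian element layout, where element 2i occupies the low half of the combined element.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Result of a two-way deinterleave (hypothetical type).
struct Deinterleaved {
  std::vector<uint8_t> even;
  std::vector<uint8_t> odd;
};

// Scalar model of deinterleave2 on 8-bit elements via two narrowing shifts.
Deinterleaved deinterleave2ViaNarrowingShift(const std::vector<uint8_t> &v) {
  assert(v.size() % 2 == 0);
  Deinterleaved r;
  for (size_t i = 0; i < v.size() / 2; ++i) {
    // View elements 2i and 2i+1 as one 16-bit element; on a little-endian
    // layout element 2i is the low byte.
    uint16_t wide = static_cast<uint16_t>(v[2 * i] | (v[2 * i + 1] << 8));
    r.even.push_back(static_cast<uint8_t>(wide >> 0));  // vnsrl.wi vd, vs, 0
    r.odd.push_back(static_cast<uint8_t>(wide >> 8));   // vnsrl.wi vd, vs, 8
  }
  return r;
}
```

Because the trick needs a source element of 2*SEW bits, it has no e64 form when the widest supported element is 64 bits, which is consistent with the i64 and double tests falling back to vrgather.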

llvm/test/CodeGen/RISCV/rvv/vector-deinterleave.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc < %s -mtriple=riscv32 -mattr=+v,+zfh,+experimental-zvfh | FileCheck %s
				; RUN: llc < %s -mtriple=riscv64 -mattr=+v,+zfh,+experimental-zvfh | FileCheck %s

				; Integers

				define {<vscale x 16 x i1>, <vscale x 16 x i1>} @vector_deinterleave_nxv16i1_nxv32i1(<vscale x 32 x i1> %vec) {
				; CHECK-LABEL: vector_deinterleave_nxv16i1_nxv32i1:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vmv1r.v v8, v0
				; CHECK-NEXT: csrr a0, vlenb
				; CHECK-NEXT: srli a0, a0, 2
				; CHECK-NEXT: vsetvli a1, zero, e8, mf2, ta, ma
				; CHECK-NEXT: vslidedown.vx v0, v0, a0
				; CHECK-NEXT: vsetvli a0, zero, e8, m2, ta, ma
				; CHECK-NEXT: vmv.v.i v10, 0
				; CHECK-NEXT: vmerge.vim v14, v10, 1, v0
				; CHECK-NEXT: vmv1r.v v0, v8
				; CHECK-NEXT: vmerge.vim v12, v10, 1, v0
				; CHECK-NEXT: vnsrl.wi v8, v12, 0
				; CHECK-NEXT: vand.vi v8, v8, 1
				; CHECK-NEXT: vmsne.vi v0, v8, 0
				; CHECK-NEXT: vnsrl.wi v8, v12, 8
				; CHECK-NEXT: vand.vi v10, v8, 1
				; CHECK-NEXT: vmsne.vi v8, v10, 0
				; CHECK-NEXT: ret
				%retval = call {<vscale x 16 x i1>, <vscale x 16 x i1>} @llvm.experimental.vector.deinterleave2.nxv32i1(<vscale x 32 x i1> %vec)
				ret {<vscale x 16 x i1>, <vscale x 16 x i1>} %retval
				}
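The nxv32i1 case reuses the narrowing-shift idea after widening: both halves of the mask are turned into 0/1 bytes with vmv.v.i and vmerge.vim, the byte vector is split with vnsrl.wi 0 and vnsrl.wi 8, and vand.vi followed by vmsne.vi converts each half back into a mask. A scalar C++ model of those three steps, with a hypothetical function name and `std::vector<bool>` standing in for a mask register:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Scalar model of the nxv32i1 deinterleave sequence (hypothetical name).
std::pair<std::vector<bool>, std::vector<bool>>
deinterleaveMask(const std::vector<bool> &m) {
  assert(m.size() % 2 == 0);
  // Step 1: widen each mask bit to a 0/1 byte (vmv.v.i 0 + vmerge.vim 1).
  std::vector<uint8_t> bytes;
  for (bool bit : m)
    bytes.push_back(static_cast<uint8_t>(bit ? 1 : 0));
  std::vector<bool> even, odd;
  for (size_t i = 0; i < bytes.size(); i += 2) {
    // Step 2: split the bytes with narrowing shifts (vnsrl.wi 0 and 8).
    uint16_t wide = static_cast<uint16_t>(bytes[i] | (bytes[i + 1] << 8));
    uint8_t lo = static_cast<uint8_t>(wide >> 0);
    uint8_t hi = static_cast<uint8_t>(wide >> 8);
    // Step 3: back to mask bits (vand.vi 1 + vmsne.vi 0).
    even.push_back((lo & 1) != 0);
    odd.push_back((hi & 1) != 0);
  }
  return {even, odd};
}
```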

				define {<vscale x 16 x i8>, <vscale x 16 x i8>} @vector_deinterleave_nxv16i8_nxv32i8(<vscale x 32 x i8> %vec) {
				; CHECK-LABEL: vector_deinterleave_nxv16i8_nxv32i8:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vsetvli a0, zero, e8, m2, ta, ma
				; CHECK-NEXT: vnsrl.wi v12, v8, 0
				; CHECK-NEXT: vnsrl.wi v14, v8, 8
				; CHECK-NEXT: vmv.v.v v8, v12
				; CHECK-NEXT: vmv.v.v v10, v14
				; CHECK-NEXT: ret
				%retval = call {<vscale x 16 x i8>, <vscale x 16 x i8>} @llvm.experimental.vector.deinterleave2.nxv32i8(<vscale x 32 x i8> %vec)
				ret {<vscale x 16 x i8>, <vscale x 16 x i8>} %retval
				}

				define {<vscale x 8 x i16>, <vscale x 8 x i16>} @vector_deinterleave_nxv8i16_nxv16i16(<vscale x 16 x i16> %vec) {
				; CHECK-LABEL: vector_deinterleave_nxv8i16_nxv16i16:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vsetvli a0, zero, e16, m2, ta, ma
				; CHECK-NEXT: vnsrl.wi v12, v8, 0
				; CHECK-NEXT: vnsrl.wi v14, v8, 16
				; CHECK-NEXT: vmv.v.v v8, v12
				; CHECK-NEXT: vmv.v.v v10, v14
				; CHECK-NEXT: ret
				%retval = call {<vscale x 8 x i16>, <vscale x 8 x i16>} @llvm.experimental.vector.deinterleave2.nxv16i16(<vscale x 16 x i16> %vec)
				ret {<vscale x 8 x i16>, <vscale x 8 x i16>} %retval
				}

				define {<vscale x 4 x i32>, <vscale x 4 x i32>} @vector_deinterleave_nxv4i32_nxvv8i32(<vscale x 8 x i32> %vec) {
				; CHECK-LABEL: vector_deinterleave_nxv4i32_nxvv8i32:
				; CHECK: # %bb.0:
				; CHECK-NEXT: li a0, 32
				; CHECK-NEXT: vsetvli a1, zero, e32, m2, ta, ma
				; CHECK-NEXT: vnsrl.wx v12, v8, a0
				; CHECK-NEXT: vnsrl.wi v14, v8, 0
				; CHECK-NEXT: vmv.v.v v8, v14
				; CHECK-NEXT: vmv.v.v v10, v12
				; CHECK-NEXT: ret
				%retval = call {<vscale x 4 x i32>, <vscale x 4 x i32>} @llvm.experimental.vector.deinterleave2.nxv8i32(<vscale x 8 x i32> %vec)
				ret {<vscale x 4 x i32>, <vscale x 4 x i32>} %retval
				}

				define {<vscale x 2 x i64>, <vscale x 2 x i64>} @vector_deinterleave_nxv2i64_nxv4i64(<vscale x 4 x i64> %vec) {
				; CHECK-LABEL: vector_deinterleave_nxv2i64_nxv4i64:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vsetvli a0, zero, e64, m4, ta, ma
				; CHECK-NEXT: vid.v v12
				; CHECK-NEXT: vadd.vv v16, v12, v12
				; CHECK-NEXT: vrgather.vv v12, v8, v16
				; CHECK-NEXT: vadd.vi v16, v16, 1
				; CHECK-NEXT: vrgather.vv v20, v8, v16
				; CHECK-NEXT: vmv2r.v v8, v12
				; CHECK-NEXT: vmv2r.v v10, v20
				; CHECK-NEXT: ret
				%retval = call {<vscale x 2 x i64>, <vscale x 2 x i64>} @llvm.experimental.vector.deinterleave2.nxv4i64(<vscale x 4 x i64> %vec)
				ret {<vscale x 2 x i64>, <vscale x 2 x i64>} %retval
				}
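For i64 there is no wider element to narrow from (RVV has no 128-bit elements), so the sequence above falls back to gathers: vid.v plus vadd.vv produces the even indices 0, 2, 4, ..., the first vrgather.vv picks the even elements, vadd.vi adds 1 to get the odd indices, and a second vrgather.vv picks the odd elements. A scalar C++ model of that sequence, with hypothetical function names; it only computes the lanes that end up in the two results.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// vrgather.vv model: out[i] = src[idx[i]], or 0 if idx[i] >= VLMAX.
std::vector<uint64_t> gatherVV(const std::vector<uint64_t> &src,
                               const std::vector<uint64_t> &idx) {
  std::vector<uint64_t> out(idx.size());
  for (size_t i = 0; i < idx.size(); ++i)
    out[i] = idx[i] < src.size() ? src[idx[i]] : 0;
  return out;
}

// Scalar model of the nxv4i64 deinterleave fallback (hypothetical name).
std::pair<std::vector<uint64_t>, std::vector<uint64_t>>
deinterleave2ViaGather(const std::vector<uint64_t> &v) {
  assert(v.size() % 2 == 0);
  std::vector<uint64_t> idx(v.size() / 2);
  for (size_t i = 0; i < idx.size(); ++i)
    idx[i] = i + i;  // vid.v, then vadd.vv idx, vid, vid
  std::vector<uint64_t> even = gatherVV(v, idx);  // first vrgather.vv
  for (uint64_t &x : idx)
    x += 1;  // vadd.vi idx, idx, 1
  std::vector<uint64_t> odd = gatherVV(v, idx);   // second vrgather.vv
  return {even, odd};
}
```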

				declare {<vscale x 16 x i1>, <vscale x 16 x i1>} @llvm.experimental.vector.deinterleave2.nxv32i1(<vscale x 32 x i1>)
				declare {<vscale x 16 x i8>, <vscale x 16 x i8>} @llvm.experimental.vector.deinterleave2.nxv32i8(<vscale x 32 x i8>)
				declare {<vscale x 8 x i16>, <vscale x 8 x i16>} @llvm.experimental.vector.deinterleave2.nxv16i16(<vscale x 16 x i16>)
				declare {<vscale x 4 x i32>, <vscale x 4 x i32>} @llvm.experimental.vector.deinterleave2.nxv8i32(<vscale x 8 x i32>)
				declare {<vscale x 2 x i64>, <vscale x 2 x i64>} @llvm.experimental.vector.deinterleave2.nxv4i64(<vscale x 4 x i64>)

				; Floats

				define {<vscale x 2 x half>, <vscale x 2 x half>} @vector_deinterleave_nxv2f16_nxv4f16(<vscale x 4 x half> %vec) {
				; CHECK-LABEL: vector_deinterleave_nxv2f16_nxv4f16:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vsetvli a0, zero, e16, mf2, ta, ma
				; CHECK-NEXT: vnsrl.wi v10, v8, 0
				; CHECK-NEXT: vnsrl.wi v9, v8, 16
				; CHECK-NEXT: vmv1r.v v8, v10
				; CHECK-NEXT: ret
				%retval = call {<vscale x 2 x half>, <vscale x 2 x half>} @llvm.experimental.vector.deinterleave2.nxv4f16(<vscale x 4 x half> %vec)
				ret {<vscale x 2 x half>, <vscale x 2 x half>} %retval
				}

				define {<vscale x 4 x half>, <vscale x 4 x half>} @vector_deinterleave_nxv4f16_nxv8f16(<vscale x 8 x half> %vec) {
				; CHECK-LABEL: vector_deinterleave_nxv4f16_nxv8f16:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vsetvli a0, zero, e16, m1, ta, ma
				; CHECK-NEXT: vnsrl.wi v10, v8, 0
				; CHECK-NEXT: vnsrl.wi v11, v8, 16
				; CHECK-NEXT: vmv.v.v v8, v10
				; CHECK-NEXT: vmv.v.v v9, v11
				; CHECK-NEXT: ret
				%retval = call {<vscale x 4 x half>, <vscale x 4 x half>} @llvm.experimental.vector.deinterleave2.nxv8f16(<vscale x 8 x half> %vec)
				ret {<vscale x 4 x half>, <vscale x 4 x half>} %retval
				}

				define {<vscale x 2 x float>, <vscale x 2 x float>} @vector_deinterleave_nxv2f32_nxv4f32(<vscale x 4 x float> %vec) {
				; CHECK-LABEL: vector_deinterleave_nxv2f32_nxv4f32:
				; CHECK: # %bb.0:
				; CHECK-NEXT: li a0, 32
				; CHECK-NEXT: vsetvli a1, zero, e32, m1, ta, ma
				; CHECK-NEXT: vnsrl.wx v10, v8, a0
				; CHECK-NEXT: vnsrl.wi v11, v8, 0
				; CHECK-NEXT: vmv.v.v v8, v11
				; CHECK-NEXT: vmv.v.v v9, v10
				; CHECK-NEXT: ret
				%retval = call {<vscale x 2 x float>, <vscale x 2 x float>} @llvm.experimental.vector.deinterleave2.nxv4f32(<vscale x 4 x float> %vec)
				ret {<vscale x 2 x float>, <vscale x 2 x float>} %retval
				}

				define {<vscale x 8 x half>, <vscale x 8 x half>} @vector_deinterleave_nxv8f16_nxv16f16(<vscale x 16 x half> %vec) {
				; CHECK-LABEL: vector_deinterleave_nxv8f16_nxv16f16:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vsetvli a0, zero, e16, m2, ta, ma
				; CHECK-NEXT: vnsrl.wi v12, v8, 0
				; CHECK-NEXT: vnsrl.wi v14, v8, 16
				; CHECK-NEXT: vmv.v.v v8, v12
				; CHECK-NEXT: vmv.v.v v10, v14
				; CHECK-NEXT: ret
				%retval = call {<vscale x 8 x half>, <vscale x 8 x half>} @llvm.experimental.vector.deinterleave2.nxv16f16(<vscale x 16 x half> %vec)
				ret {<vscale x 8 x half>, <vscale x 8 x half>} %retval
				}

				define {<vscale x 4 x float>, <vscale x 4 x float>} @vector_deinterleave_nxv4f32_nxv8f32(<vscale x 8 x float> %vec) {
				; CHECK-LABEL: vector_deinterleave_nxv4f32_nxv8f32:
				; CHECK: # %bb.0:
				; CHECK-NEXT: li a0, 32
				; CHECK-NEXT: vsetvli a1, zero, e32, m2, ta, ma
				; CHECK-NEXT: vnsrl.wx v12, v8, a0
				; CHECK-NEXT: vnsrl.wi v14, v8, 0
				; CHECK-NEXT: vmv.v.v v8, v14
				; CHECK-NEXT: vmv.v.v v10, v12
				; CHECK-NEXT: ret
				%retval = call {<vscale x 4 x float>, <vscale x 4 x float>} @llvm.experimental.vector.deinterleave2.nxv8f32(<vscale x 8 x float> %vec)
				ret {<vscale x 4 x float>, <vscale x 4 x float>} %retval
				}

				define {<vscale x 2 x double>, <vscale x 2 x double>} @vector_deinterleave_nxv2f64_nxv4f64(<vscale x 4 x double> %vec) {
				; CHECK-LABEL: vector_deinterleave_nxv2f64_nxv4f64:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vsetvli a0, zero, e64, m4, ta, ma
				; CHECK-NEXT: vid.v v12
				; CHECK-NEXT: vadd.vv v16, v12, v12
				; CHECK-NEXT: vrgather.vv v12, v8, v16
				; CHECK-NEXT: vadd.vi v16, v16, 1
				; CHECK-NEXT: vrgather.vv v20, v8, v16
				; CHECK-NEXT: vmv2r.v v8, v12
				; CHECK-NEXT: vmv2r.v v10, v20
				; CHECK-NEXT: ret
				%retval = call {<vscale x 2 x double>, <vscale x 2 x double>} @llvm.experimental.vector.deinterleave2.nxv4f64(<vscale x 4 x double> %vec)
				ret {<vscale x 2 x double>, <vscale x 2 x double>} %retval
				}

				declare {<vscale x 2 x half>,<vscale x 2 x half>} @llvm.experimental.vector.deinterleave2.nxv4f16(<vscale x 4 x half>)
				declare {<vscale x 4 x half>, <vscale x 4 x half>} @llvm.experimental.vector.deinterleave2.nxv8f16(<vscale x 8 x half>)
				declare {<vscale x 2 x float>, <vscale x 2 x float>} @llvm.experimental.vector.deinterleave2.nxv4f32(<vscale x 4 x float>)
				declare {<vscale x 8 x half>, <vscale x 8 x half>} @llvm.experimental.vector.deinterleave2.nxv16f16(<vscale x 16 x half>)
				declare {<vscale x 4 x float>, <vscale x 4 x float>} @llvm.experimental.vector.deinterleave2.nxv8f32(<vscale x 8 x float>)
				declare {<vscale x 2 x double>, <vscale x 2 x double>} @llvm.experimental.vector.deinterleave2.nxv4f64(<vscale x 4 x double>)

llvm/test/CodeGen/RISCV/rvv/vector-interleave-fixed.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc < %s -mtriple=riscv32 -mattr=+v,+zfh,+experimental-zvfh | FileCheck %s
				; RUN: llc < %s -mtriple=riscv64 -mattr=+v,+zfh,+experimental-zvfh | FileCheck %s

				; Integers

				define <32 x i1> @vector_interleave_v32i1_v16i1(<16 x i1> %a, <16 x i1> %b) {
				%res = call <32 x i1> @llvm.experimental.vector.interleave2.v32i1(<16 x i1> %a, <16 x i1> %b)
				ret <32 x i1> %res
				}

				define <16 x i16> @vector_interleave_v16i16_v8i16(<8 x i16> %a, <8 x i16> %b) {
				; CHECK-LABEL: vector_interleave_v16i16_v8i16:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vmv1r.v v10, v9
				; CHECK-NEXT: # kill: def $v8 killed $v8 def $v8m2
				; CHECK-NEXT: vsetivli zero, 16, e16, m2, ta, ma
				; CHECK-NEXT: vmv.v.i v12, 0
				; CHECK-NEXT: vsetivli zero, 8, e16, m2, tu, ma
				; CHECK-NEXT: vslideup.vi v12, v8, 0
				; CHECK-NEXT: vsetivli zero, 16, e16, m2, tu, ma
				; CHECK-NEXT: vslideup.vi v12, v10, 8
				; CHECK-NEXT: lui a0, %hi(.LCPI1_0)
				; CHECK-NEXT: addi a0, a0, %lo(.LCPI1_0)
				; CHECK-NEXT: vsetvli zero, zero, e16, m2, ta, ma
				; CHECK-NEXT: vle16.v v10, (a0)
				; CHECK-NEXT: vrgather.vv v8, v12, v10
				; CHECK-NEXT: ret
				%res = call <16 x i16> @llvm.experimental.vector.interleave2.v16i16(<8 x i16> %a, <8 x i16> %b)
				ret <16 x i16> %res
				}

				define <8 x i32> @vector_interleave_v8i32_v4i32(<4 x i32> %a, <4 x i32> %b) {
				; CHECK-LABEL: vector_interleave_v8i32_v4i32:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vmv1r.v v10, v9
				; CHECK-NEXT: # kill: def $v8 killed $v8 def $v8m2
				; CHECK-NEXT: vsetivli zero, 8, e32, m2, ta, ma
				; CHECK-NEXT: vmv.v.i v12, 0
				; CHECK-NEXT: vsetivli zero, 4, e32, m2, tu, ma
				reames: Missing check lines here, probably due to conflict with autogen.
				; CHECK-NEXT: vslideup.vi v12, v8, 0
				; CHECK-NEXT: vsetivli zero, 8, e32, m2, tu, ma
				; CHECK-NEXT: vslideup.vi v12, v10, 4
				; CHECK-NEXT: lui a0, %hi(.LCPI2_0)
				; CHECK-NEXT: addi a0, a0, %lo(.LCPI2_0)
				; CHECK-NEXT: vsetvli zero, zero, e32, m2, ta, ma
				; CHECK-NEXT: vle32.v v10, (a0)
				; CHECK-NEXT: vrgather.vv v8, v12, v10
				; CHECK-NEXT: ret
				%res = call <8 x i32> @llvm.experimental.vector.interleave2.v8i32(<4 x i32> %a, <4 x i32> %b)
				ret <8 x i32> %res
				}

				define <4 x i64> @vector_interleave_v4i64_v2i64(<2 x i64> %a, <2 x i64> %b) {
				%res = call <4 x i64> @llvm.experimental.vector.interleave2.v4i64(<2 x i64> %a, <2 x i64> %b)
				ret <4 x i64> %res
				}

				declare <32 x i1> @llvm.experimental.vector.interleave2.v32i1(<16 x i1>, <16 x i1>)
				declare <16 x i16> @llvm.experimental.vector.interleave2.v16i16(<8 x i16>, <8 x i16>)
				declare <8 x i32> @llvm.experimental.vector.interleave2.v8i32(<4 x i32>, <4 x i32>)
				declare <4 x i64> @llvm.experimental.vector.interleave2.v4i64(<2 x i64>, <2 x i64>)

				; Floats

				define <4 x half> @vector_interleave_v4f16_v2f16(<2 x half> %a, <2 x half> %b) {
				; CHECK-LABEL: vector_interleave_v4f16_v2f16:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vsetivli zero, 4, e16, mf2, ta, ma
				; CHECK-NEXT: vmv.v.i v10, 0
				; CHECK-NEXT: vsetivli zero, 2, e16, mf2, tu, ma
				; CHECK-NEXT: vslideup.vi v10, v8, 0
				; CHECK-NEXT: vsetivli zero, 4, e16, mf2, tu, ma
				; CHECK-NEXT: vslideup.vi v10, v9, 2
				; CHECK-NEXT: lui a0, %hi(.LCPI4_0)
				; CHECK-NEXT: addi a0, a0, %lo(.LCPI4_0)
				; CHECK-NEXT: vsetvli zero, zero, e16, mf2, ta, ma
				; CHECK-NEXT: vle16.v v9, (a0)
				; CHECK-NEXT: vrgather.vv v8, v10, v9
				; CHECK-NEXT: ret
				%res = call <4 x half> @llvm.experimental.vector.interleave2.v4f16(<2 x half> %a, <2 x half> %b)
				ret <4 x half> %res
				}

				define <8 x half> @vector_interleave_v8f16_v4f16(<4 x half> %a, <4 x half> %b) {
				; CHECK-LABEL: vector_interleave_v8f16_v4f16:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vsetivli zero, 8, e16, m1, ta, ma
				; CHECK-NEXT: vmv.v.i v10, 0
				; CHECK-NEXT: vsetivli zero, 4, e16, m1, tu, ma
				; CHECK-NEXT: vslideup.vi v10, v8, 0
				; CHECK-NEXT: vsetivli zero, 8, e16, m1, tu, ma
				; CHECK-NEXT: vslideup.vi v10, v9, 4
				; CHECK-NEXT: lui a0, %hi(.LCPI5_0)
				; CHECK-NEXT: addi a0, a0, %lo(.LCPI5_0)
				; CHECK-NEXT: vsetvli zero, zero, e16, m1, ta, ma
				; CHECK-NEXT: vle16.v v9, (a0)
				; CHECK-NEXT: vrgather.vv v8, v10, v9
				; CHECK-NEXT: ret
				%res = call <8 x half> @llvm.experimental.vector.interleave2.v8f16(<4 x half> %a, <4 x half> %b)
				ret <8 x half> %res
				}

				define <4 x float> @vector_interleave_v4f32_v2f32(<2 x float> %a, <2 x float> %b) {
				; CHECK-LABEL: vector_interleave_v4f32_v2f32:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vsetivli zero, 4, e32, m1, ta, ma
				; CHECK-NEXT: vmv.v.i v10, 0
				; CHECK-NEXT: vsetivli zero, 2, e32, m1, tu, ma
				; CHECK-NEXT: vslideup.vi v10, v8, 0
				; CHECK-NEXT: vsetivli zero, 4, e32, m1, tu, ma
				; CHECK-NEXT: vslideup.vi v10, v9, 2
				; CHECK-NEXT: lui a0, %hi(.LCPI6_0)
				; CHECK-NEXT: addi a0, a0, %lo(.LCPI6_0)
				; CHECK-NEXT: vsetvli zero, zero, e32, m1, ta, ma
				; CHECK-NEXT: vle32.v v9, (a0)
				; CHECK-NEXT: vrgather.vv v8, v10, v9
				reames: Same problem here.
				; CHECK-NEXT: ret
				%res = call <4 x float> @llvm.experimental.vector.interleave2.v4f32(<2 x float> %a, <2 x float> %b)
				ret <4 x float> %res
				}

				define <16 x half> @vector_interleave_v16f16_v8f16(<8 x half> %a, <8 x half> %b) {
				; CHECK-LABEL: vector_interleave_v16f16_v8f16:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vmv1r.v v10, v9
				; CHECK-NEXT: # kill: def $v8 killed $v8 def $v8m2
				; CHECK-NEXT: vsetivli zero, 16, e16, m2, ta, ma
				; CHECK-NEXT: vmv.v.i v12, 0
				; CHECK-NEXT: vsetivli zero, 8, e16, m2, tu, ma
				; CHECK-NEXT: vslideup.vi v12, v8, 0
				; CHECK-NEXT: vsetivli zero, 16, e16, m2, tu, ma
				; CHECK-NEXT: vslideup.vi v12, v10, 8
				; CHECK-NEXT: lui a0, %hi(.LCPI7_0)
				; CHECK-NEXT: addi a0, a0, %lo(.LCPI7_0)
				; CHECK-NEXT: vsetvli zero, zero, e16, m2, ta, ma
				; CHECK-NEXT: vle16.v v10, (a0)
				; CHECK-NEXT: vrgather.vv v8, v12, v10
				; CHECK-NEXT: ret
				%res = call <16 x half> @llvm.experimental.vector.interleave2.v16f16(<8 x half> %a, <8 x half> %b)
				ret <16 x half> %res
				}

				define <8 x float> @vector_interleave_v8f32_v4f32(<4 x float> %a, <4 x float> %b) {
				; CHECK-LABEL: vector_interleave_v8f32_v4f32:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vmv1r.v v10, v9
				; CHECK-NEXT: # kill: def $v8 killed $v8 def $v8m2
				; CHECK-NEXT: vsetivli zero, 8, e32, m2, ta, ma
				; CHECK-NEXT: vmv.v.i v12, 0
				; CHECK-NEXT: vsetivli zero, 4, e32, m2, tu, ma
				; CHECK-NEXT: vslideup.vi v12, v8, 0
				; CHECK-NEXT: vsetivli zero, 8, e32, m2, tu, ma
				; CHECK-NEXT: vslideup.vi v12, v10, 4
				; CHECK-NEXT: lui a0, %hi(.LCPI8_0)
				; CHECK-NEXT: addi a0, a0, %lo(.LCPI8_0)
				; CHECK-NEXT: vsetvli zero, zero, e32, m2, ta, ma
				; CHECK-NEXT: vle32.v v10, (a0)
				; CHECK-NEXT: vrgather.vv v8, v12, v10
				; CHECK-NEXT: ret
				%res = call <8 x float> @llvm.experimental.vector.interleave2.v8f32(<4 x float> %a, <4 x float> %b)
				ret <8 x float> %res
				}

				define <4 x double> @vector_interleave_v4f64_v2f64(<2 x double> %a, <2 x double> %b) {
				%res = call <4 x double> @llvm.experimental.vector.interleave2.v4f64(<2 x double> %a, <2 x double> %b)
				ret <4 x double> %res
				}


				declare <4 x half> @llvm.experimental.vector.interleave2.v4f16(<2 x half>, <2 x half>)
				declare <8 x half> @llvm.experimental.vector.interleave2.v8f16(<4 x half>, <4 x half>)
				declare <4 x float> @llvm.experimental.vector.interleave2.v4f32(<2 x float>, <2 x float>)
				declare <16 x half> @llvm.experimental.vector.interleave2.v16f16(<8 x half>, <8 x half>)
				declare <8 x float> @llvm.experimental.vector.interleave2.v8f32(<4 x float>, <4 x float>)
				declare <4 x double> @llvm.experimental.vector.interleave2.v4f64(<2 x double>, <2 x double>)
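The fixed-length interleaves above go through a vrgather.vv whose index vector is loaded from the constant pool (`.LCPIn_0`). The alternative craig.topper suggests earlier in the thread, which the existing isInterleaveShuffle lowering already uses, widens instead: vwaddu computes zext(a) + zext(b), and vwmaccu adds (2^SEW - 1) * zext(b), leaving a + (b << SEW) in each 2*SEW element. Read back as SEW elements on a little-endian layout, that is a[i] followed by b[i]. Below is a scalar C++ sketch for 8-bit elements with a hypothetical function name; like the narrowing-shift deinterleave, it needs a 2*SEW element, which is why luke notes it does not cover i64.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar model of interleave2 on 8-bit elements via widening arithmetic.
std::vector<uint8_t> interleave2ViaWidening(const std::vector<uint8_t> &a,
                                            const std::vector<uint8_t> &b) {
  assert(a.size() == b.size());
  std::vector<uint8_t> out;
  for (size_t i = 0; i < a.size(); ++i) {
    // vwaddu: zext(a) + zext(b)
    uint16_t wide = static_cast<uint16_t>(a[i] + b[i]);
    // vwmaccu with scalar 2^8 - 1: wide += 255 * zext(b), so wide == a + (b << 8)
    wide = static_cast<uint16_t>(wide + 0xFFu * b[i]);
    // View the 16-bit element as two 8-bit elements, low byte first.
    out.push_back(static_cast<uint8_t>(wide & 0xFF));
    out.push_back(static_cast<uint8_t>(wide >> 8));
  }
  return out;
}
```

The largest intermediate is 255 + 255 * 256 = 65535, so the sum never overflows the 2*SEW element.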

llvm/test/CodeGen/RISCV/rvv/vector-interleave.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc < %s -mtriple=riscv32 -mattr=+v,+zfh,+experimental-zvfh | FileCheck %s
				; RUN: llc < %s -mtriple=riscv64 -mattr=+v,+zfh,+experimental-zvfh | FileCheck %s

				; Integers

				define <vscale x 32 x i1> @vector_interleave_nxv32i1_nxv16i1(<vscale x 16 x i1> %a, <vscale x 16 x i1> %b) {
				; CHECK-LABEL: vector_interleave_nxv32i1_nxv16i1:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vmv1r.v v9, v0
				; CHECK-NEXT: csrr a0, vlenb
				; CHECK-NEXT: slli a1, a0, 1
				; CHECK-NEXT: vsetvli a2, zero, e16, m8, ta, ma
				; CHECK-NEXT: vid.v v16
				; CHECK-NEXT: vand.vi v24, v16, 1
				; CHECK-NEXT: vmsne.vi v10, v24, 0
				; CHECK-NEXT: vsrl.vi v16, v16, 1
				; CHECK-NEXT: vmv1r.v v0, v10
				; CHECK-NEXT: vadd.vx v16, v16, a1, v0.t
				; CHECK-NEXT: vsetvli a1, zero, e8, m2, ta, ma
				; CHECK-NEXT: vmv.v.i v12, 0
				; CHECK-NEXT: vmv1r.v v0, v8
				; CHECK-NEXT: vmerge.vim v26, v12, 1, v0
				; CHECK-NEXT: vmv1r.v v0, v9
				; CHECK-NEXT: vmerge.vim v24, v12, 1, v0
				; CHECK-NEXT: vsetvli a1, zero, e8, m4, ta, ma
				; CHECK-NEXT: vmv1r.v v0, v10
				; CHECK-NEXT: vrgatherei16.vv v8, v24, v16, v0.t
				; CHECK-NEXT: vsetvli a1, zero, e8, m2, ta, ma
				; CHECK-NEXT: vand.vi v12, v10, 1
				; CHECK-NEXT: vmsne.vi v14, v12, 0
				; CHECK-NEXT: vand.vi v8, v8, 1
				; CHECK-NEXT: vmsne.vi v0, v8, 0
				; CHECK-NEXT: srli a0, a0, 2
				; CHECK-NEXT: add a1, a0, a0
				; CHECK-NEXT: vsetvli zero, a1, e8, mf2, tu, ma
				; CHECK-NEXT: vslideup.vx v0, v14, a0
				; CHECK-NEXT: ret
				%res = call <vscale x 32 x i1> @llvm.experimental.vector.interleave2.nxv32i1(<vscale x 16 x i1> %a, <vscale x 16 x i1> %b)
				ret <vscale x 32 x i1> %res
				}

				define <vscale x 16 x i16> @vector_interleave_nxv16i16_nxv8i16(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b) {
				%res = call <vscale x 16 x i16> @llvm.experimental.vector.interleave2.nxv16i16(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b)
				ret <vscale x 16 x i16> %res
				}

				define <vscale x 8 x i32> @vector_interleave_nxv8i32_nxv4i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b) {
				; CHECK-LABEL: vector_interleave_nxv8i32_nxv4i32:
				; CHECK: # %bb.0:
				; CHECK-NEXT: # kill: def $v10m2 killed $v10m2 killed $v8m4 def $v8m4
				; CHECK-NEXT: csrr a0, vlenb
				; CHECK-NEXT: srli a0, a0, 1
				; CHECK-NEXT: vsetvli a1, zero, e16, m2, ta, ma
				; CHECK-NEXT: vid.v v12
				; CHECK-NEXT: vand.vi v14, v12, 1
				; CHECK-NEXT: vmsne.vi v0, v14, 0
				; CHECK-NEXT: vsrl.vi v12, v12, 1
				; CHECK-NEXT: vadd.vx v16, v12, a0, v0.t
				; CHECK-NEXT: vsetvli zero, zero, e32, m4, ta, ma
				; CHECK-NEXT: # kill: def $v8m2 killed $v8m2 killed $v8m4 def $v8m4
				; CHECK-NEXT: vrgatherei16.vv v12, v8, v16, v0.t
				; CHECK-NEXT: vmv.v.v v8, v12
				; CHECK-NEXT: ret
				%res = call <vscale x 8 x i32> @llvm.experimental.vector.interleave2.nxv8i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b)
				ret <vscale x 8 x i32> %res
				}

				define <vscale x 4 x i64> @vector_interleave_nxv4i64_nxv2i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b) {
				; CHECK-LABEL: vector_interleave_nxv4i64_nxv2i64:
				; CHECK: # %bb.0:
				; CHECK-NEXT: # kill: def $v10m2 killed $v10m2 killed $v8m4 def $v8m4
				craig.topper: This is a mask agnostic vadd.vx. You need mask undisturbed.
				; CHECK-NEXT: csrr a0, vlenb
				; CHECK-NEXT: srli a0, a0, 2
				; CHECK-NEXT: vsetvli a1, zero, e16, m1, ta, ma
				; CHECK-NEXT: vid.v v12
				; CHECK-NEXT: vand.vi v13, v12, 1
				; CHECK-NEXT: vmsne.vi v0, v13, 0
				; CHECK-NEXT: vsrl.vi v12, v12, 1
				; CHECK-NEXT: vadd.vx v16, v12, a0, v0.t
				; CHECK-NEXT: vsetvli zero, zero, e64, m4, ta, ma
				; CHECK-NEXT: # kill: def $v8m2 killed $v8m2 killed $v8m4 def $v8m4
				; CHECK-NEXT: vrgatherei16.vv v12, v8, v16, v0.t
				; CHECK-NEXT: vmv.v.v v8, v12
				; CHECK-NEXT: ret
				%res = call <vscale x 4 x i64> @llvm.experimental.vector.interleave2.nxv4i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b)
				ret <vscale x 4 x i64> %res
				}
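The inline comment on `vector_interleave_nxv4i64_nxv2i64` concerns a masked `vadd.vx`. That instruction adds VLMAX to the odd lanes of the gather indices, while the preceding `vsetvli` selects `ma`.

The C++ below is a hypothetical scalar sketch of that index sequence followed by the gather. Its names are illustrative and are not LLVM APIs.

- **Destination tying.** The sketch ties the add's destination to the `vsrl.vi` result, as the `nxv4f16` test does with `vadd.vx v10, v10, a0, v0.t`. This matters because under `mu`, inactive lanes keep the destination's previous contents.
- **Agnostic fill.** Under `ma`, the RVV specification also allows inactive lanes to be overwritten with all 1s. The model always takes that option, so the even-lane indices fall out of range and the gather reads 0 for them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Mask policy of a masked RVV instruction (the ma/mu field of vsetvli).
enum class MaskPolicy { Undisturbed, Agnostic };

// Scalar model of `vadd.vx vd, vs2, rs1, v0.t`.
// Active lanes: vd[i] = vs2[i] + rs1.
// Inactive lanes: mu keeps vd's previous value; ma allows either keeping it or
// writing all 1s, and this model always writes all 1s to expose the hazard.
void maskedAddVX(std::vector<uint16_t>& vd, const std::vector<uint16_t>& vs2,
                 uint16_t rs1, const std::vector<bool>& mask,
                 MaskPolicy policy) {
  assert(vd.size() == vs2.size() && vd.size() == mask.size());
  for (std::size_t i = 0; i < vd.size(); ++i) {
    if (mask[i])
      vd[i] = static_cast<uint16_t>(vs2[i] + rs1);
    else if (policy == MaskPolicy::Agnostic)
      vd[i] = 0xFFFF;
  }
}

// Gather indices {0, vlmax, 1, vlmax + 1, ...} built like the CHECK lines:
// vid.v, then vand.vi 1 + vmsne.vi 0 (odd-lane mask), vsrl.vi 1, and a masked
// vadd.vx of vlmax whose destination is the vsrl.vi result itself.
// vlmax is the element count of one source operand (a0 in the CHECK lines).
std::vector<uint16_t> interleaveIndices(std::size_t n, uint16_t vlmax,
                                        MaskPolicy policy) {
  std::vector<uint16_t> idx(n);
  std::vector<bool> oddLanes(n);
  for (std::size_t i = 0; i < n; ++i) {
    idx[i] = static_cast<uint16_t>(i >> 1);
    oddLanes[i] = (i & 1) != 0;
  }
  maskedAddVX(idx, idx, vlmax, oddLanes, policy);
  return idx;
}

// Scalar model of an unmasked vrgatherei16.vv: out-of-range indices read 0.
template <typename T>
std::vector<T> gatherEI16(const std::vector<T>& src,
                          const std::vector<uint16_t>& idx) {
  std::vector<T> out(idx.size());
  for (std::size_t i = 0; i < idx.size(); ++i)
    out[i] = static_cast<std::size_t>(idx[i]) < src.size() ? src[idx[i]] : T{};
  return out;
}

// interleave2(a, b): concatenate the sources, then gather with the indices.
template <typename T>
std::vector<T> interleave2(const std::vector<T>& a, const std::vector<T>& b,
                           MaskPolicy policy) {
  assert(a.size() == b.size());
  std::vector<T> concat(a);
  concat.insert(concat.end(), b.begin(), b.end());
  std::vector<uint16_t> idx = interleaveIndices(
      concat.size(), static_cast<uint16_t>(a.size()), policy);
  return gatherEI16(concat, idx);
}
```

With `a = {10, 11, 12, 13}` and `b = {20, 21, 22, 23}`, the two policies give different results:

- `MaskPolicy::Undisturbed` produces `{10, 20, 11, 21, 12, 22, 13, 23}`.
- `MaskPolicy::Agnostic` produces `{0, 20, 0, 21, 0, 22, 0, 23}`.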

				declare <vscale x 32 x i1> @llvm.experimental.vector.interleave2.nxv32i1(<vscale x 16 x i1>, <vscale x 16 x i1>)
				declare <vscale x 16 x i16> @llvm.experimental.vector.interleave2.nxv16i16(<vscale x 8 x i16>, <vscale x 8 x i16>)
				declare <vscale x 8 x i32> @llvm.experimental.vector.interleave2.nxv8i32(<vscale x 4 x i32>, <vscale x 4 x i32>)
				declare <vscale x 4 x i64> @llvm.experimental.vector.interleave2.nxv4i64(<vscale x 2 x i64>, <vscale x 2 x i64>)

				; Floats

				define <vscale x 4 x half> @vector_interleave_nxv4f16_nxv2f16(<vscale x 2 x half> %a, <vscale x 2 x half> %b) {
				; CHECK-LABEL: vector_interleave_nxv4f16_nxv2f16:
				; CHECK: # %bb.0:
				; CHECK-NEXT: csrr a0, vlenb
				; CHECK-NEXT: srli a0, a0, 2
				; CHECK-NEXT: vsetvli a1, zero, e16, m1, ta, ma
				; CHECK-NEXT: vid.v v10
				; CHECK-NEXT: vand.vi v11, v10, 1
				; CHECK-NEXT: vmsne.vi v0, v11, 0
				; CHECK-NEXT: vsrl.vi v10, v10, 1
				; CHECK-NEXT: vadd.vx v10, v10, a0, v0.t
				; CHECK-NEXT: add a1, a0, a0
				; CHECK-NEXT: vsetvli zero, a1, e16, m1, tu, ma
				; CHECK-NEXT: vslideup.vx v8, v9, a0
				; CHECK-NEXT: vsetvli a2, zero, e16, m1, ta, ma
				; CHECK-NEXT: vrgatherei16.vv v9, v8, v10, v0.t
				; CHECK-NEXT: vslidedown.vx v8, v9, a0
				; CHECK-NEXT: vsetvli zero, a1, e16, m1, tu, ma
				; CHECK-NEXT: vslideup.vx v9, v8, a0
				; CHECK-NEXT: vmv1r.v v8, v9
				; CHECK-NEXT: ret
				%res = call <vscale x 4 x half> @llvm.experimental.vector.interleave2.nxv4f16(<vscale x 2 x half> %a, <vscale x 2 x half> %b)
				ret <vscale x 4 x half> %res
				}

				define <vscale x 8 x half> @vector_interleave_nxv8f16_nxv4f16(<vscale x 4 x half> %a, <vscale x 4 x half> %b) {
				; CHECK-LABEL: vector_interleave_nxv8f16_nxv4f16:
				; CHECK: # %bb.0:
				; CHECK-NEXT: # kill: def $v9 killed $v9 killed $v8m2 def $v8m2
				; CHECK-NEXT: csrr a0, vlenb
				; CHECK-NEXT: srli a0, a0, 1
				; CHECK-NEXT: vsetvli a1, zero, e16, m2, ta, ma
				; CHECK-NEXT: vid.v v10
				; CHECK-NEXT: vand.vi v12, v10, 1
				; CHECK-NEXT: vmsne.vi v0, v12, 0
				; CHECK-NEXT: vsrl.vi v10, v10, 1
				; CHECK-NEXT: vadd.vx v12, v10, a0, v0.t
				; CHECK-NEXT: # kill: def $v8 killed $v8 killed $v8m2 def $v8m2
				; CHECK-NEXT: vrgatherei16.vv v10, v8, v12, v0.t
				; CHECK-NEXT: vmv.v.v v8, v10
				; CHECK-NEXT: ret
				%res = call <vscale x 8 x half> @llvm.experimental.vector.interleave2.nxv8f16(<vscale x 4 x half> %a, <vscale x 4 x half> %b)
				ret <vscale x 8 x half> %res
				}

				define <vscale x 4 x float> @vector_interleave_nxv4f32_nxv2f32(<vscale x 2 x float> %a, <vscale x 2 x float> %b) {
				; CHECK-LABEL: vector_interleave_nxv4f32_nxv2f32:
				; CHECK: # %bb.0:
				; CHECK-NEXT: # kill: def $v9 killed $v9 killed $v8m2 def $v8m2
				; CHECK-NEXT: csrr a0, vlenb
				; CHECK-NEXT: srli a0, a0, 2
				; CHECK-NEXT: vsetvli a1, zero, e16, m1, ta, ma
				; CHECK-NEXT: vid.v v10
				; CHECK-NEXT: vand.vi v11, v10, 1
				; CHECK-NEXT: vmsne.vi v0, v11, 0
				; CHECK-NEXT: vsrl.vi v10, v10, 1
				; CHECK-NEXT: vadd.vx v12, v10, a0, v0.t
				; CHECK-NEXT: vsetvli zero, zero, e32, m2, ta, ma
				; CHECK-NEXT: # kill: def $v8 killed $v8 killed $v8m2 def $v8m2
				; CHECK-NEXT: vrgatherei16.vv v10, v8, v12, v0.t
				; CHECK-NEXT: vmv.v.v v8, v10
				; CHECK-NEXT: ret
				%res = call <vscale x 4 x float> @llvm.experimental.vector.interleave2.nxv4f32(<vscale x 2 x float> %a, <vscale x 2 x float> %b)
				ret <vscale x 4 x float> %res
				}

				define <vscale x 16 x half> @vector_interleave_nxv16f16_nxv8f16(<vscale x 8 x half> %a, <vscale x 8 x half> %b) {
				%res = call <vscale x 16 x half> @llvm.experimental.vector.interleave2.nxv16f16(<vscale x 8 x half> %a, <vscale x 8 x half> %b)
				ret <vscale x 16 x half> %res
				}

				define <vscale x 8 x float> @vector_interleave_nxv8f32_nxv4f32(<vscale x 4 x float> %a, <vscale x 4 x float> %b) {
				; CHECK-LABEL: vector_interleave_nxv8f32_nxv4f32:
				; CHECK: # %bb.0:
				; CHECK-NEXT: # kill: def $v10m2 killed $v10m2 killed $v8m4 def $v8m4
				; CHECK-NEXT: csrr a0, vlenb
				; CHECK-NEXT: srli a0, a0, 1
				; CHECK-NEXT: vsetvli a1, zero, e16, m2, ta, ma
				; CHECK-NEXT: vid.v v12
				; CHECK-NEXT: vand.vi v14, v12, 1
				; CHECK-NEXT: vmsne.vi v0, v14, 0
				; CHECK-NEXT: vsrl.vi v12, v12, 1
				; CHECK-NEXT: vadd.vx v16, v12, a0, v0.t
				; CHECK-NEXT: vsetvli zero, zero, e32, m4, ta, ma
				; CHECK-NEXT: # kill: def $v8m2 killed $v8m2 killed $v8m4 def $v8m4
				; CHECK-NEXT: vrgatherei16.vv v12, v8, v16, v0.t
				; CHECK-NEXT: vmv.v.v v8, v12
				; CHECK-NEXT: ret
				%res = call <vscale x 8 x float> @llvm.experimental.vector.interleave2.nxv8f32(<vscale x 4 x float> %a, <vscale x 4 x float> %b)
				ret <vscale x 8 x float> %res
				}

				define <vscale x 4 x double> @vector_interleave_nxv4f64_nxv2f64(<vscale x 2 x double> %a, <vscale x 2 x double> %b) {
				; CHECK-LABEL: vector_interleave_nxv4f64_nxv2f64:
				; CHECK: # %bb.0:
				; CHECK-NEXT: # kill: def $v10m2 killed $v10m2 killed $v8m4 def $v8m4
				; CHECK-NEXT: csrr a0, vlenb
				; CHECK-NEXT: srli a0, a0, 2
				; CHECK-NEXT: vsetvli a1, zero, e16, m1, ta, ma
				; CHECK-NEXT: vid.v v12
				; CHECK-NEXT: vand.vi v13, v12, 1
				; CHECK-NEXT: vmsne.vi v0, v13, 0
				; CHECK-NEXT: vsrl.vi v12, v12, 1
				; CHECK-NEXT: vadd.vx v16, v12, a0, v0.t
				; CHECK-NEXT: vsetvli zero, zero, e64, m4, ta, ma
				; CHECK-NEXT: # kill: def $v8m2 killed $v8m2 killed $v8m4 def $v8m4
				; CHECK-NEXT: vrgatherei16.vv v12, v8, v16, v0.t
				; CHECK-NEXT: vmv.v.v v8, v12
				; CHECK-NEXT: ret
				%res = call <vscale x 4 x double> @llvm.experimental.vector.interleave2.nxv4f64(<vscale x 2 x double> %a, <vscale x 2 x double> %b)
				ret <vscale x 4 x double> %res
				}


				declare <vscale x 4 x half> @llvm.experimental.vector.interleave2.nxv4f16(<vscale x 2 x half>, <vscale x 2 x half>)
				declare <vscale x 8 x half> @llvm.experimental.vector.interleave2.nxv8f16(<vscale x 4 x half>, <vscale x 4 x half>)
				declare <vscale x 4 x float> @llvm.experimental.vector.interleave2.nxv4f32(<vscale x 2 x float>, <vscale x 2 x float>)
				declare <vscale x 16 x half> @llvm.experimental.vector.interleave2.nxv16f16(<vscale x 8 x half>, <vscale x 8 x half>)
				declare <vscale x 8 x float> @llvm.experimental.vector.interleave2.nxv8f32(<vscale x 4 x float>, <vscale x 4 x float>)
				declare <vscale x 4 x double> @llvm.experimental.vector.interleave2.nxv4f64(<vscale x 2 x double>, <vscale x 2 x double>)

[RISCV] Lower interleave and deinterleave intrinsics (Closed, Public)

Event Timeline