This is an archive of the discontinued LLVM Phabricator instance.


Differential D117743

[RISCV] Optimize vector_shuffles that are interleaving the lowest elements of two vectors.
Closed, Public

Authored by craig.topper on Jan 19 2022, 8:40 PM.


Details

Reviewers

frasercrmck
rogfer01
kito-cheng
khchen
arcbbb

Commits

rGfa8bb224661d: [RISCV] Optimize vector_shuffles that are interleaving the lowest elements of…

Summary

RISCV only has a unary shuffle (vrgather), and it requires placing the
indices in a register. Interleaving two vectors this way therefore takes
at least two vrgathers and a vmerge.
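
To make the baseline cost concrete, here is a minimal scalar sketch of the gather-and-merge approach, assuming equal-length inputs and a simplified gather that does not model out-of-range indices. The helper names `gather` and `interleaveViaGather` are hypothetical and only emulate the idea; they are not the code LLVM emits.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of a unary register gather: Out[i] = Src[Idx[i]].
// Out-of-range indices are not modelled in this sketch.
std::vector<uint32_t> gather(const std::vector<uint32_t> &Src,
                             const std::vector<std::size_t> &Idx) {
  std::vector<uint32_t> Out(Idx.size());
  for (std::size_t I = 0; I != Idx.size(); ++I)
    Out[I] = Src[Idx[I]];
  return Out;
}

// Interleaves the low halves of A and B with two gathers and one merge.
// Assumes A and B have the same length.
std::vector<uint32_t> interleaveViaGather(const std::vector<uint32_t> &A,
                                          const std::vector<uint32_t> &B) {
  std::size_t N = A.size();
  std::vector<std::size_t> Idx(N);
  for (std::size_t I = 0; I != N; ++I)
    Idx[I] = I / 2; // index vector 0, 0, 1, 1, ...

  std::vector<uint32_t> FromA = gather(A, Idx); // first gather
  std::vector<uint32_t> FromB = gather(B, Idx); // second gather

  std::vector<uint32_t> Out(N);
  for (std::size_t I = 0; I != N; ++I)
    Out[I] = (I % 2 == 0) ? FromA[I] : FromB[I]; // merge: odd lanes from B
  return Out;
}
```

With inputs `{1, 2, 3, 4}` and `{5, 6, 7, 8}` this yields `{1, 5, 2, 6}`, at the price of an index vector, two gathers and a mask.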

This patch teaches shuffle lowering to use a widening unsigned add (vwaddu)
followed by a widening multiply-accumulate (vwmaccu) to implement the
interleave. First we extract the low half of both V1 and V2. Then we compute
(zext(V1) + zext(V2)) + (zext(V2) * zext(2^eltbits - 1)), which
simplifies to zext(V1) + zext(V2) * 2^eltbits. This further
simplifies to zext(V1) + (zext(V2) << eltbits). Finally we bitcast the
result back to the original type, splitting the wide elements in half.
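
The identity can be checked on a single pair of elements. The following is a small sketch for 16-bit elements; the function names `interleavePair` and `interleavePairShift` are hypothetical, and the scalar code only stands in for one lane of the vector operations.

```cpp
#include <cstdint>

// One lane of the widening add + widening multiply-accumulate sequence,
// for eltbits = 16. The result always fits in 32 bits because
// V1 + V2 * 2^16 <= 0xFFFF + 0xFFFF * 0x10000 = 0xFFFFFFFF.
uint32_t interleavePair(uint16_t V1, uint16_t V2) {
  uint32_t Wide = uint32_t(V1) + uint32_t(V2); // zext(V1) + zext(V2)
  Wide += uint32_t(V2) * uint32_t(0xFFFF);     // + zext(V2) * (2^16 - 1)
  return Wide;
}

// The simplified form: zext(V1) + (zext(V2) << eltbits).
uint32_t interleavePairShift(uint16_t V1, uint16_t V2) {
  return uint32_t(V1) + (uint32_t(V2) << 16);
}
```

For example, `interleavePair(0x1234, 0xABCD)` gives `0xABCD1234`: V1 occupies the low half of the wide element and V2 the high half, which is what turns into adjacent narrow elements after the bitcast.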

We can only do this if we have a type with wider elements available.
Because we're using extends we also have to be careful with fractional
lmuls. Floating point types are supported by bitcasting to/from integer.
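
As an illustration of the floating-point path, here is a scalar sketch that interleaves the low halves of two `float` vectors by going through 32-bit integers and a 64-bit wide element. The function name `interleaveLowHalvesFP` is hypothetical, and the code assumes both inputs have the same length; it models the idea only, not LMUL selection or the actual DAG nodes.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Interleaves the low halves of V1 and V2 (assumed to have equal length).
std::vector<float> interleaveLowHalvesFP(const std::vector<float> &V1,
                                         const std::vector<float> &V2) {
  std::size_t Half = V1.size() / 2;
  std::vector<float> Out(Half * 2);
  for (std::size_t I = 0; I != Half; ++I) {
    // Bitcast float -> 32-bit integer.
    uint32_t A, B;
    std::memcpy(&A, &V1[I], sizeof(A));
    std::memcpy(&B, &V2[I], sizeof(B));

    // Widening add, then widening multiply-accumulate by 2^32 - 1.
    uint64_t Wide = uint64_t(A) + uint64_t(B);
    Wide += uint64_t(B) * 0xFFFFFFFFull;

    // Split the wide element: low half is the even lane, high half the odd lane.
    uint32_t Lo = uint32_t(Wide);
    uint32_t Hi = uint32_t(Wide >> 32);
    std::memcpy(&Out[2 * I], &Lo, sizeof(Lo));
    std::memcpy(&Out[2 * I + 1], &Hi, sizeof(Hi));
  }
  return Out;
}
```

The split is written out explicitly rather than by reinterpreting memory, so the sketch does not depend on host byte order. The same reasoning shows why 64-bit elements cannot use this trick when no 128-bit element type is available.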

The tests cover a varied combination of LMULs, split across VLEN>=128 and
VLEN>=512 runs. There are a few tests with shuffle indices commuted, as well
as tests for undef indices. There's one test for a vXi64/vXf64 vector, which
we can't optimize; it verifies that we don't crash.
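
For reference, the IR-level masks used by such tests follow a simple pattern. The sketch below builds one for two sources of `NumSrcElts` elements each; `makeInterleaveMask` is a hypothetical helper, not part of the patch or of LLVM's test utilities.

```cpp
#include <utility>
#include <vector>

// Builds the shufflevector mask that interleaves two sources of NumSrcElts
// elements each. With Commuted set, the second source supplies the even lanes.
std::vector<int> makeInterleaveMask(int NumSrcElts, bool Commuted = false) {
  std::vector<int> Mask;
  Mask.reserve(2 * NumSrcElts);
  for (int I = 0; I != NumSrcElts; ++I) {
    int Even = I;              // element I of the first source
    int Odd = NumSrcElts + I;  // element I of the second source
    if (Commuted)
      std::swap(Even, Odd);
    Mask.push_back(Even);
    Mask.push_back(Odd);
  }
  return Mask;
}
```

`makeInterleaveMask(2)` produces `0, 2, 1, 3` and `makeInterleaveMask(2, true)` produces `2, 0, 3, 1`, the two mask shapes used by the v2f16 and v2f32 tests in this revision.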

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

craig.topper created this revision. Jan 19 2022, 8:40 PM

Herald added subscribers: VincentWu, luke957, achieveartificialintelligence and 24 others. Jan 19 2022, 8:40 PM

craig.topper requested review of this revision. Jan 19 2022, 8:40 PM

Herald added a project: Restricted Project. Jan 19 2022, 8:40 PM

Herald added subscribers: eopXD, MaskRay.

Harbormaster completed remote builds in B144471: Diff 401492. Jan 19 2022, 9:28 PM

LGTM

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
2348	I admit I was very confused by this because I assumed the `Mask` of the `SDNode` would be the operand of the IR more or less verbatim (e.g. `0, 2, 1, 3`), but apparently it is adjusted by the length of the mask itself (i.e. the elements of the second source are offset by the length of the concatenated vector; I'm not sure my interpretation is correct after reading the SelectionDAG code, though).

This revision is now accepted and ready to land. Jan 20 2022, 8:50 AM

craig.topper added inline comments. Jan 20 2022, 9:01 AM

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
2347	consistenly -> consistently; source -> "same source"
2348	IR allows the sources and mask to have different lengths. SelectionDAG does not. There's a piece of code in SelectionDAGBuilder that tries a few heuristics for matching the lengths. I think it can fall back to a build_vector in the worst case.
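
To illustrate the mask convention discussed here, the following is a standalone sketch that mirrors the matching logic of `isInterleaveShuffle` from this patch, with the ELEN check and LLVM types left out. It assumes a DAG-level mask, that is, one whose length equals the element count of each source, so index `Mask[i] / Size` selects the source and `Mask[i] % Size` the element. The name `isInterleaveMask` and the use of `std::vector` are illustrative choices, not LLVM API.

```cpp
#include <vector>

// Returns true if Mask interleaves the low halves of two distinct sources.
// Negative entries are treated as undef. SwapSources is only written on success.
bool isInterleaveMask(const std::vector<int> &Mask, bool &SwapSources) {
  int Size = static_cast<int>(Mask.size());
  int Srcs[] = {-1, -1};
  for (int I = 0; I != Size; ++I) {
    if (Mask[I] < 0)
      continue; // undef lane matches anything

    int Pol = I % 2;         // even or odd result lane
    int Src = Mask[I] / Size; // 0 = first source, 1 = second source
    if (Srcs[Pol] < 0)
      Srcs[Pol] = Src;
    if (Srcs[Pol] != Src)
      return false; // lanes of one polarity must share a source

    int Elt = Mask[I] % Size;
    if (Elt != I / 2)
      return false; // lane I must read element I / 2
  }

  // Both polarities need a source, and the sources must differ.
  if (Srcs[0] < 0 || Srcs[1] < 0 || Srcs[0] == Srcs[1])
    return false;

  SwapSources = Srcs[0] > Srcs[1];
  return true;
}
```

Under this convention an IR mask of `0, 2, 1, 3` on two 2-element sources would appear as `0, 4, 1, 5` once the sources are widened to 4 elements, and the sketch accepts it with `SwapSources` set to false. The commuted form `4, 0, 5, 1` is accepted with `SwapSources` set to true.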

Rebase. Fix comment.

This revision was landed with ongoing or failed builds. Jan 20 2022, 2:47 PM

Closed by commit rGfa8bb224661d: [RISCV] Optimize vector_shuffles that are interleaving the lowest elements of… (authored by craig.topper).

This revision was automatically updated to reflect the committed changes.

craig.topper added a commit: rGfa8bb224661d: [RISCV] Optimize vector_shuffles that are interleaving the lowest elements of….

Harbormaster completed remote builds in B144692: Diff 401785. Jan 20 2022, 4:43 PM

Revision Contents

Path                                                           Size
llvm/lib/Target/RISCV/RISCVISelLowering.h                      1 line
llvm/lib/Target/RISCV/RISCVISelLowering.cpp                    118 lines
llvm/lib/Target/RISCV/RISCVInstrInfoVVLPatterns.td             10 lines
llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-interleave.ll     378 lines
llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-interleave.ll    484 lines
llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-shuffles.ll      15 lines

Diff 401785

llvm/lib/Target/RISCV/RISCVISelLowering.h

//===-- RISCVISelLowering.h - RISCV DAG Lowering Interface ------- C++ --===//		//===-- RISCVISelLowering.h - RISCV DAG Lowering Interface ------- C++ --===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
[231 lines not shown]	enum NodeType : unsigned {
SINT_TO_FP_VL,		SINT_TO_FP_VL,
UINT_TO_FP_VL,		UINT_TO_FP_VL,
FP_ROUND_VL,		FP_ROUND_VL,
FP_EXTEND_VL,		FP_EXTEND_VL,

// Widening instructions		// Widening instructions
VWMUL_VL,		VWMUL_VL,
VWMULU_VL,		VWMULU_VL,
		VWADDU_VL,

// Vector compare producing a mask. Fourth operand is input mask. Fifth		// Vector compare producing a mask. Fourth operand is input mask. Fifth
// operand is VL.		// operand is VL.
SETCC_VL,		SETCC_VL,

// Vector select with an additional VL operand. This operation is unmasked.		// Vector select with an additional VL operand. This operation is unmasked.
VSELECT_VL,		VSELECT_VL,

[438 lines not shown]

llvm/lib/Target/RISCV/RISCVISelLowering.cpp


//===-- RISCVISelLowering.cpp - RISCV DAG Lowering Implementation --------===//		//===-- RISCVISelLowering.cpp - RISCV DAG Lowering Implementation --------===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
[2,314 lines not shown]	static int matchShuffleAsSlideDown(ArrayRef<int> Mask) {
for (int Shift = 1; Shift != Size; ++Shift)		for (int Shift = 1; Shift != Size; ++Shift)
if (CheckUndefs(Shift) && MatchShift(Shift))		if (CheckUndefs(Shift) && MatchShift(Shift))
return Shift;		return Shift;

// No match.		// No match.
return -1;		return -1;
}		}

		static bool isInterleaveShuffle(ArrayRef<int> Mask, MVT VT, bool &SwapSources,
		const RISCVSubtarget &Subtarget) {
		// We need to widen elements to the next larger type.
		if (VT.getScalarSizeInBits() >= Subtarget.getMaxELENForFixedLengthVectors())
		return false;

		int Size = Mask.size();
		assert(Size == VT.getVectorNumElements() && "Unexpected mask size");

		int Srcs[] = {-1, -1};
		for (int i = 0; i != Size; ++i) {
		if (Mask[i] < 0)
		continue;

		// Is this an even or odd element.
		int Pol = i % 2;

		// Ensure we consistently use the same source for this element polarity.
		int Src = Mask[i] / Size;
		if (Srcs[Pol] < 0)
		Srcs[Pol] = Src;
		if (Srcs[Pol] != Src)
		return false;

		// Make sure the element is appropriate for this lane.
		int Elt = Mask[i] % Size;
		if (Elt != i / 2)
		return false;
		}

		// We need to find 2 sources and they can't be the same.
		if (Srcs[0] < 0 || Srcs[1] < 0 || Srcs[0] == Srcs[1])
		return false;

		// Swap the sources if the second source came first.
		SwapSources = Srcs[0] > Srcs[1];

		return true;
		}

static SDValue lowerVECTOR_SHUFFLE(SDValue Op, SelectionDAG &DAG,		static SDValue lowerVECTOR_SHUFFLE(SDValue Op, SelectionDAG &DAG,
const RISCVSubtarget &Subtarget) {		const RISCVSubtarget &Subtarget) {
SDValue V1 = Op.getOperand(0);		SDValue V1 = Op.getOperand(0);
SDValue V2 = Op.getOperand(1);		SDValue V2 = Op.getOperand(1);
SDLoc DL(Op);		SDLoc DL(Op);
MVT XLenVT = Subtarget.getXLenVT();		MVT XLenVT = Subtarget.getXLenVT();
MVT VT = Op.getSimpleValueType();		MVT VT = Op.getSimpleValueType();
unsigned NumElts = VT.getVectorNumElements();		unsigned NumElts = VT.getVectorNumElements();
[69 lines not shown]	if (Lane >= 0) {
assert(Lane < (int)NumElts && "Unexpected lane!");		assert(Lane < (int)NumElts && "Unexpected lane!");
SDValue Gather =		SDValue Gather =
DAG.getNode(RISCVISD::VRGATHER_VX_VL, DL, ContainerVT, V1,		DAG.getNode(RISCVISD::VRGATHER_VX_VL, DL, ContainerVT, V1,
DAG.getConstant(Lane, DL, XLenVT), TrueMask, VL);		DAG.getConstant(Lane, DL, XLenVT), TrueMask, VL);
return convertFromScalableVector(VT, Gather, DAG, Subtarget);		return convertFromScalableVector(VT, Gather, DAG, Subtarget);
}		}
}		}

		ArrayRef<int> Mask = SVN->getMask();

// Try to match as a slidedown.		// Try to match as a slidedown.
int SlideAmt = matchShuffleAsSlideDown(SVN->getMask());		int SlideAmt = matchShuffleAsSlideDown(Mask);
if (SlideAmt >= 0) {		if (SlideAmt >= 0) {
// TODO: Should we reduce the VL to account for the upper undef elements?		// TODO: Should we reduce the VL to account for the upper undef elements?
// Requires additional vsetvlis, but might be faster to execute.		// Requires additional vsetvlis, but might be faster to execute.
V1 = convertToScalableVector(ContainerVT, V1, DAG, Subtarget);		V1 = convertToScalableVector(ContainerVT, V1, DAG, Subtarget);
SDValue SlideDown =		SDValue SlideDown =
DAG.getNode(RISCVISD::VSLIDEDOWN_VL, DL, ContainerVT,		DAG.getNode(RISCVISD::VSLIDEDOWN_VL, DL, ContainerVT,
DAG.getUNDEF(ContainerVT), V1,		DAG.getUNDEF(ContainerVT), V1,
DAG.getConstant(SlideAmt, DL, XLenVT),		DAG.getConstant(SlideAmt, DL, XLenVT),
TrueMask, VL);		TrueMask, VL);
return convertFromScalableVector(VT, SlideDown, DAG, Subtarget);		return convertFromScalableVector(VT, SlideDown, DAG, Subtarget);
}		}

		// Detect an interleave shuffle and lower to
		// (vmaccu.vx (vwaddu.vx lohalf(V1), lohalf(V2)), lohalf(V2), (2^eltbits - 1))
		bool SwapSources;
		if (isInterleaveShuffle(Mask, VT, SwapSources, Subtarget)) {
		// Swap sources if we matched in the other order.
		if (SwapSources)
		std::swap(V1, V2);

		// Extract the lower half of the vectors.
		MVT HalfVT = VT.getHalfNumVectorElementsVT();
		V1 = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, HalfVT, V1,
		DAG.getConstant(0, DL, XLenVT));
		V2 = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, HalfVT, V2,
		DAG.getConstant(0, DL, XLenVT));

		// Double the element width and halve the number of elements.
		unsigned EltBits = VT.getScalarSizeInBits();
		MVT WideIntEltVT = MVT::getIntegerVT(EltBits * 2);
		MVT WideIntVT =
		MVT::getVectorVT(WideIntEltVT, VT.getVectorNumElements() / 2);
		// Convert this to a scalable vector. We need to base this on the larger
		// size to ensure there's always a type with a smaller LMUL.
		MVT WideIntContainerVT =
		getContainerForFixedLengthVector(DAG, WideIntVT, Subtarget);

		// Convert sources to scalable vectors.
		MVT HalfContainerVT = MVT::getVectorVT(
		VT.getVectorElementType(), WideIntContainerVT.getVectorElementCount());
		V1 = convertToScalableVector(HalfContainerVT, V1, DAG, Subtarget);
		V2 = convertToScalableVector(HalfContainerVT, V2, DAG, Subtarget);

		// Cast sources to integer.
		MVT IntEltVT = MVT::getIntegerVT(EltBits);
		MVT IntHalfVT =
		MVT::getVectorVT(IntEltVT, HalfContainerVT.getVectorElementCount());
		V1 = DAG.getBitcast(IntHalfVT, V1);
		V2 = DAG.getBitcast(IntHalfVT, V2);

		// Freeze V2 since we use it twice and we need to be sure that the add and
		// multiply see the same value.
		V2 = DAG.getNode(ISD::FREEZE, DL, IntHalfVT, V2);

		// Recreate TrueMask using the correct scalable type.
		MVT MaskVT =
		MVT::getVectorVT(MVT::i1, HalfContainerVT.getVectorElementCount());
		TrueMask = DAG.getNode(RISCVISD::VMSET_VL, DL, MaskVT, VL);

		// Widen V1 and V2 with 0s and add one copy of V2 to V1.
		SDValue Add = DAG.getNode(RISCVISD::VWADDU_VL, DL, WideIntContainerVT, V1,
		V2, TrueMask, VL);
		// Create 2^eltbits - 1 copies of V2 by multiplying by the largest integer.
		SDValue Multiplier = DAG.getNode(RISCVISD::VMV_V_X_VL, DL, IntHalfVT,
		DAG.getAllOnesConstant(DL, XLenVT));
		SDValue WidenMul = DAG.getNode(RISCVISD::VWMULU_VL, DL, WideIntContainerVT,
		V2, Multiplier, TrueMask, VL);
		// Add the new copies to our previous addition giving us 2^eltbits copies of
		// V2. This is equivalent to shifting V2 left by eltbits.
		Add = DAG.getNode(RISCVISD::ADD_VL, DL, WideIntContainerVT, Add, WidenMul,
		TrueMask, VL);
		// Cast back to ContainerVT. We need to re-create a new ContainerVT in case
		// WideIntContainerVT is a larger fractional LMUL than implied by the fixed
		// vector VT.
		ContainerVT =
		MVT::getVectorVT(VT.getVectorElementType(),
		WideIntContainerVT.getVectorElementCount() * 2);
		Add = DAG.getBitcast(ContainerVT, Add);
		return convertFromScalableVector(VT, Add, DAG, Subtarget);
		}

// Detect shuffles which can be re-expressed as vector selects; these are		// Detect shuffles which can be re-expressed as vector selects; these are
// shuffles in which each element in the destination is taken from an element		// shuffles in which each element in the destination is taken from an element
// at the corresponding index in either source vectors.		// at the corresponding index in either source vectors.
bool IsSelect = all_of(enumerate(SVN->getMask()), [&](const auto &MaskIdx) {		bool IsSelect = all_of(enumerate(Mask), [&](const auto &MaskIdx) {
int MaskIndex = MaskIdx.value();		int MaskIndex = MaskIdx.value();
return MaskIndex < 0 || MaskIdx.index() == (unsigned)MaskIndex % NumElts;		return MaskIndex < 0 || MaskIdx.index() == (unsigned)MaskIndex % NumElts;
});		});

assert(!V1.isUndef() && "Unexpected shuffle canonicalization");		assert(!V1.isUndef() && "Unexpected shuffle canonicalization");

SmallVector<SDValue> MaskVals;		SmallVector<SDValue> MaskVals;
// As a backup, shuffles can be lowered via a vrgather instruction, possibly		// As a backup, shuffles can be lowered via a vrgather instruction, possibly
[9 lines not shown]	static SDValue lowerVECTOR_SHUFFLE(SDValue Op, SelectionDAG &DAG,

// Keep a track of which non-undef indices are used by each LHS/RHS shuffle		// Keep a track of which non-undef indices are used by each LHS/RHS shuffle
// half.		// half.
DenseMap<int, unsigned> LHSIndexCounts, RHSIndexCounts;		DenseMap<int, unsigned> LHSIndexCounts, RHSIndexCounts;

// Now construct the mask that will be used by the vselect or blended		// Now construct the mask that will be used by the vselect or blended
// vrgather operation. For vrgathers, construct the appropriate indices into		// vrgather operation. For vrgathers, construct the appropriate indices into
// each vector.		// each vector.
for (int MaskIndex : SVN->getMask()) {		for (int MaskIndex : Mask) {
bool SelectMaskVal = (MaskIndex < (int)NumElts) ^ InvertMask;		bool SelectMaskVal = (MaskIndex < (int)NumElts) ^ InvertMask;
MaskVals.push_back(DAG.getConstant(SelectMaskVal, DL, XLenVT));		MaskVals.push_back(DAG.getConstant(SelectMaskVal, DL, XLenVT));
if (!IsSelect) {		if (!IsSelect) {
bool IsLHSOrUndefIndex = MaskIndex < (int)NumElts;		bool IsLHSOrUndefIndex = MaskIndex < (int)NumElts;
GatherIndicesLHS.push_back(IsLHSOrUndefIndex && MaskIndex >= 0		GatherIndicesLHS.push_back(IsLHSOrUndefIndex && MaskIndex >= 0
? DAG.getConstant(MaskIndex, DL, XLenVT)		? DAG.getConstant(MaskIndex, DL, XLenVT)
: DAG.getUNDEF(XLenVT));		: DAG.getUNDEF(XLenVT));
GatherIndicesRHS.push_back(		GatherIndicesRHS.push_back(
[7,468 lines not shown]	#define NODE_NAME_CASE(NODE) \
NODE_NAME_CASE(FP_TO_SINT_VL)		NODE_NAME_CASE(FP_TO_SINT_VL)
NODE_NAME_CASE(FP_TO_UINT_VL)		NODE_NAME_CASE(FP_TO_UINT_VL)
NODE_NAME_CASE(SINT_TO_FP_VL)		NODE_NAME_CASE(SINT_TO_FP_VL)
NODE_NAME_CASE(UINT_TO_FP_VL)		NODE_NAME_CASE(UINT_TO_FP_VL)
NODE_NAME_CASE(FP_EXTEND_VL)		NODE_NAME_CASE(FP_EXTEND_VL)
NODE_NAME_CASE(FP_ROUND_VL)		NODE_NAME_CASE(FP_ROUND_VL)
NODE_NAME_CASE(VWMUL_VL)		NODE_NAME_CASE(VWMUL_VL)
NODE_NAME_CASE(VWMULU_VL)		NODE_NAME_CASE(VWMULU_VL)
		NODE_NAME_CASE(VWADDU_VL)
NODE_NAME_CASE(SETCC_VL)		NODE_NAME_CASE(SETCC_VL)
NODE_NAME_CASE(VSELECT_VL)		NODE_NAME_CASE(VSELECT_VL)
NODE_NAME_CASE(VMAND_VL)		NODE_NAME_CASE(VMAND_VL)
NODE_NAME_CASE(VMOR_VL)		NODE_NAME_CASE(VMOR_VL)
NODE_NAME_CASE(VMXOR_VL)		NODE_NAME_CASE(VMXOR_VL)
NODE_NAME_CASE(VMCLR_VL)		NODE_NAME_CASE(VMCLR_VL)
NODE_NAME_CASE(VMSET_VL)		NODE_NAME_CASE(VMSET_VL)
NODE_NAME_CASE(VRGATHER_VX_VL)		NODE_NAME_CASE(VRGATHER_VX_VL)
[793 lines not shown]

llvm/lib/Target/RISCV/RISCVInstrInfoVVLPatterns.td

[215 lines not shown]

def riscv_trunc_vector_vl : SDNode<"RISCVISD::TRUNCATE_VECTOR_VL",		def riscv_trunc_vector_vl : SDNode<"RISCVISD::TRUNCATE_VECTOR_VL",
SDTypeProfile<1, 3, [SDTCisVec<0>,		SDTypeProfile<1, 3, [SDTCisVec<0>,
SDTCisVec<1>,		SDTCisVec<1>,
SDTCisSameNumEltsAs<0, 2>,		SDTCisSameNumEltsAs<0, 2>,
SDTCVecEltisVT<2, i1>,		SDTCVecEltisVT<2, i1>,
SDTCisVT<3, XLenVT>]>>;		SDTCisVT<3, XLenVT>]>>;

def SDT_RISCVVWMUL_VL : SDTypeProfile<1, 4, [SDTCisVec<0>,		def SDT_RISCVVWBinOp_VL : SDTypeProfile<1, 4, [SDTCisVec<0>,
SDTCisSameNumEltsAs<0, 1>,		SDTCisSameNumEltsAs<0, 1>,
SDTCisSameAs<1, 2>,		SDTCisSameAs<1, 2>,
SDTCisSameNumEltsAs<1, 3>,		SDTCisSameNumEltsAs<1, 3>,
SDTCVecEltisVT<3, i1>,		SDTCVecEltisVT<3, i1>,
SDTCisVT<4, XLenVT>]>;		SDTCisVT<4, XLenVT>]>;
def riscv_vwmul_vl : SDNode<"RISCVISD::VWMUL_VL", SDT_RISCVVWMUL_VL, [SDNPCommutative]>;		def riscv_vwmul_vl : SDNode<"RISCVISD::VWMUL_VL", SDT_RISCVVWBinOp_VL, [SDNPCommutative]>;
def riscv_vwmulu_vl : SDNode<"RISCVISD::VWMULU_VL", SDT_RISCVVWMUL_VL, [SDNPCommutative]>;		def riscv_vwmulu_vl : SDNode<"RISCVISD::VWMULU_VL", SDT_RISCVVWBinOp_VL, [SDNPCommutative]>;
		def riscv_vwaddu_vl : SDNode<"RISCVISD::VWADDU_VL", SDT_RISCVVWBinOp_VL, [SDNPCommutative]>;

def SDTRVVVecReduce : SDTypeProfile<1, 5, [		def SDTRVVVecReduce : SDTypeProfile<1, 5, [
SDTCisVec<0>, SDTCisVec<1>, SDTCisVec<2>, SDTCisSameAs<0, 3>,		SDTCisVec<0>, SDTCisVec<1>, SDTCisVec<2>, SDTCisSameAs<0, 3>,
SDTCVecEltisVT<4, i1>, SDTCisSameNumEltsAs<2, 4>, SDTCisVT<5, XLenVT>		SDTCVecEltisVT<4, i1>, SDTCisSameNumEltsAs<2, 4>, SDTCisVT<5, XLenVT>
]>;		]>;

def riscv_mul_vl_oneuse : PatFrag<(ops node:$A, node:$B, node:$C, node:$D),		def riscv_mul_vl_oneuse : PatFrag<(ops node:$A, node:$B, node:$C, node:$D),
(riscv_mul_vl node:$A, node:$B, node:$C,		(riscv_mul_vl node:$A, node:$B, node:$C,
[467 lines not shown]	foreach vti = AllIntegerVectors in {
def : Pat<(riscv_sub_vl (vti.Vector (SplatPat_simm5 simm5:$rs2)),		def : Pat<(riscv_sub_vl (vti.Vector (SplatPat_simm5 simm5:$rs2)),
(vti.Vector vti.RegClass:$rs1), (vti.Mask V0),		(vti.Vector vti.RegClass:$rs1), (vti.Mask V0),
VLOpFrag),		VLOpFrag),
(!cast<Instruction>("PseudoVRSUB_VI_"# vti.LMul.MX#"_MASK")		(!cast<Instruction>("PseudoVRSUB_VI_"# vti.LMul.MX#"_MASK")
(vti.Vector (IMPLICIT_DEF)), vti.RegClass:$rs1, simm5:$rs2,		(vti.Vector (IMPLICIT_DEF)), vti.RegClass:$rs1, simm5:$rs2,
(vti.Mask V0), GPR:$vl, vti.Log2SEW, TAIL_AGNOSTIC)>;		(vti.Mask V0), GPR:$vl, vti.Log2SEW, TAIL_AGNOSTIC)>;
}		}

		// 12.2. Vector Widening Integer Add/Subtract
		defm : VPatBinaryWVL_VV_VX<riscv_vwaddu_vl, "PseudoVWADDU">;

// 12.3. Vector Integer Extension		// 12.3. Vector Integer Extension
defm : VPatExtendSDNode_V_VL<riscv_zext_vl, "PseudoVZEXT", "VF2",		defm : VPatExtendSDNode_V_VL<riscv_zext_vl, "PseudoVZEXT", "VF2",
AllFractionableVF2IntVectors>;		AllFractionableVF2IntVectors>;
defm : VPatExtendSDNode_V_VL<riscv_sext_vl, "PseudoVSEXT", "VF2",		defm : VPatExtendSDNode_V_VL<riscv_sext_vl, "PseudoVSEXT", "VF2",
AllFractionableVF2IntVectors>;		AllFractionableVF2IntVectors>;
defm : VPatExtendSDNode_V_VL<riscv_zext_vl, "PseudoVZEXT", "VF4",		defm : VPatExtendSDNode_V_VL<riscv_zext_vl, "PseudoVZEXT", "VF4",
AllFractionableVF4IntVectors>;		AllFractionableVF4IntVectors>;
defm : VPatExtendSDNode_V_VL<riscv_sext_vl, "PseudoVSEXT", "VF4",		defm : VPatExtendSDNode_V_VL<riscv_sext_vl, "PseudoVSEXT", "VF4",
[862 lines not shown]

llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-interleave.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -mtriple=riscv32 -mattr=+experimental-v,+zfh -riscv-v-vector-bits-min=128 -verify-machineinstrs < %s | FileCheck %s --check-prefixes=CHECK,V128,RV32-V128
				; RUN: llc -mtriple=riscv64 -mattr=+experimental-v,+zfh -riscv-v-vector-bits-min=128 -verify-machineinstrs < %s | FileCheck %s --check-prefixes=CHECK,V128,RV64-V128
				; RUN: llc -mtriple=riscv32 -mattr=+experimental-v,+zfh -riscv-v-vector-bits-min=512 -verify-machineinstrs < %s | FileCheck %s --check-prefixes=CHECK,V512,RV32-V512
				; RUN: llc -mtriple=riscv64 -mattr=+experimental-v,+zfh -riscv-v-vector-bits-min=512 -verify-machineinstrs < %s | FileCheck %s --check-prefixes=CHECK,V512,RV64-V512

				; Test optimizing interleaves to widening arithmetic.

				define <4 x half> @interleave_v2f16(<2 x half> %x, <2 x half> %y) {
				; CHECK-LABEL: interleave_v2f16:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vsetivli zero, 4, e16, mf4, ta, mu
				; CHECK-NEXT: vwaddu.vv v10, v8, v9
				; CHECK-NEXT: li a0, -1
				; CHECK-NEXT: vwmaccu.vx v10, a0, v9
				; CHECK-NEXT: vmv1r.v v8, v10
				; CHECK-NEXT: ret
				%a = shufflevector <2 x half> %x, <2 x half> %y, <4 x i32> <i32 0, i32 2, i32 1, i32 3>
				ret <4 x half> %a
				}

				; Vector order switched for coverage.
				define <4 x float> @interleave_v2f32(<2 x float> %x, <2 x float> %y) {
				; CHECK-LABEL: interleave_v2f32:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vsetivli zero, 4, e32, mf2, ta, mu
				; CHECK-NEXT: vwaddu.vv v10, v9, v8
				; CHECK-NEXT: li a0, -1
				; CHECK-NEXT: vwmaccu.vx v10, a0, v8
				; CHECK-NEXT: vmv1r.v v8, v10
				; CHECK-NEXT: ret
				%a = shufflevector <2 x float> %x, <2 x float> %y, <4 x i32> <i32 2, i32 0, i32 3, i32 1>
				ret <4 x float> %a
				}

				; One vXf64 test case to verify that we don't optimize it.
				; FIXME: Is there better codegen we can do here?
				define <4 x double> @interleave_v2f64(<2 x double> %x, <2 x double> %y) {
				; RV32-V128-LABEL: interleave_v2f64:
				; RV32-V128: # %bb.0:
				; RV32-V128-NEXT: vmv1r.v v12, v9
				; RV32-V128-NEXT: # kill: def $v8 killed $v8 def $v8m2
				; RV32-V128-NEXT: vsetivli zero, 4, e16, mf2, ta, mu
				; RV32-V128-NEXT: vid.v v10
				; RV32-V128-NEXT: vsrl.vi v14, v10, 1
				; RV32-V128-NEXT: vsetvli zero, zero, e64, m2, ta, mu
				; RV32-V128-NEXT: vrgatherei16.vv v10, v8, v14
				; RV32-V128-NEXT: li a0, 10
				; RV32-V128-NEXT: vsetivli zero, 1, e8, mf8, ta, mu
				; RV32-V128-NEXT: vmv.s.x v0, a0
				; RV32-V128-NEXT: vsetivli zero, 4, e64, m2, ta, mu
				; RV32-V128-NEXT: vrgatherei16.vv v10, v12, v14, v0.t
				; RV32-V128-NEXT: vmv.v.v v8, v10
				; RV32-V128-NEXT: ret
				;
				; RV64-V128-LABEL: interleave_v2f64:
				; RV64-V128: # %bb.0:
				; RV64-V128-NEXT: vmv1r.v v12, v9
				; RV64-V128-NEXT: # kill: def $v8 killed $v8 def $v8m2
				; RV64-V128-NEXT: vsetivli zero, 4, e64, m2, ta, mu
				; RV64-V128-NEXT: vid.v v10
				; RV64-V128-NEXT: vsrl.vi v14, v10, 1
				; RV64-V128-NEXT: vrgather.vv v10, v8, v14
				; RV64-V128-NEXT: li a0, 10
				; RV64-V128-NEXT: vsetivli zero, 1, e8, mf8, ta, mu
				; RV64-V128-NEXT: vmv.s.x v0, a0
				; RV64-V128-NEXT: vsetivli zero, 4, e64, m2, ta, mu
				; RV64-V128-NEXT: vrgather.vv v10, v12, v14, v0.t
				; RV64-V128-NEXT: vmv.v.v v8, v10
				; RV64-V128-NEXT: ret
				;
				; RV32-V512-LABEL: interleave_v2f64:
				; RV32-V512: # %bb.0:
				; RV32-V512-NEXT: vsetivli zero, 4, e16, mf4, ta, mu
				; RV32-V512-NEXT: vid.v v10
				; RV32-V512-NEXT: vsrl.vi v11, v10, 1
				; RV32-V512-NEXT: vsetvli zero, zero, e64, m1, ta, mu
				; RV32-V512-NEXT: vrgatherei16.vv v10, v8, v11
				; RV32-V512-NEXT: li a0, 10
				; RV32-V512-NEXT: vsetivli zero, 1, e8, mf8, ta, mu
				; RV32-V512-NEXT: vmv.s.x v0, a0
				; RV32-V512-NEXT: vsetivli zero, 4, e64, m1, ta, mu
				; RV32-V512-NEXT: vrgatherei16.vv v10, v9, v11, v0.t
				; RV32-V512-NEXT: vmv.v.v v8, v10
				; RV32-V512-NEXT: ret
				;
				; RV64-V512-LABEL: interleave_v2f64:
				; RV64-V512: # %bb.0:
				; RV64-V512-NEXT: vsetivli zero, 4, e64, m1, ta, mu
				; RV64-V512-NEXT: vid.v v10
				; RV64-V512-NEXT: vsrl.vi v11, v10, 1
				; RV64-V512-NEXT: vrgather.vv v10, v8, v11
				; RV64-V512-NEXT: li a0, 10
				; RV64-V512-NEXT: vsetivli zero, 1, e8, mf8, ta, mu
				; RV64-V512-NEXT: vmv.s.x v0, a0
				; RV64-V512-NEXT: vsetivli zero, 4, e64, m1, ta, mu
				; RV64-V512-NEXT: vrgather.vv v10, v9, v11, v0.t
				; RV64-V512-NEXT: vmv.v.v v8, v10
				; RV64-V512-NEXT: ret
				%a = shufflevector <2 x double> %x, <2 x double> %y, <4 x i32> <i32 0, i32 2, i32 1, i32 3>
				ret <4 x double> %a
				}

				; Undef elements for coverage
				define <8 x half> @interleave_v4f16(<4 x half> %x, <4 x half> %y) {
				; V128-LABEL: interleave_v4f16:
				; V128: # %bb.0:
				; V128-NEXT: vsetivli zero, 8, e16, mf2, ta, mu
				; V128-NEXT: vwaddu.vv v10, v8, v9
				; V128-NEXT: li a0, -1
				; V128-NEXT: vwmaccu.vx v10, a0, v9
				; V128-NEXT: vmv1r.v v8, v10
				; V128-NEXT: ret
				;
				; V512-LABEL: interleave_v4f16:
				; V512: # %bb.0:
				; V512-NEXT: vsetivli zero, 8, e16, mf4, ta, mu
				; V512-NEXT: vwaddu.vv v10, v8, v9
				; V512-NEXT: li a0, -1
				; V512-NEXT: vwmaccu.vx v10, a0, v9
				; V512-NEXT: vmv1r.v v8, v10
				; V512-NEXT: ret
				%a = shufflevector <4 x half> %x, <4 x half> %y, <8 x i32> <i32 0, i32 4, i32 undef, i32 5, i32 2, i32 undef, i32 3, i32 7>
				ret <8 x half> %a
				}

				define <8 x float> @interleave_v4f32(<4 x float> %x, <4 x float> %y) {
				; V128-LABEL: interleave_v4f32:
				; V128: # %bb.0:
				; V128-NEXT: vsetivli zero, 8, e32, m1, ta, mu
				; V128-NEXT: vwaddu.vv v10, v8, v9
				; V128-NEXT: li a0, -1
				; V128-NEXT: vwmaccu.vx v10, a0, v9
				; V128-NEXT: vmv2r.v v8, v10
				; V128-NEXT: ret
				;
				; V512-LABEL: interleave_v4f32:
				; V512: # %bb.0:
				; V512-NEXT: vsetivli zero, 8, e32, mf2, ta, mu
				; V512-NEXT: vwaddu.vv v10, v8, v9
				; V512-NEXT: li a0, -1
				; V512-NEXT: vwmaccu.vx v10, a0, v9
				; V512-NEXT: vmv1r.v v8, v10
				; V512-NEXT: ret
				%a = shufflevector <4 x float> %x, <4 x float> %y, <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 7>
				ret <8 x float> %a
				}

				; Vector order switched for coverage.
				define <16 x half> @interleave_v8f16(<8 x half> %x, <8 x half> %y) {
				; V128-LABEL: interleave_v8f16:
				; V128: # %bb.0:
				; V128-NEXT: vsetivli zero, 16, e16, m1, ta, mu
				; V128-NEXT: vwaddu.vv v10, v9, v8
				; V128-NEXT: li a0, -1
				; V128-NEXT: vwmaccu.vx v10, a0, v8
				; V128-NEXT: vmv2r.v v8, v10
				; V128-NEXT: ret
				;
				; V512-LABEL: interleave_v8f16:
				; V512: # %bb.0:
				; V512-NEXT: vsetivli zero, 16, e16, mf4, ta, mu
				; V512-NEXT: vwaddu.vv v10, v9, v8
				; V512-NEXT: li a0, -1
				; V512-NEXT: vwmaccu.vx v10, a0, v8
				; V512-NEXT: vmv1r.v v8, v10
				; V512-NEXT: ret
				%a = shufflevector <8 x half> %x, <8 x half> %y, <16 x i32> <i32 8, i32 0, i32 9, i32 1, i32 10, i32 2, i32 11, i32 3, i32 12, i32 4, i32 13, i32 5, i32 14, i32 6, i32 15, i32 7>
				ret <16 x half> %a
				}

				define <16 x float> @interleave_v8f32(<8 x float> %x, <8 x float> %y) {
				; V128-LABEL: interleave_v8f32:
				; V128: # %bb.0:
				; V128-NEXT: vsetivli zero, 16, e32, m2, ta, mu
				; V128-NEXT: vwaddu.vv v12, v8, v10
				; V128-NEXT: li a0, -1
				; V128-NEXT: vwmaccu.vx v12, a0, v10
				; V128-NEXT: vmv4r.v v8, v12
				; V128-NEXT: ret
				;
				; V512-LABEL: interleave_v8f32:
				; V512: # %bb.0:
				; V512-NEXT: vsetivli zero, 16, e32, mf2, ta, mu
				; V512-NEXT: vwaddu.vv v10, v8, v9
				; V512-NEXT: li a0, -1
				; V512-NEXT: vwmaccu.vx v10, a0, v9
				; V512-NEXT: vmv1r.v v8, v10
				; V512-NEXT: ret
				%a = shufflevector <8 x float> %x, <8 x float> %y, <16 x i32> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11, i32 4, i32 12, i32 5, i32 13, i32 6, i32 14, i32 7, i32 15>
				ret <16 x float> %a
				}

				define <32 x half> @interleave_v16f16(<16 x half> %x, <16 x half> %y) {
				; V128-LABEL: interleave_v16f16:
				; V128: # %bb.0:
				; V128-NEXT: li a0, 32
				; V128-NEXT: vsetvli zero, a0, e16, m2, ta, mu
				; V128-NEXT: vwaddu.vv v12, v8, v10
				; V128-NEXT: li a0, -1
				; V128-NEXT: vwmaccu.vx v12, a0, v10
				; V128-NEXT: vmv4r.v v8, v12
				; V128-NEXT: ret
				;
				; V512-LABEL: interleave_v16f16:
				; V512: # %bb.0:
				; V512-NEXT: li a0, 32
				; V512-NEXT: vsetvli zero, a0, e16, mf2, ta, mu
				; V512-NEXT: vwaddu.vv v10, v8, v9
				; V512-NEXT: li a0, -1
				; V512-NEXT: vwmaccu.vx v10, a0, v9
				; V512-NEXT: vmv1r.v v8, v10
				; V512-NEXT: ret
				%a = shufflevector <16 x half> %x, <16 x half> %y, <32 x i32> <i32 0, i32 16, i32 1, i32 17, i32 2, i32 18, i32 3, i32 19, i32 4, i32 20, i32 5, i32 21, i32 6, i32 22, i32 7, i32 23, i32 8, i32 24, i32 9, i32 25, i32 10, i32 26, i32 11, i32 27, i32 12, i32 28, i32 13, i32 29, i32 14, i32 30, i32 15, i32 31>
				ret <32 x half> %a
				}

				define <32 x float> @interleave_v16f32(<16 x float> %x, <16 x float> %y) {
				; V128-LABEL: interleave_v16f32:
				; V128: # %bb.0:
				; V128-NEXT: li a0, 32
				; V128-NEXT: vsetvli zero, a0, e32, m4, ta, mu
				; V128-NEXT: vwaddu.vv v16, v8, v12
				; V128-NEXT: li a0, -1
				; V128-NEXT: vwmaccu.vx v16, a0, v12
				; V128-NEXT: vmv8r.v v8, v16
				; V128-NEXT: ret
				;
				; V512-LABEL: interleave_v16f32:
				; V512: # %bb.0:
				; V512-NEXT: li a0, 32
				; V512-NEXT: vsetvli zero, a0, e32, m1, ta, mu
				; V512-NEXT: vwaddu.vv v10, v8, v9
				; V512-NEXT: li a0, -1
				; V512-NEXT: vwmaccu.vx v10, a0, v9
				; V512-NEXT: vmv2r.v v8, v10
				; V512-NEXT: ret
				%a = shufflevector <16 x float> %x, <16 x float> %y, <32 x i32> <i32 0, i32 16, i32 1, i32 17, i32 2, i32 18, i32 3, i32 19, i32 4, i32 20, i32 5, i32 21, i32 6, i32 22, i32 7, i32 23, i32 8, i32 24, i32 9, i32 25, i32 10, i32 26, i32 11, i32 27, i32 12, i32 28, i32 13, i32 29, i32 14, i32 30, i32 15, i32 31>
				ret <32 x float> %a
				}

				define <64 x half> @interleave_v32f16(<32 x half> %x, <32 x half> %y) {
				; V128-LABEL: interleave_v32f16:
				; V128: # %bb.0:
				; V128-NEXT: li a0, 64
				; V128-NEXT: vsetvli zero, a0, e16, m4, ta, mu
				; V128-NEXT: vwaddu.vv v16, v8, v12
				; V128-NEXT: li a0, -1
				; V128-NEXT: vwmaccu.vx v16, a0, v12
				; V128-NEXT: vmv8r.v v8, v16
				; V128-NEXT: ret
				;
				; V512-LABEL: interleave_v32f16:
				; V512: # %bb.0:
				; V512-NEXT: li a0, 64
				; V512-NEXT: vsetvli zero, a0, e16, m1, ta, mu
				; V512-NEXT: vwaddu.vv v10, v8, v9
				; V512-NEXT: li a0, -1
				; V512-NEXT: vwmaccu.vx v10, a0, v9
				; V512-NEXT: vmv2r.v v8, v10
				; V512-NEXT: ret
				%a = shufflevector <32 x half> %x, <32 x half> %y, <64 x i32> <i32 0, i32 32, i32 1, i32 33, i32 2, i32 34, i32 3, i32 35, i32 4, i32 36, i32 5, i32 37, i32 6, i32 38, i32 7, i32 39, i32 8, i32 40, i32 9, i32 41, i32 10, i32 42, i32 11, i32 43, i32 12, i32 44, i32 13, i32 45, i32 14, i32 46, i32 15, i32 47, i32 16, i32 48, i32 17, i32 49, i32 18, i32 50, i32 19, i32 51, i32 20, i32 52, i32 21, i32 53, i32 22, i32 54, i32 23, i32 55, i32 24, i32 56, i32 25, i32 57, i32 26, i32 58, i32 27, i32 59, i32 28, i32 60, i32 29, i32 61, i32 30, i32 62, i32 31, i32 63>
				ret <64 x half> %a
				}

				define <64 x float> @interleave_v32f32(<32 x float> %x, <32 x float> %y) {
				; RV32-V128-LABEL: interleave_v32f32:
				; RV32-V128: # %bb.0:
				; RV32-V128-NEXT: addi sp, sp, -16
				; RV32-V128-NEXT: .cfi_def_cfa_offset 16
				; RV32-V128-NEXT: csrr a0, vlenb
				; RV32-V128-NEXT: slli a0, a0, 4
				; RV32-V128-NEXT: sub sp, sp, a0
				; RV32-V128-NEXT: lui a0, %hi(.LCPI10_0)
				; RV32-V128-NEXT: addi a0, a0, %lo(.LCPI10_0)
				; RV32-V128-NEXT: li a1, 32
				; RV32-V128-NEXT: vsetvli zero, a1, e32, m8, ta, mu
				; RV32-V128-NEXT: vle32.v v0, (a0)
				; RV32-V128-NEXT: vmv8r.v v24, v8
				; RV32-V128-NEXT: addi a0, sp, 16
				; RV32-V128-NEXT: vs8r.v v8, (a0) # Unknown-size Folded Spill
				; RV32-V128-NEXT: vrgather.vv v8, v24, v0
				; RV32-V128-NEXT: lui a0, %hi(.LCPI10_1)
				; RV32-V128-NEXT: addi a0, a0, %lo(.LCPI10_1)
				; RV32-V128-NEXT: vle32.v v24, (a0)
				; RV32-V128-NEXT: csrr a0, vlenb
				; RV32-V128-NEXT: slli a0, a0, 3
				; RV32-V128-NEXT: add a0, sp, a0
				; RV32-V128-NEXT: addi a0, a0, 16
				; RV32-V128-NEXT: vs8r.v v24, (a0) # Unknown-size Folded Spill
				; RV32-V128-NEXT: lui a0, 699051
				; RV32-V128-NEXT: addi a0, a0, -1366
				; RV32-V128-NEXT: vsetivli zero, 1, e32, mf2, ta, mu
				; RV32-V128-NEXT: vmv.s.x v0, a0
				; RV32-V128-NEXT: vsetvli zero, a1, e32, m8, ta, mu
				; RV32-V128-NEXT: csrr a0, vlenb
				; RV32-V128-NEXT: slli a0, a0, 3
				; RV32-V128-NEXT: add a0, sp, a0
				; RV32-V128-NEXT: addi a0, a0, 16
				; RV32-V128-NEXT: vl8re8.v v24, (a0) # Unknown-size Folded Reload
				; RV32-V128-NEXT: vrgather.vv v8, v16, v24, v0.t
				; RV32-V128-NEXT: vmv.v.v v24, v8
				; RV32-V128-NEXT: vsetvli zero, a1, e32, m4, ta, mu
				; RV32-V128-NEXT: addi a0, sp, 16
				; RV32-V128-NEXT: vl8re8.v v8, (a0) # Unknown-size Folded Reload
				; RV32-V128-NEXT: vwaddu.vv v0, v8, v16
				; RV32-V128-NEXT: li a0, -1
				; RV32-V128-NEXT: vwmaccu.vx v0, a0, v16
				; RV32-V128-NEXT: vmv8r.v v8, v0
				; RV32-V128-NEXT: vmv8r.v v16, v24
				; RV32-V128-NEXT: csrr a0, vlenb
				; RV32-V128-NEXT: slli a0, a0, 4
				; RV32-V128-NEXT: add sp, sp, a0
				; RV32-V128-NEXT: addi sp, sp, 16
				; RV32-V128-NEXT: ret
				;
				; RV64-V128-LABEL: interleave_v32f32:
				; RV64-V128: # %bb.0:
				; RV64-V128-NEXT: addi sp, sp, -16
				; RV64-V128-NEXT: .cfi_def_cfa_offset 16
				; RV64-V128-NEXT: csrr a0, vlenb
				; RV64-V128-NEXT: slli a0, a0, 4
				; RV64-V128-NEXT: sub sp, sp, a0
				; RV64-V128-NEXT: lui a0, %hi(.LCPI10_0)
				; RV64-V128-NEXT: addi a0, a0, %lo(.LCPI10_0)
				; RV64-V128-NEXT: li a1, 32
				; RV64-V128-NEXT: vsetvli zero, a1, e32, m8, ta, mu
				; RV64-V128-NEXT: vle32.v v0, (a0)
				; RV64-V128-NEXT: vmv8r.v v24, v8
				; RV64-V128-NEXT: addi a0, sp, 16
				; RV64-V128-NEXT: vs8r.v v8, (a0) # Unknown-size Folded Spill
				; RV64-V128-NEXT: vrgather.vv v8, v24, v0
				; RV64-V128-NEXT: lui a0, %hi(.LCPI10_1)
				; RV64-V128-NEXT: addi a0, a0, %lo(.LCPI10_1)
				; RV64-V128-NEXT: vle32.v v24, (a0)
				; RV64-V128-NEXT: csrr a0, vlenb
				; RV64-V128-NEXT: slli a0, a0, 3
				; RV64-V128-NEXT: add a0, sp, a0
				; RV64-V128-NEXT: addi a0, a0, 16
				; RV64-V128-NEXT: vs8r.v v24, (a0) # Unknown-size Folded Spill
				; RV64-V128-NEXT: lui a0, 699051
				; RV64-V128-NEXT: addiw a0, a0, -1366
				; RV64-V128-NEXT: vsetivli zero, 1, e32, mf2, ta, mu
				; RV64-V128-NEXT: vmv.s.x v0, a0
				; RV64-V128-NEXT: vsetvli zero, a1, e32, m8, ta, mu
				; RV64-V128-NEXT: csrr a0, vlenb
				; RV64-V128-NEXT: slli a0, a0, 3
				; RV64-V128-NEXT: add a0, sp, a0
				; RV64-V128-NEXT: addi a0, a0, 16
				; RV64-V128-NEXT: vl8re8.v v24, (a0) # Unknown-size Folded Reload
				; RV64-V128-NEXT: vrgather.vv v8, v16, v24, v0.t
				; RV64-V128-NEXT: vmv.v.v v24, v8
				; RV64-V128-NEXT: vsetvli zero, a1, e32, m4, ta, mu
				; RV64-V128-NEXT: addi a0, sp, 16
				; RV64-V128-NEXT: vl8re8.v v8, (a0) # Unknown-size Folded Reload
				; RV64-V128-NEXT: vwaddu.vv v0, v8, v16
				; RV64-V128-NEXT: li a0, -1
				; RV64-V128-NEXT: vwmaccu.vx v0, a0, v16
				; RV64-V128-NEXT: vmv8r.v v8, v0
				; RV64-V128-NEXT: vmv8r.v v16, v24
				; RV64-V128-NEXT: csrr a0, vlenb
				; RV64-V128-NEXT: slli a0, a0, 4
				; RV64-V128-NEXT: add sp, sp, a0
				; RV64-V128-NEXT: addi sp, sp, 16
				; RV64-V128-NEXT: ret
				;
				; V512-LABEL: interleave_v32f32:
				; V512: # %bb.0:
				; V512-NEXT: li a0, 64
				; V512-NEXT: vsetvli zero, a0, e32, m2, ta, mu
				; V512-NEXT: vwaddu.vv v12, v8, v10
				; V512-NEXT: li a0, -1
				; V512-NEXT: vwmaccu.vx v12, a0, v10
				; V512-NEXT: vmv4r.v v8, v12
				; V512-NEXT: ret
				%a = shufflevector <32 x float> %x, <32 x float> %y, <64 x i32> <i32 0, i32 32, i32 1, i32 33, i32 2, i32 34, i32 3, i32 35, i32 4, i32 36, i32 5, i32 37, i32 6, i32 38, i32 7, i32 39, i32 8, i32 40, i32 9, i32 41, i32 10, i32 42, i32 11, i32 43, i32 12, i32 44, i32 13, i32 45, i32 14, i32 46, i32 15, i32 47, i32 16, i32 48, i32 17, i32 49, i32 18, i32 50, i32 19, i32 51, i32 20, i32 52, i32 21, i32 53, i32 22, i32 54, i32 23, i32 55, i32 24, i32 56, i32 25, i32 57, i32 26, i32 58, i32 27, i32 59, i32 28, i32 60, i32 29, i32 61, i32 30, i32 62, i32 31, i32 63>
				ret <64 x float> %a
				}

llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-interleave.ll

This file was added.
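The tests in this file check that an interleave is lowered to a `vwaddu.vv` followed by a `vwmaccu.vx` with a scalar of -1. As a rough sanity check of that arithmetic, the sketch below models the two instructions on plain Python integers. It is only an illustration: the function name `interleave_widening` and the list-based vector model are assumptions made for this example and are not part of the patch.

```python
def interleave_widening(v1, v2, eltbits):
    """Model vwaddu.vv + vwmaccu.vx on lists of unsigned eltbits-wide ints.

    Hypothetical helper for illustration; not code from the patch.
    """
    mask = (1 << eltbits) - 1             # scalar -1 viewed as an unsigned eltbits value
    wide_mask = (1 << (2 * eltbits)) - 1  # results live in 2*eltbits-wide elements
    out = []
    for a, b in zip(v1, v2):
        wide = (a + b) & wide_mask            # vwaddu.vv: zext(a) + zext(b)
        wide = (wide + mask * b) & wide_mask  # vwmaccu.vx: wide += zext(-1) * zext(b)
        # The two steps together compute zext(a) + (zext(b) << eltbits).
        assert wide == a + (b << eltbits)
        # Reinterpreting the wide element as two narrow ones (little-endian)
        # yields the low half first, then the high half.
        out.append(wide & mask)
        out.append(wide >> eltbits)
    return out


# Element indices 0,1 from the first vector and 2,3 from the second
# reproduce the <0, 2, 1, 3> shuffle mask used by interleave_v2i8.
print(interleave_widening([0, 1], [2, 3], 8))  # [0, 2, 1, 3]
```

The sum never overflows the wide element, since the largest possible value is (2^eltbits - 1) + (2^eltbits - 1) * 2^eltbits = 2^(2*eltbits) - 1.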

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -mtriple=riscv32 -mattr=+experimental-v -riscv-v-vector-bits-min=128 -verify-machineinstrs < %s | FileCheck %s --check-prefixes=CHECK,V128,RV32-V128
				; RUN: llc -mtriple=riscv64 -mattr=+experimental-v -riscv-v-vector-bits-min=128 -verify-machineinstrs < %s | FileCheck %s --check-prefixes=CHECK,V128,RV64-V128
				; RUN: llc -mtriple=riscv32 -mattr=+experimental-v -riscv-v-vector-bits-min=512 -verify-machineinstrs < %s | FileCheck %s --check-prefixes=CHECK,V512,RV32-V512
				; RUN: llc -mtriple=riscv64 -mattr=+experimental-v -riscv-v-vector-bits-min=512 -verify-machineinstrs < %s | FileCheck %s --check-prefixes=CHECK,V512,RV64-V512

				; Test optimizing interleaves to widening arithmetic.

				define <4 x i8> @interleave_v2i8(<2 x i8> %x, <2 x i8> %y) {
				; CHECK-LABEL: interleave_v2i8:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vsetivli zero, 4, e8, mf8, ta, mu
				; CHECK-NEXT: vwaddu.vv v10, v8, v9
				; CHECK-NEXT: li a0, -1
				; CHECK-NEXT: vwmaccu.vx v10, a0, v9
				; CHECK-NEXT: vmv1r.v v8, v10
				; CHECK-NEXT: ret
				%a = shufflevector <2 x i8> %x, <2 x i8> %y, <4 x i32> <i32 0, i32 2, i32 1, i32 3>
				ret <4 x i8> %a
				}

				define <4 x i16> @interleave_v2i16(<2 x i16> %x, <2 x i16> %y) {
				; CHECK-LABEL: interleave_v2i16:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vsetivli zero, 4, e16, mf4, ta, mu
				; CHECK-NEXT: vwaddu.vv v10, v8, v9
				; CHECK-NEXT: li a0, -1
				; CHECK-NEXT: vwmaccu.vx v10, a0, v9
				; CHECK-NEXT: vmv1r.v v8, v10
				; CHECK-NEXT: ret
				%a = shufflevector <2 x i16> %x, <2 x i16> %y, <4 x i32> <i32 0, i32 2, i32 1, i32 3>
				ret <4 x i16> %a
				}

				; Vector order switched for coverage.
				define <4 x i32> @interleave_v2i32(<2 x i32> %x, <2 x i32> %y) {
				; CHECK-LABEL: interleave_v2i32:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vsetivli zero, 4, e32, mf2, ta, mu
				; CHECK-NEXT: vwaddu.vv v10, v9, v8
				; CHECK-NEXT: li a0, -1
				; CHECK-NEXT: vwmaccu.vx v10, a0, v8
				; CHECK-NEXT: vmv1r.v v8, v10
				; CHECK-NEXT: ret
				%a = shufflevector <2 x i32> %x, <2 x i32> %y, <4 x i32> <i32 2, i32 0, i32 3, i32 1>
				ret <4 x i32> %a
				}

				; One vXi64 test case to verify that we don't optimize it.
				; FIXME: Is there better codegen we can do here?
				define <4 x i64> @interleave_v2i64(<2 x i64> %x, <2 x i64> %y) {
				; RV32-V128-LABEL: interleave_v2i64:
				; RV32-V128: # %bb.0:
				; RV32-V128-NEXT: vmv1r.v v12, v9
				; RV32-V128-NEXT: # kill: def $v8 killed $v8 def $v8m2
				; RV32-V128-NEXT: vsetivli zero, 4, e16, mf2, ta, mu
				; RV32-V128-NEXT: vid.v v10
				; RV32-V128-NEXT: vsrl.vi v14, v10, 1
				; RV32-V128-NEXT: vsetvli zero, zero, e64, m2, ta, mu
				; RV32-V128-NEXT: vrgatherei16.vv v10, v8, v14
				; RV32-V128-NEXT: li a0, 10
				; RV32-V128-NEXT: vsetivli zero, 1, e8, mf8, ta, mu
				; RV32-V128-NEXT: vmv.s.x v0, a0
				; RV32-V128-NEXT: vsetivli zero, 4, e64, m2, ta, mu
				; RV32-V128-NEXT: vrgatherei16.vv v10, v12, v14, v0.t
				; RV32-V128-NEXT: vmv.v.v v8, v10
				; RV32-V128-NEXT: ret
				;
				; RV64-V128-LABEL: interleave_v2i64:
				; RV64-V128: # %bb.0:
				; RV64-V128-NEXT: vmv1r.v v12, v9
				; RV64-V128-NEXT: # kill: def $v8 killed $v8 def $v8m2
				; RV64-V128-NEXT: vsetivli zero, 4, e64, m2, ta, mu
				; RV64-V128-NEXT: vid.v v10
				; RV64-V128-NEXT: vsrl.vi v14, v10, 1
				; RV64-V128-NEXT: vrgather.vv v10, v8, v14
				; RV64-V128-NEXT: li a0, 10
				; RV64-V128-NEXT: vsetivli zero, 1, e8, mf8, ta, mu
				; RV64-V128-NEXT: vmv.s.x v0, a0
				; RV64-V128-NEXT: vsetivli zero, 4, e64, m2, ta, mu
				; RV64-V128-NEXT: vrgather.vv v10, v12, v14, v0.t
				; RV64-V128-NEXT: vmv.v.v v8, v10
				; RV64-V128-NEXT: ret
				;
				; RV32-V512-LABEL: interleave_v2i64:
				; RV32-V512: # %bb.0:
				; RV32-V512-NEXT: vsetivli zero, 4, e16, mf4, ta, mu
				; RV32-V512-NEXT: vid.v v10
				; RV32-V512-NEXT: vsrl.vi v11, v10, 1
				; RV32-V512-NEXT: vsetvli zero, zero, e64, m1, ta, mu
				; RV32-V512-NEXT: vrgatherei16.vv v10, v8, v11
				; RV32-V512-NEXT: li a0, 10
				; RV32-V512-NEXT: vsetivli zero, 1, e8, mf8, ta, mu
				; RV32-V512-NEXT: vmv.s.x v0, a0
				; RV32-V512-NEXT: vsetivli zero, 4, e64, m1, ta, mu
				; RV32-V512-NEXT: vrgatherei16.vv v10, v9, v11, v0.t
				; RV32-V512-NEXT: vmv.v.v v8, v10
				; RV32-V512-NEXT: ret
				;
				; RV64-V512-LABEL: interleave_v2i64:
				; RV64-V512: # %bb.0:
				; RV64-V512-NEXT: vsetivli zero, 4, e64, m1, ta, mu
				; RV64-V512-NEXT: vid.v v10
				; RV64-V512-NEXT: vsrl.vi v11, v10, 1
				; RV64-V512-NEXT: vrgather.vv v10, v8, v11
				; RV64-V512-NEXT: li a0, 10
				; RV64-V512-NEXT: vsetivli zero, 1, e8, mf8, ta, mu
				; RV64-V512-NEXT: vmv.s.x v0, a0
				; RV64-V512-NEXT: vsetivli zero, 4, e64, m1, ta, mu
				; RV64-V512-NEXT: vrgather.vv v10, v9, v11, v0.t
				; RV64-V512-NEXT: vmv.v.v v8, v10
				; RV64-V512-NEXT: ret
				%a = shufflevector <2 x i64> %x, <2 x i64> %y, <4 x i32> <i32 0, i32 2, i32 1, i32 3>
				ret <4 x i64> %a
				}

				; Vector order switched for coverage.
				define <8 x i8> @interleave_v4i8(<4 x i8> %x, <4 x i8> %y) {
				; V128-LABEL: interleave_v4i8:
				; V128: # %bb.0:
				; V128-NEXT: vsetivli zero, 8, e8, mf4, ta, mu
				; V128-NEXT: vwaddu.vv v10, v9, v8
				; V128-NEXT: li a0, -1
				; V128-NEXT: vwmaccu.vx v10, a0, v8
				; V128-NEXT: vmv1r.v v8, v10
				; V128-NEXT: ret
				;
				; V512-LABEL: interleave_v4i8:
				; V512: # %bb.0:
				; V512-NEXT: vsetivli zero, 8, e8, mf8, ta, mu
				; V512-NEXT: vwaddu.vv v10, v9, v8
				; V512-NEXT: li a0, -1
				; V512-NEXT: vwmaccu.vx v10, a0, v8
				; V512-NEXT: vmv1r.v v8, v10
				; V512-NEXT: ret
				%a = shufflevector <4 x i8> %x, <4 x i8> %y, <8 x i32> <i32 4, i32 0, i32 5, i32 1, i32 6, i32 2, i32 7, i32 3>
				ret <8 x i8> %a
				}

				; Undef elements for coverage
				define <8 x i16> @interleave_v4i16(<4 x i16> %x, <4 x i16> %y) {
				; V128-LABEL: interleave_v4i16:
				; V128: # %bb.0:
				; V128-NEXT: vsetivli zero, 8, e16, mf2, ta, mu
				; V128-NEXT: vwaddu.vv v10, v8, v9
				; V128-NEXT: li a0, -1
				; V128-NEXT: vwmaccu.vx v10, a0, v9
				; V128-NEXT: vmv1r.v v8, v10
				; V128-NEXT: ret
				;
				; V512-LABEL: interleave_v4i16:
				; V512: # %bb.0:
				; V512-NEXT: vsetivli zero, 8, e16, mf4, ta, mu
				; V512-NEXT: vwaddu.vv v10, v8, v9
				; V512-NEXT: li a0, -1
				; V512-NEXT: vwmaccu.vx v10, a0, v9
				; V512-NEXT: vmv1r.v v8, v10
				; V512-NEXT: ret
				%a = shufflevector <4 x i16> %x, <4 x i16> %y, <8 x i32> <i32 0, i32 4, i32 undef, i32 5, i32 2, i32 undef, i32 3, i32 7>
				ret <8 x i16> %a
				}

				define <8 x i32> @interleave_v4i32(<4 x i32> %x, <4 x i32> %y) {
				; V128-LABEL: interleave_v4i32:
				; V128: # %bb.0:
				; V128-NEXT: vsetivli zero, 8, e32, m1, ta, mu
				; V128-NEXT: vwaddu.vv v10, v8, v9
				; V128-NEXT: li a0, -1
				; V128-NEXT: vwmaccu.vx v10, a0, v9
				; V128-NEXT: vmv2r.v v8, v10
				; V128-NEXT: ret
				;
				; V512-LABEL: interleave_v4i32:
				; V512: # %bb.0:
				; V512-NEXT: vsetivli zero, 8, e32, mf2, ta, mu
				; V512-NEXT: vwaddu.vv v10, v8, v9
				; V512-NEXT: li a0, -1
				; V512-NEXT: vwmaccu.vx v10, a0, v9
				; V512-NEXT: vmv1r.v v8, v10
				; V512-NEXT: ret
				%a = shufflevector <4 x i32> %x, <4 x i32> %y, <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 7>
				ret <8 x i32> %a
				}

				define <16 x i8> @interleave_v8i8(<8 x i8> %x, <8 x i8> %y) {
				; V128-LABEL: interleave_v8i8:
				; V128: # %bb.0:
				; V128-NEXT: vsetivli zero, 16, e8, mf2, ta, mu
				; V128-NEXT: vwaddu.vv v10, v8, v9
				; V128-NEXT: li a0, -1
				; V128-NEXT: vwmaccu.vx v10, a0, v9
				; V128-NEXT: vmv1r.v v8, v10
				; V128-NEXT: ret
				;
				; V512-LABEL: interleave_v8i8:
				; V512: # %bb.0:
				; V512-NEXT: vsetivli zero, 16, e8, mf8, ta, mu
				; V512-NEXT: vwaddu.vv v10, v8, v9
				; V512-NEXT: li a0, -1
				; V512-NEXT: vwmaccu.vx v10, a0, v9
				; V512-NEXT: vmv1r.v v8, v10
				; V512-NEXT: ret
				%a = shufflevector <8 x i8> %x, <8 x i8> %y, <16 x i32> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11, i32 4, i32 12, i32 5, i32 13, i32 6, i32 14, i32 7, i32 15>
				ret <16 x i8> %a
				}

				; Vector order switched for coverage.
				define <16 x i16> @interleave_v8i16(<8 x i16> %x, <8 x i16> %y) {
				; V128-LABEL: interleave_v8i16:
				; V128: # %bb.0:
				; V128-NEXT: vsetivli zero, 16, e16, m1, ta, mu
				; V128-NEXT: vwaddu.vv v10, v9, v8
				; V128-NEXT: li a0, -1
				; V128-NEXT: vwmaccu.vx v10, a0, v8
				; V128-NEXT: vmv2r.v v8, v10
				; V128-NEXT: ret
				;
				; V512-LABEL: interleave_v8i16:
				; V512: # %bb.0:
				; V512-NEXT: vsetivli zero, 16, e16, mf4, ta, mu
				; V512-NEXT: vwaddu.vv v10, v9, v8
				; V512-NEXT: li a0, -1
				; V512-NEXT: vwmaccu.vx v10, a0, v8
				; V512-NEXT: vmv1r.v v8, v10
				; V512-NEXT: ret
				%a = shufflevector <8 x i16> %x, <8 x i16> %y, <16 x i32> <i32 8, i32 0, i32 9, i32 1, i32 10, i32 2, i32 11, i32 3, i32 12, i32 4, i32 13, i32 5, i32 14, i32 6, i32 15, i32 7>
				ret <16 x i16> %a
				}

				define <16 x i32> @interleave_v8i32(<8 x i32> %x, <8 x i32> %y) {
				; V128-LABEL: interleave_v8i32:
				; V128: # %bb.0:
				; V128-NEXT: vsetivli zero, 16, e32, m2, ta, mu
				; V128-NEXT: vwaddu.vv v12, v8, v10
				; V128-NEXT: li a0, -1
				; V128-NEXT: vwmaccu.vx v12, a0, v10
				; V128-NEXT: vmv4r.v v8, v12
				; V128-NEXT: ret
				;
				; V512-LABEL: interleave_v8i32:
				; V512: # %bb.0:
				; V512-NEXT: vsetivli zero, 16, e32, mf2, ta, mu
				; V512-NEXT: vwaddu.vv v10, v8, v9
				; V512-NEXT: li a0, -1
				; V512-NEXT: vwmaccu.vx v10, a0, v9
				; V512-NEXT: vmv1r.v v8, v10
				; V512-NEXT: ret
				%a = shufflevector <8 x i32> %x, <8 x i32> %y, <16 x i32> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11, i32 4, i32 12, i32 5, i32 13, i32 6, i32 14, i32 7, i32 15>
				ret <16 x i32> %a
				}

				define <32 x i8> @interleave_v16i8(<16 x i8> %x, <16 x i8> %y) {
				; V128-LABEL: interleave_v16i8:
				; V128: # %bb.0:
				; V128-NEXT: li a0, 32
				; V128-NEXT: vsetvli zero, a0, e8, m1, ta, mu
				; V128-NEXT: vwaddu.vv v10, v8, v9
				; V128-NEXT: li a0, -1
				; V128-NEXT: vwmaccu.vx v10, a0, v9
				; V128-NEXT: vmv2r.v v8, v10
				; V128-NEXT: ret
				;
				; V512-LABEL: interleave_v16i8:
				; V512: # %bb.0:
				; V512-NEXT: li a0, 32
				; V512-NEXT: vsetvli zero, a0, e8, mf4, ta, mu
				; V512-NEXT: vwaddu.vv v10, v8, v9
				; V512-NEXT: li a0, -1
				; V512-NEXT: vwmaccu.vx v10, a0, v9
				; V512-NEXT: vmv1r.v v8, v10
				; V512-NEXT: ret
				%a = shufflevector <16 x i8> %x, <16 x i8> %y, <32 x i32> <i32 0, i32 16, i32 1, i32 17, i32 2, i32 18, i32 3, i32 19, i32 4, i32 20, i32 5, i32 21, i32 6, i32 22, i32 7, i32 23, i32 8, i32 24, i32 9, i32 25, i32 10, i32 26, i32 11, i32 27, i32 12, i32 28, i32 13, i32 29, i32 14, i32 30, i32 15, i32 31>
				ret <32 x i8> %a
				}

				define <32 x i16> @interleave_v16i16(<16 x i16> %x, <16 x i16> %y) {
				; V128-LABEL: interleave_v16i16:
				; V128: # %bb.0:
				; V128-NEXT: li a0, 32
				; V128-NEXT: vsetvli zero, a0, e16, m2, ta, mu
				; V128-NEXT: vwaddu.vv v12, v8, v10
				; V128-NEXT: li a0, -1
				; V128-NEXT: vwmaccu.vx v12, a0, v10
				; V128-NEXT: vmv4r.v v8, v12
				; V128-NEXT: ret
				;
				; V512-LABEL: interleave_v16i16:
				; V512: # %bb.0:
				; V512-NEXT: li a0, 32
				; V512-NEXT: vsetvli zero, a0, e16, mf2, ta, mu
				; V512-NEXT: vwaddu.vv v10, v8, v9
				; V512-NEXT: li a0, -1
				; V512-NEXT: vwmaccu.vx v10, a0, v9
				; V512-NEXT: vmv1r.v v8, v10
				; V512-NEXT: ret
				%a = shufflevector <16 x i16> %x, <16 x i16> %y, <32 x i32> <i32 0, i32 16, i32 1, i32 17, i32 2, i32 18, i32 3, i32 19, i32 4, i32 20, i32 5, i32 21, i32 6, i32 22, i32 7, i32 23, i32 8, i32 24, i32 9, i32 25, i32 10, i32 26, i32 11, i32 27, i32 12, i32 28, i32 13, i32 29, i32 14, i32 30, i32 15, i32 31>
				ret <32 x i16> %a
				}

				define <32 x i32> @interleave_v16i32(<16 x i32> %x, <16 x i32> %y) {
				; V128-LABEL: interleave_v16i32:
				; V128: # %bb.0:
				; V128-NEXT: li a0, 32
				; V128-NEXT: vsetvli zero, a0, e32, m4, ta, mu
				; V128-NEXT: vwaddu.vv v16, v8, v12
				; V128-NEXT: li a0, -1
				; V128-NEXT: vwmaccu.vx v16, a0, v12
				; V128-NEXT: vmv8r.v v8, v16
				; V128-NEXT: ret
				;
				; V512-LABEL: interleave_v16i32:
				; V512: # %bb.0:
				; V512-NEXT: li a0, 32
				; V512-NEXT: vsetvli zero, a0, e32, m1, ta, mu
				; V512-NEXT: vwaddu.vv v10, v8, v9
				; V512-NEXT: li a0, -1
				; V512-NEXT: vwmaccu.vx v10, a0, v9
				; V512-NEXT: vmv2r.v v8, v10
				; V512-NEXT: ret
				%a = shufflevector <16 x i32> %x, <16 x i32> %y, <32 x i32> <i32 0, i32 16, i32 1, i32 17, i32 2, i32 18, i32 3, i32 19, i32 4, i32 20, i32 5, i32 21, i32 6, i32 22, i32 7, i32 23, i32 8, i32 24, i32 9, i32 25, i32 10, i32 26, i32 11, i32 27, i32 12, i32 28, i32 13, i32 29, i32 14, i32 30, i32 15, i32 31>
				ret <32 x i32> %a
				}

				define <64 x i8> @interleave_v32i8(<32 x i8> %x, <32 x i8> %y) {
				; V128-LABEL: interleave_v32i8:
				; V128: # %bb.0:
				; V128-NEXT: li a0, 64
				; V128-NEXT: vsetvli zero, a0, e8, m2, ta, mu
				; V128-NEXT: vwaddu.vv v12, v8, v10
				; V128-NEXT: li a0, -1
				; V128-NEXT: vwmaccu.vx v12, a0, v10
				; V128-NEXT: vmv4r.v v8, v12
				; V128-NEXT: ret
				;
				; V512-LABEL: interleave_v32i8:
				; V512: # %bb.0:
				; V512-NEXT: li a0, 64
				; V512-NEXT: vsetvli zero, a0, e8, mf2, ta, mu
				; V512-NEXT: vwaddu.vv v10, v8, v9
				; V512-NEXT: li a0, -1
				; V512-NEXT: vwmaccu.vx v10, a0, v9
				; V512-NEXT: vmv1r.v v8, v10
				; V512-NEXT: ret
				%a = shufflevector <32 x i8> %x, <32 x i8> %y, <64 x i32> <i32 0, i32 32, i32 1, i32 33, i32 2, i32 34, i32 3, i32 35, i32 4, i32 36, i32 5, i32 37, i32 6, i32 38, i32 7, i32 39, i32 8, i32 40, i32 9, i32 41, i32 10, i32 42, i32 11, i32 43, i32 12, i32 44, i32 13, i32 45, i32 14, i32 46, i32 15, i32 47, i32 16, i32 48, i32 17, i32 49, i32 18, i32 50, i32 19, i32 51, i32 20, i32 52, i32 21, i32 53, i32 22, i32 54, i32 23, i32 55, i32 24, i32 56, i32 25, i32 57, i32 26, i32 58, i32 27, i32 59, i32 28, i32 60, i32 29, i32 61, i32 30, i32 62, i32 31, i32 63>
				ret <64 x i8> %a
				}

				define <64 x i16> @interleave_v32i16(<32 x i16> %x, <32 x i16> %y) {
				; V128-LABEL: interleave_v32i16:
				; V128: # %bb.0:
				; V128-NEXT: li a0, 64
				; V128-NEXT: vsetvli zero, a0, e16, m4, ta, mu
				; V128-NEXT: vwaddu.vv v16, v8, v12
				; V128-NEXT: li a0, -1
				; V128-NEXT: vwmaccu.vx v16, a0, v12
				; V128-NEXT: vmv8r.v v8, v16
				; V128-NEXT: ret
				;
				; V512-LABEL: interleave_v32i16:
				; V512: # %bb.0:
				; V512-NEXT: li a0, 64
				; V512-NEXT: vsetvli zero, a0, e16, m1, ta, mu
				; V512-NEXT: vwaddu.vv v10, v8, v9
				; V512-NEXT: li a0, -1
				; V512-NEXT: vwmaccu.vx v10, a0, v9
				; V512-NEXT: vmv2r.v v8, v10
				; V512-NEXT: ret
				%a = shufflevector <32 x i16> %x, <32 x i16> %y, <64 x i32> <i32 0, i32 32, i32 1, i32 33, i32 2, i32 34, i32 3, i32 35, i32 4, i32 36, i32 5, i32 37, i32 6, i32 38, i32 7, i32 39, i32 8, i32 40, i32 9, i32 41, i32 10, i32 42, i32 11, i32 43, i32 12, i32 44, i32 13, i32 45, i32 14, i32 46, i32 15, i32 47, i32 16, i32 48, i32 17, i32 49, i32 18, i32 50, i32 19, i32 51, i32 20, i32 52, i32 21, i32 53, i32 22, i32 54, i32 23, i32 55, i32 24, i32 56, i32 25, i32 57, i32 26, i32 58, i32 27, i32 59, i32 28, i32 60, i32 29, i32 61, i32 30, i32 62, i32 31, i32 63>
				ret <64 x i16> %a
				}

				define <64 x i32> @interleave_v32i32(<32 x i32> %x, <32 x i32> %y) {
				; RV32-V128-LABEL: interleave_v32i32:
				; RV32-V128: # %bb.0:
				; RV32-V128-NEXT: addi sp, sp, -16
				; RV32-V128-NEXT: .cfi_def_cfa_offset 16
				; RV32-V128-NEXT: csrr a0, vlenb
				; RV32-V128-NEXT: slli a0, a0, 4
				; RV32-V128-NEXT: sub sp, sp, a0
				; RV32-V128-NEXT: lui a0, %hi(.LCPI15_0)
				; RV32-V128-NEXT: addi a0, a0, %lo(.LCPI15_0)
				; RV32-V128-NEXT: li a1, 32
				; RV32-V128-NEXT: vsetvli zero, a1, e32, m8, ta, mu
				; RV32-V128-NEXT: vle32.v v0, (a0)
				; RV32-V128-NEXT: vmv8r.v v24, v8
				; RV32-V128-NEXT: addi a0, sp, 16
				; RV32-V128-NEXT: vs8r.v v8, (a0) # Unknown-size Folded Spill
				; RV32-V128-NEXT: vrgather.vv v8, v24, v0
				; RV32-V128-NEXT: lui a0, %hi(.LCPI15_1)
				; RV32-V128-NEXT: addi a0, a0, %lo(.LCPI15_1)
				; RV32-V128-NEXT: vle32.v v24, (a0)
				; RV32-V128-NEXT: csrr a0, vlenb
				; RV32-V128-NEXT: slli a0, a0, 3
				; RV32-V128-NEXT: add a0, sp, a0
				; RV32-V128-NEXT: addi a0, a0, 16
				; RV32-V128-NEXT: vs8r.v v24, (a0) # Unknown-size Folded Spill
				; RV32-V128-NEXT: lui a0, 699051
				; RV32-V128-NEXT: addi a0, a0, -1366
				; RV32-V128-NEXT: vsetivli zero, 1, e32, mf2, ta, mu
				; RV32-V128-NEXT: vmv.s.x v0, a0
				; RV32-V128-NEXT: vsetvli zero, a1, e32, m8, ta, mu
				; RV32-V128-NEXT: csrr a0, vlenb
				; RV32-V128-NEXT: slli a0, a0, 3
				; RV32-V128-NEXT: add a0, sp, a0
				; RV32-V128-NEXT: addi a0, a0, 16
				; RV32-V128-NEXT: vl8re8.v v24, (a0) # Unknown-size Folded Reload
				; RV32-V128-NEXT: vrgather.vv v8, v16, v24, v0.t
				; RV32-V128-NEXT: vmv.v.v v24, v8
				; RV32-V128-NEXT: vsetvli zero, a1, e32, m4, ta, mu
				; RV32-V128-NEXT: addi a0, sp, 16
				; RV32-V128-NEXT: vl8re8.v v8, (a0) # Unknown-size Folded Reload
				; RV32-V128-NEXT: vwaddu.vv v0, v8, v16
				; RV32-V128-NEXT: li a0, -1
				; RV32-V128-NEXT: vwmaccu.vx v0, a0, v16
				; RV32-V128-NEXT: vmv8r.v v8, v0
				; RV32-V128-NEXT: vmv8r.v v16, v24
				; RV32-V128-NEXT: csrr a0, vlenb
				; RV32-V128-NEXT: slli a0, a0, 4
				; RV32-V128-NEXT: add sp, sp, a0
				; RV32-V128-NEXT: addi sp, sp, 16
				; RV32-V128-NEXT: ret
				;
				; RV64-V128-LABEL: interleave_v32i32:
				; RV64-V128: # %bb.0:
				; RV64-V128-NEXT: addi sp, sp, -16
				; RV64-V128-NEXT: .cfi_def_cfa_offset 16
				; RV64-V128-NEXT: csrr a0, vlenb
				; RV64-V128-NEXT: slli a0, a0, 4
				; RV64-V128-NEXT: sub sp, sp, a0
				; RV64-V128-NEXT: lui a0, %hi(.LCPI15_0)
				; RV64-V128-NEXT: addi a0, a0, %lo(.LCPI15_0)
				; RV64-V128-NEXT: li a1, 32
				; RV64-V128-NEXT: vsetvli zero, a1, e32, m8, ta, mu
				; RV64-V128-NEXT: vle32.v v0, (a0)
				; RV64-V128-NEXT: vmv8r.v v24, v8
				; RV64-V128-NEXT: addi a0, sp, 16
				; RV64-V128-NEXT: vs8r.v v8, (a0) # Unknown-size Folded Spill
				; RV64-V128-NEXT: vrgather.vv v8, v24, v0
				; RV64-V128-NEXT: lui a0, %hi(.LCPI15_1)
				; RV64-V128-NEXT: addi a0, a0, %lo(.LCPI15_1)
				; RV64-V128-NEXT: vle32.v v24, (a0)
				; RV64-V128-NEXT: csrr a0, vlenb
				; RV64-V128-NEXT: slli a0, a0, 3
				; RV64-V128-NEXT: add a0, sp, a0
				; RV64-V128-NEXT: addi a0, a0, 16
				; RV64-V128-NEXT: vs8r.v v24, (a0) # Unknown-size Folded Spill
				; RV64-V128-NEXT: lui a0, 699051
				; RV64-V128-NEXT: addiw a0, a0, -1366
				; RV64-V128-NEXT: vsetivli zero, 1, e32, mf2, ta, mu
				; RV64-V128-NEXT: vmv.s.x v0, a0
				; RV64-V128-NEXT: vsetvli zero, a1, e32, m8, ta, mu
				; RV64-V128-NEXT: csrr a0, vlenb
				; RV64-V128-NEXT: slli a0, a0, 3
				; RV64-V128-NEXT: add a0, sp, a0
				; RV64-V128-NEXT: addi a0, a0, 16
				; RV64-V128-NEXT: vl8re8.v v24, (a0) # Unknown-size Folded Reload
				; RV64-V128-NEXT: vrgather.vv v8, v16, v24, v0.t
				; RV64-V128-NEXT: vmv.v.v v24, v8
				; RV64-V128-NEXT: vsetvli zero, a1, e32, m4, ta, mu
				; RV64-V128-NEXT: addi a0, sp, 16
				; RV64-V128-NEXT: vl8re8.v v8, (a0) # Unknown-size Folded Reload
				; RV64-V128-NEXT: vwaddu.vv v0, v8, v16
				; RV64-V128-NEXT: li a0, -1
				; RV64-V128-NEXT: vwmaccu.vx v0, a0, v16
				; RV64-V128-NEXT: vmv8r.v v8, v0
				; RV64-V128-NEXT: vmv8r.v v16, v24
				; RV64-V128-NEXT: csrr a0, vlenb
				; RV64-V128-NEXT: slli a0, a0, 4
				; RV64-V128-NEXT: add sp, sp, a0
				; RV64-V128-NEXT: addi sp, sp, 16
				; RV64-V128-NEXT: ret
				;
				; V512-LABEL: interleave_v32i32:
				; V512: # %bb.0:
				; V512-NEXT: li a0, 64
				; V512-NEXT: vsetvli zero, a0, e32, m2, ta, mu
				; V512-NEXT: vwaddu.vv v12, v8, v10
				; V512-NEXT: li a0, -1
				; V512-NEXT: vwmaccu.vx v12, a0, v10
				; V512-NEXT: vmv4r.v v8, v12
				; V512-NEXT: ret
				%a = shufflevector <32 x i32> %x, <32 x i32> %y, <64 x i32> <i32 0, i32 32, i32 1, i32 33, i32 2, i32 34, i32 3, i32 35, i32 4, i32 36, i32 5, i32 37, i32 6, i32 38, i32 7, i32 39, i32 8, i32 40, i32 9, i32 41, i32 10, i32 42, i32 11, i32 43, i32 12, i32 44, i32 13, i32 45, i32 14, i32 46, i32 15, i32 47, i32 16, i32 48, i32 17, i32 49, i32 18, i32 50, i32 19, i32 51, i32 20, i32 52, i32 21, i32 53, i32 22, i32 54, i32 23, i32 55, i32 24, i32 56, i32 25, i32 57, i32 26, i32 58, i32 27, i32 59, i32 28, i32 60, i32 29, i32 61, i32 30, i32 62, i32 31, i32 63>
				ret <64 x i32> %a
				}

llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-shuffles.ll

	 ; RV64-NEXT: ret
	 %s = shufflevector <8 x i64> %x, <8 x i64> <i64 5, i64 5, i64 5, i64 5, i64 5, i64 5, i64 5, i64 5>, <8 x i32> <i32 0, i32 3, i32 10, i32 9, i32 4, i32 1, i32 7, i32 14>
	 ret <8 x i64> %s
	 }

	 define <4 x i8> @interleave_shuffles(<4 x i8> %x) {
	 ; CHECK-LABEL: interleave_shuffles:
	 ; CHECK: # %bb.0:
	-; CHECK-NEXT: vsetivli zero, 0, e8, mf4, ta, mu
	-; CHECK-NEXT: vmv.x.s a0, v8
	 ; CHECK-NEXT: vsetivli zero, 4, e8, mf4, ta, mu
	-; CHECK-NEXT: vrgather.vi v9, v8, 1
	-; CHECK-NEXT: li a1, 10
	-; CHECK-NEXT: vmv.s.x v0, a1
	-; CHECK-NEXT: vid.v v8
	-; CHECK-NEXT: vsrl.vi v10, v8, 1
	-; CHECK-NEXT: vmv.v.x v8, a0
	-; CHECK-NEXT: vrgather.vv v8, v9, v10, v0.t
	+; CHECK-NEXT: vrgather.vi v9, v8, 0
	+; CHECK-NEXT: vrgather.vi v10, v8, 1
	+; CHECK-NEXT: vsetivli zero, 4, e8, mf8, ta, mu
	+; CHECK-NEXT: vwaddu.vv v8, v9, v10
	+; CHECK-NEXT: li a0, -1
	+; CHECK-NEXT: vwmaccu.vx v8, a0, v10
	 ; CHECK-NEXT: ret
	 %y = shufflevector <4 x i8> %x, <4 x i8> undef, <4 x i32> <i32 0, i32 0, i32 0, i32 0>
	 %z = shufflevector <4 x i8> %x, <4 x i8> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
	 %w = shufflevector <4 x i8> %y, <4 x i8> %z, <4 x i32> <i32 0, i32 4, i32 1, i32 5>
	 ret <4 x i8> %w
	 }

	 define <8 x i8> @splat_ve4(<8 x i8> %v) {