This is an archive of the discontinued LLVM Phabricator instance.


Differential D137704

[RISCV] Make lowerVECTOR_SHUFFLEAsVNSRL support more vnsrl shuffle pattern.
Needs Review · Public

Authored by HanKuanChen on Nov 9 2022, 2:15 AM.

Details

Reviewers

craig.topper
reames
frasercrmck

Summary

The current lowerVECTOR_SHUFFLEAsVNSRL does not support trailing undef
elements or other vnsrl patterns (e.g., an arithmetic sequence with a
difference of 4). This commit adds support for power-of-2 differences and
iteratively splits a vector_shuffle into a series of vnsrl operations.
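
The iterative split can be illustrated outside of LLVM. The following is a minimal sketch, not the patch's code: it models each vnsrl as a plain narrowing shift on 64-bit lanes, and the names `narrowShiftRight` and `stridedViaNarrowing` are hypothetical. It assumes little-endian packing and that `EltBits * Difference` fits in 64 bits. The shift selection follows the same idea as the patch: one halving step per bit of the first mask index.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: one narrowing shift right applied to every lane.
// Each input lane is WideBits wide; each output lane keeps WideBits / 2 bits.
static std::vector<uint64_t> narrowShiftRight(const std::vector<uint64_t> &Lanes,
                                              unsigned WideBits, unsigned Shift) {
  unsigned HalfBits = WideBits / 2;
  uint64_t HalfMask = (uint64_t{1} << HalfBits) - 1;
  std::vector<uint64_t> Out;
  for (uint64_t Lane : Lanes)
    Out.push_back((Lane >> Shift) & HalfMask);
  return Out;
}

// Select Src[First], Src[First + Difference], ... using only narrowing shifts.
// Assumes Difference is a power of two greater than 1, First < Difference,
// EltBits * Difference <= 64, every element fits in EltBits bits, and
// Src.size() is a multiple of Difference.
static std::vector<uint64_t> stridedViaNarrowing(const std::vector<uint64_t> &Src,
                                                 unsigned EltBits,
                                                 unsigned Difference,
                                                 unsigned First) {
  assert(Difference > 1 && (Difference & (Difference - 1)) == 0);
  assert(First < Difference && EltBits * Difference <= 64);
  assert(Src.size() % Difference == 0);

  // Pack Difference adjacent elements into one wide lane (little-endian).
  std::vector<uint64_t> Lanes;
  for (std::size_t J = 0; J < Src.size(); J += Difference) {
    uint64_t Lane = 0;
    for (unsigned K = 0; K < Difference; ++K)
      Lane |= Src[J + K] << (K * EltBits);
    Lanes.push_back(Lane);
  }

  // Halve the lane width once per bit of First, from the high bit down.
  for (unsigned I = Difference; I != 1; I >>= 1) {
    unsigned Shift = (First & (I >> 1)) ? EltBits * I / 2 : 0;
    Lanes = narrowShiftRight(Lanes, EltBits * I, Shift);
  }
  return Lanes;
}
```

With 8-bit elements, a difference of 4 and a first index of 3, the sketch shifts by 16 and then by 8, which corresponds to the `[3, 7, 11, 15]` case in the patch's comments.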

Some patterns (e.g., vnsrl_4_undef_i8) do not go through vector_shuffle
because the general DAG combiner cannot turn a build_vector into a
vector_shuffle.

isVnsrlShuffle is also provided. I expect isShuffleMaskLegal can call
isVnsrlShuffle in the future.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

HanKuanChen created this revision. Nov 9 2022, 2:15 AM

Herald added a project: Restricted Project. Nov 9 2022, 2:15 AM

Herald added subscribers: sunshaoce, VincentWu, StephenFan and 28 others.

HanKuanChen requested review of this revision. Nov 9 2022, 2:16 AM

Herald added a project: Restricted Project. Nov 9 2022, 2:16 AM

Herald added subscribers: llvm-commits, pcwang-thead, eopXD, MaskRay.

HanKuanChen added a parent revision: D137703: [RISCV] Pre-commit test. Nov 9 2022, 2:16 AM

Harbormaster completed remote builds in B196862: Diff 474204. Nov 9 2022, 2:16 AM

Fix pre-merge checks.

Harbormaster completed remote builds in B196863: Diff 474207. Nov 9 2022, 3:56 AM

craig.topper added inline comments. Nov 10 2022, 10:22 PM

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
2761	What if Mask[0] is -1 and Mask[1] is 1? The difference will be 2. I don't think these checks block that. I guess maybe it's handled by the std::any_of later? I'd feel better if we checked Mask[0] and Mask[1] are >= 0 before the subtract.
2774	You can use auto here.
2775	Mask.end() instead of std::end
2778	Mask.begin() instead of std::begin

HanKuanChen added inline comments. Nov 10 2022, 10:26 PM

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
2761	> What if Mask[0] is -1 and Mask[1] is 1? The difference will be 2. I don't think these checks block that. I guess maybe it's handled by the std::any_of later? I'd feel better if we checked Mask[0] and Mask[1] are >= 0 before the subtract.

	How about we move `find(Mask, -1)` and `std::any_of` to the front?
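
The reordering discussed here can be sketched on a plain mask, independent of LLVM types. This is an illustrative sketch only: `isStridedMaskWithTrailingUndef` is a hypothetical name, `-1` stands for an undef mask element, and the element-size and ELEN checks from the patch are left out. The point is that the trailing-undef scan runs first, so the subtraction only ever sees defined indices.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical standalone check; -1 marks an undef mask element.
static bool isStridedMaskWithTrailingUndef(const std::vector<int> &Mask) {
  // Locate the first undef; everything after it must also be undef.
  auto FirstUndef = std::find(Mask.begin(), Mask.end(), -1);
  if (std::any_of(FirstUndef, Mask.end(), [](int Idx) { return Idx != -1; }))
    return false;

  // At least two defined leading elements are needed before subtracting.
  std::ptrdiff_t NumDefined = std::distance(Mask.begin(), FirstUndef);
  if (NumDefined < 2)
    return false;

  // Defensive: reject any other negative value in the first two slots.
  if (Mask[0] < 0 || Mask[1] < 0)
    return false;

  int Difference = Mask[1] - Mask[0];
  // The stride must be a power of two greater than 1, and the first index
  // must lie inside the first stride.
  if (Difference <= 1 || (Difference & (Difference - 1)) != 0)
    return false;
  if (Difference <= Mask[0])
    return false;

  // The remaining defined elements must continue the arithmetic sequence.
  for (std::ptrdiff_t I = 2; I != NumDefined; ++I)
    if (Mask[I - 1] + Difference != Mask[I])
      return false;
  return true;
}
```

With this ordering, a mask such as `{-1, 1, 3, 5}` is rejected by the undef scan before any difference is computed.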

Apply Craig's comment.

Harbormaster completed remote builds in B197405: Diff 474982. Nov 12 2022, 11:25 PM

HanKuanChen added a child revision: D137904: [RISCV] Provide a isOneSourceVECTOR_SHUFFLE function. NFC. Nov 13 2022, 12:39 AM

craig.topper added inline comments. Nov 13 2022, 4:37 PM

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
2924	Why did you change the order of bitcast and conversion?

HanKuanChen added inline comments. Nov 13 2022, 8:22 PM

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
2924	> Why did you change the order of bitcast and conversion?

	Example: `Src` is `v4i32`. If we do the bitcast and then the conversion, we get `v2i64` and `nxv1i64`. If we do the conversion and then the bitcast, we only get `nxv1i32`, because `nxv1i32` and `nxv1i64` do not have the same size.
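
The size argument in this reply can be checked with a toy model. This is a simplified sketch under stated assumptions, not LLVM's type system: `VecTy` and `canBitcast` are hypothetical names, and a scalable type is represented only by its known minimum size.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for a vector type: element count,
// element width in bits, and whether the count is a multiple of vscale.
struct VecTy {
  unsigned NumElts;
  unsigned EltBits;
  bool Scalable;
  unsigned knownMinBits() const { return NumElts * EltBits; }
};

// A bitcast only reinterprets bits, so both types must have the same
// (known minimum) size and the same scalability.
static bool canBitcast(VecTy From, VecTy To) {
  return From.Scalable == To.Scalable &&
         From.knownMinBits() == To.knownMinBits();
}
```

In this model `v4i32` to `v2i64` is a valid bitcast (128 bits on both sides), while `nxv1i32` to `nxv1i64` is not (32 versus 64 bits minimum), which is why the fixed-length bitcast has to come first.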

This test crashes

define void @vnsrl_2_undef_float(ptr %in, ptr %out) {
entry:
  %0 = load <32 x float>, ptr %in, align 4
  %1 = shufflevector <32 x float> %0, <32 x float> poison, <16 x i32> <i32 1, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
  store <16 x float> %1, ptr %out, align 4
  ret void
}

craig.topper added inline comments. Nov 13 2022, 9:08 PM

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
2873	Was this check intentionally removed from the new code?

HanKuanChen added inline comments. Nov 13 2022, 9:09 PM

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
2873	> Was this check intentionally removed from the new code?

	Yes.

In D137704#3924007, @craig.topper wrote:

This test crashes

define void @vnsrl_2_undef_float(ptr %in, ptr %out) {
entry:
  %0 = load <32 x float>, ptr %in, align 4
  %1 = shufflevector <32 x float> %0, <32 x float> poison, <16 x i32> <i32 1, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
  store <16 x float> %1, ptr %out, align 4
  ret void
}

I will merge and close https://reviews.llvm.org/D137904 to solve this test.

In D137704#3924035, @HanKuanChen wrote:

I will merge and close https://reviews.llvm.org/D137904 to solve this test.

How does D137904 solve this? That patch is marked NFC

In D137704#3924061, @craig.topper wrote:

How does D137904 solve this? That patch is marked NFC

t7: v32f32,ch = load<(load (s1024) from %ir.in, align 4)> t0, t2, undef:i64
t9: v16f32 = extract_subvector t7, Constant:i64<0>
t11: v16f32 = vector_shuffle<1,3,u,u,u,u,u,u,u,u,u,u,u,u,u,u> t9, undef:v16f32

I expect us to use getSingleShuffleSource to get t7 instead of t9, and then check whether the source and destination types match the vnsrl rule.
This test has a Difference of 2, the source is v32f32, and the destination is v16f32, which is valid.

In D137704#3924065, @HanKuanChen wrote:

t7: v32f32,ch = load<(load (s1024) from %ir.in, align 4)> t0, t2, undef:i64
t9: v16f32 = extract_subvector t7, Constant:i64<0>
t11: v16f32 = vector_shuffle<1,3,u,u,u,u,u,u,u,u,u,u,u,u,u,u> t9, undef:v16f32
I expect us to use getSingleShuffleSource to get t7 instead of t9, and then check whether the source and destination types match the vnsrl rule.
This test has a Difference of 2, the source is v32f32, and the destination is v16f32, which is valid.

I don't think that will fix it. The same test crashes if you change the shuffle mask to <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>. That case should have two extract_subvectors.

Am I correct in thinking that all of the tests are using LMUL=1 or fractional LMUL?

In D137704#3924089, @craig.topper wrote:

Am I correct in thinking that all of the tests are using LMUL=1 or fractional LMUL?

Yes.

In D137704#3924088, @craig.topper wrote:

I don't think that will fix it. The same test crashes if you change the shuffle mask to <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>. That case should have two extract_subvectors.

Could you provide the test?
I tried <16 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef> and <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>, but I don't get the crash if getSingleShuffleSource is used (and the source and destination types are checked).

In D137704#3924102, @HanKuanChen wrote:

Could you provide the test?
I tried <16 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef> and <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>, but I don't get the crash if getSingleShuffleSource is used (and the source and destination types are checked).

I was using Zvl128b not Zvl256b. Does that make a difference?

In D137704#3924114, @craig.topper wrote:

I was using Zvl128b not Zvl256b. Does that make a difference?

No. But let me submit the new version so that we can test it.

Fix the case where the vector_shuffle source may be an extract_subvector and undef.

Also provide getSingleShuffleSource (D137904).

Harbormaster completed remote builds in B197474: Diff 475065. Nov 14 2022, 1:21 AM

Revision Contents

Path	Size
llvm/lib/Target/RISCV/RISCVISelLowering.cpp	179 lines
llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shufflevector-vnsrl.ll	159 lines

Diff 474207

llvm/lib/Target/RISCV/RISCVISelLowering.cpp

[... 2,723 unchanged lines not shown ...]
@@ static SDValue lowerScalarSplat(SDValue Passthru, SDValue Scalar, SDValue VL,
   if (isOneConstant(VL) && isNullConstant(Scalar))
     return DAG.getNode(RISCVISD::VMV_S_X_VL, DL, VT, Passthru,
                        DAG.getConstant(0, DL, XLenVT), VL);
 
   // Otherwise use the more complicated splatting algorithm.
   return splatSplitI64WithVL(DL, VT, Passthru, Scalar, VL, DAG);
 }
 
+// Mask can only be this form.
+// X X X X ... -1 -1 -1 ..., while X is not -1.
+// X X X X ... must be in ascending order.
+// for example,
+// [0, 2, 4, 6] -> vnsrl src, 0
+// [0, 4, 8, 12] -> vnsrl (vnsrl src, 0), 0
+// [3, 7, 11, 15] -> vnsrl (vnsrl src, EltSize * 2), EltSize
+// [2, 10, 18, 26] -> vnsrl (vnsrl (vnsrl src, 0), EltSize * 2), 0
+// In addition, N0 and N1 must from a same vector (or N1 is undef).
+static bool isVnsrlShuffle(SDValue N0, SDValue N1, ArrayRef<int> Mask, EVT VT,
+                           const RISCVSubtarget &Subtarget) {
+  if (!N1.isUndef()) {
+    // Both input must be extracts.
+    if (N0.getOpcode() != ISD::EXTRACT_SUBVECTOR ||
+        N1.getOpcode() != ISD::EXTRACT_SUBVECTOR)
+      return false;
+
+    // Extracting from the same source.
+    SDValue Src = N0.getOperand(0);
+    if (Src != N1.getOperand(0))
+      return false;
+
+    // Make sure N0 and N1 are continuous.
+    if (N0.getConstantOperandVal(1) != 0 ||
+        N1.getConstantOperandVal(1) != Mask.size())
+      return false;
+  }
+  if (Mask.size() < 2)
+    return false;
+  int Difference = Mask[1] - Mask[0];
+  if (Difference <= Mask[0])
+    return false;
+  if (Difference == 1 || !isPowerOf2_32(Difference))
+    return false;
+  unsigned EltSize = VT.getScalarSizeInBits();
+  // The smallest type for vnsrl is i8.
+  if (EltSize < 8)
+    return false;
+  // Because vnsrl will be used, we need to make sure it will not exceed ELEN.
+  if (Subtarget.getELEN() < Difference * EltSize)
+    return false;
+  // Find first -1 and check whether the mask behind is -1.
+  ArrayRef<int>::iterator FirstUndef = find(Mask, -1);
+  if (std::any_of(FirstUndef, std::end(Mask),
+                  [](int MaskIdx) { return MaskIdx != -1; }))
+    return false;
+  ptrdiff_t ValidMaskEnd = std::distance(std::begin(Mask), FirstUndef);
+  // Do not convert it to vnsrl. The pattern is X -1 -1 -1 ..., while X is not
+  // -1.
+  if (ValidMaskEnd < 2)
+    return false;
+  for (ptrdiff_t i = 2; i != ValidMaskEnd; ++i)
+    if (Mask[i - 1] + Difference != Mask[i])
+      return false;
+  return true;
+}
+
 static bool isInterleaveShuffle(ArrayRef<int> Mask, MVT VT, bool &SwapSources,
                                 const RISCVSubtarget &Subtarget) {
   // We need to be able to widen elements to the next larger integer type.
   if (VT.getScalarSizeInBits() >= Subtarget.getELEN())
     return false;
 
   int Size = Mask.size();
   assert(Size == (int)VT.getVectorNumElements() && "Unexpected mask size");
[... 99 unchanged lines not shown ...]
@@ static int isElementRotate(int &LoSrc, int &HiSrc, ArrayRef<int> Mask) {
   // Check that we successfully analyzed the mask, and normalize the results.
   assert(Rotation != 0 && "Failed to locate a viable rotation!");
   assert((LoSrc >= 0 || HiSrc >= 0) &&
          "Failed to find a rotated input vector!");
 
   return Rotation;
 }
 
-// Lower the following shuffles to vnsrl.
-// t34: v8i8 = extract_subvector t11, Constant:i64<0>
-// t33: v8i8 = extract_subvector t11, Constant:i64<8>
-// a) t35: v8i8 = vector_shuffle<0,2,4,6,8,10,12,14> t34, t33
-// b) t35: v8i8 = vector_shuffle<1,3,5,7,9,11,13,15> t34, t33
-static SDValue lowerVECTOR_SHUFFLEAsVNSRL(const SDLoc &DL, MVT VT,
-                                          MVT ContainerVT, SDValue V1,
-                                          SDValue V2, SDValue TrueMask,
-                                          SDValue VL, ArrayRef<int> Mask,
+// Lower any pattern that matches isVnsrlShuffle.
+static SDValue lowerVECTOR_SHUFFLEAsVNSRL(const SDLoc &DL, MVT VT, SDValue V1,
+                                          SDValue V2, ArrayRef<int> Mask,
                                           const RISCVSubtarget &Subtarget,
                                           SelectionDAG &DAG) {
-  // Need to be able to widen the vector.
-  if (VT.getScalarSizeInBits() >= Subtarget.getELEN())
-    return SDValue();
-
-  // Both input must be extracts.
-  if (V1.getOpcode() != ISD::EXTRACT_SUBVECTOR ||
-      V2.getOpcode() != ISD::EXTRACT_SUBVECTOR)
+  if (!isVnsrlShuffle(V1, V2, Mask, VT, Subtarget))
     return SDValue();
 
-  // Extracting from the same source.
-  SDValue Src = V1.getOperand(0);
-  if (Src != V2.getOperand(0))
-    return SDValue();
-
-  // Src needs to have twice the number of elements.
-  if (Src.getValueType().getVectorNumElements() != (Mask.size() * 2))
-    return SDValue();
-
-  // The extracts must extract the two halves of the source.
-  if (V1.getConstantOperandVal(1) != 0 ||
-      V2.getConstantOperandVal(1) != Mask.size())
-    return SDValue();
-
-  // First index must be the first even or odd element from V1.
-  if (Mask[0] != 0 && Mask[0] != 1)
-    return SDValue();
-
-  // The others must increase by 2 each time.
-  // TODO: Support undef elements?
-  for (unsigned i = 1; i != Mask.size(); ++i)
-    if (Mask[i] != Mask[i - 1] + 2)
-      return SDValue();
-
-  // Convert the source using a container type with twice the elements. Since
-  // source VT is legal and twice this VT, we know VT isn't LMUL=8 so it is
-  // safe to double.
-  MVT DoubleContainerVT =
-      MVT::getVectorVT(ContainerVT.getVectorElementType(),
-                       ContainerVT.getVectorElementCount() * 2);
-  Src = convertToScalableVector(DoubleContainerVT, Src, DAG, Subtarget);
-
-  // Convert the vector to a wider integer type with the original element
-  // count. This also converts FP to int.
-  unsigned EltBits = ContainerVT.getScalarSizeInBits();
-  MVT WideIntEltVT = MVT::getIntegerVT(EltBits * 2);
-  MVT WideIntContainerVT =
-      MVT::getVectorVT(WideIntEltVT, ContainerVT.getVectorElementCount());
-  Src = DAG.getBitcast(WideIntContainerVT, Src);
-
-  // Convert to the integer version of the container type.
-  MVT IntEltVT = MVT::getIntegerVT(EltBits);
-  MVT IntContainerVT =
-      MVT::getVectorVT(IntEltVT, ContainerVT.getVectorElementCount());
-
-  // If we want even elements, then the shift amount is 0. Otherwise, shift by
-  // the original element size.
-  unsigned Shift = Mask[0] == 0 ? 0 : EltBits;
-  SDValue SplatShift = DAG.getNode(
-      RISCVISD::VMV_V_X_VL, DL, IntContainerVT, DAG.getUNDEF(ContainerVT),
-      DAG.getConstant(Shift, DL, Subtarget.getXLenVT()), VL);
-  SDValue Res =
-      DAG.getNode(RISCVISD::VNSRL_VL, DL, IntContainerVT, Src, SplatShift,
-                  DAG.getUNDEF(IntContainerVT), TrueMask, VL);
-  // Cast back to FP if needed.
-  Res = DAG.getBitcast(ContainerVT, Res);
-
-  return convertFromScalableVector(VT, Res, DAG, Subtarget);
+  SDValue Src = V2.isUndef() ? V1 : V2.getOperand(0);
+  MVT SrcVT = Src.getSimpleValueType();
+  unsigned EltSizeInBits = VT.getScalarSizeInBits();
+  unsigned Difference = Mask[1] - Mask[0];
+
+  // We use SrcVT.getVectorNumElements() instead of VT.getVectorNumElements()
+  // because Src may be from a extract_subvector (which is twice longer than
+  // VT).
+  MVT WidenVT = MVT::getVectorVT(MVT::getIntegerVT(EltSizeInBits * Difference),
+                                 SrcVT.getVectorNumElements() / Difference);
+  MVT ContainerWidenVT =
+      getContainerForFixedLengthVector(DAG, WidenVT, Subtarget);
+  // Do bitcast first, then convert it to scalable vector.
+  SDValue WidenSrc = DAG.getBitcast(WidenVT, Src);
+  WidenSrc =
+      convertToScalableVector(ContainerWidenVT, WidenSrc, DAG, Subtarget);
+
+  // TODO: Some pattern has undef. We can shrink VL to get higher performance.
+  auto [TrueMask, VL] =
+      getDefaultVLOps(WidenVT, ContainerWidenVT, DL, DAG, Subtarget);
+
+  for (unsigned i = Difference; i != 1; i >>= 1) {
+    MVT HalfEltSizeContainerWidenVT =
+        MVT::getVectorVT(MVT::getIntegerVT(EltSizeInBits * i / 2),
+                         WidenSrc.getSimpleValueType().getVectorElementCount());
+    // If the Difference is 4, and Mask[0] is 2 (0b10). It means we have to get
+    // the upper part first, then the lower part.
+    unsigned Shift = (Mask[0] & (i >> 1)) ? EltSizeInBits * i / 2 : 0;
+    SDValue SplatShift =
+        DAG.getNode(RISCVISD::VMV_V_X_VL, DL, HalfEltSizeContainerWidenVT,
+                    DAG.getUNDEF(HalfEltSizeContainerWidenVT),
+                    DAG.getConstant(Shift, DL, Subtarget.getXLenVT()), VL);
+    WidenSrc = DAG.getNode(
+        RISCVISD::VNSRL_VL, DL, HalfEltSizeContainerWidenVT, WidenSrc,
+        SplatShift, DAG.getUNDEF(HalfEltSizeContainerWidenVT), TrueMask, VL);
+  }
+
+  // In a reverse order. Convert it to scalable vector first, then do bitcast.
+  SDValue Res = convertFromScalableVector(VT.changeVectorElementTypeToInteger(),
+                                          WidenSrc, DAG, Subtarget);
+  // Cast back to FP if needed.
+  return DAG.getBitcast(VT, Res);
 }
 
 // Lower the following shuffle to vslidedown.
 // t49: v8i8 = extract_subvector t13, Constant:i64<0>
 // t109: v8i8 = extract_subvector t12, Constant:i64<8>
 // t108: v8i8 = vector_shuffle<1,2,3,4,5,6,7,8> t49, t106
 static SDValue lowerVECTOR_SHUFFLEAsVSlidedown(const SDLoc &DL, MVT VT,
                                                SDValue V1, SDValue V2,
[... 169 unchanged lines not shown ...]
@@ if (Rotation > 0) {
     }
     if (LoV)
       Res = DAG.getNode(RISCVISD::VSLIDEUP_VL, DL, ContainerVT, Res, LoV,
                         DAG.getConstant(InvRotate, DL, XLenVT), TrueMask, VL);
 
     return convertFromScalableVector(VT, Res, DAG, Subtarget);
   }
 
-  if (SDValue V = lowerVECTOR_SHUFFLEAsVNSRL(
-          DL, VT, ContainerVT, V1, V2, TrueMask, VL, Mask, Subtarget, DAG))
+  if (SDValue V =
+          lowerVECTOR_SHUFFLEAsVNSRL(DL, VT, V1, V2, Mask, Subtarget, DAG))
     return V;
 
   // Detect an interleave shuffle and lower to
   // (vmaccu.vx (vwaddu.vx lohalf(V1), lohalf(V2)), lohalf(V2), (2^eltbits - 1))
   bool SwapSources;
   if (isInterleaveShuffle(Mask, VT, SwapSources, Subtarget)) {
     // Swap sources if needed.
     if (SwapSources)
[... 10,186 unchanged lines not shown ...]

llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shufflevector-vnsrl.ll

[... 366 unchanged lines not shown ...]
 }
 
 define void @vnsrl_2_undef_i8(ptr %in, ptr %out) {
 ; CHECK-LABEL: vnsrl_2_undef_i8:
 ; CHECK: # %bb.0: # %entry
 ; CHECK-NEXT: vsetivli zero, 16, e8, mf2, ta, ma
 ; CHECK-NEXT: vle8.v v8, (a0)
 ; CHECK-NEXT: vsetivli zero, 8, e8, mf4, ta, ma
-; CHECK-NEXT: vid.v v9
-; CHECK-NEXT: vadd.vv v9, v9, v9
-; CHECK-NEXT: vadd.vi v10, v9, 1
-; CHECK-NEXT: vrgather.vv v11, v8, v10
-; CHECK-NEXT: li a0, 112
-; CHECK-NEXT: vmv.s.x v0, a0
-; CHECK-NEXT: vsetivli zero, 8, e8, mf2, ta, ma
-; CHECK-NEXT: vslidedown.vi v8, v8, 8
-; CHECK-NEXT: vsetivli zero, 8, e8, mf4, ta, mu
-; CHECK-NEXT: vadd.vi v9, v9, -7
-; CHECK-NEXT: vrgather.vv v11, v8, v9, v0.t
-; CHECK-NEXT: vse8.v v11, (a1)
+; CHECK-NEXT: vnsrl.wi v8, v8, 8
+; CHECK-NEXT: vse8.v v8, (a1)
 ; CHECK-NEXT: ret
 entry:
   %0 = load <16 x i8>, ptr %in, align 1
   %1 = shufflevector <16 x i8> %0, <16 x i8> poison, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 undef>
   store <8 x i8> %1, ptr %out, align 1
   ret void
 }
 
[... 37 unchanged lines not shown ...]
 }
 
 define void @vnsrl_8_undef_i8(ptr %in, ptr %out) {
 ; V-LABEL: vnsrl_8_undef_i8:
 ; V: # %bb.0: # %entry
 ; V-NEXT: li a2, 32
 ; V-NEXT: vsetvli zero, a2, e8, m1, ta, ma
 ; V-NEXT: vle8.v v8, (a0)
-; V-NEXT: li a0, 2
-; V-NEXT: vsetivli zero, 1, e8, mf8, ta, ma
-; V-NEXT: vmv.s.x v0, a0
-; V-NEXT: vsetivli zero, 8, e8, m1, ta, ma
-; V-NEXT: vslidedown.vi v9, v8, 8
-; V-NEXT: vsetivli zero, 8, e8, mf4, ta, mu
-; V-NEXT: vrgather.vi v10, v8, 5
-; V-NEXT: vrgather.vi v10, v9, 5, v0.t
-; V-NEXT: vsetivli zero, 4, e8, mf8, ta, ma
-; V-NEXT: vse8.v v10, (a1)
+; V-NEXT: vsetivli zero, 4, e32, mf2, ta, ma
+; V-NEXT: vnsrl.wx v8, v8, a2
+; V-NEXT: vsetvli zero, zero, e16, mf4, ta, ma
+; V-NEXT: vnsrl.wi v8, v8, 0
+; V-NEXT: vsetvli zero, zero, e8, mf8, ta, ma
+; V-NEXT: vnsrl.wi v8, v8, 8
+; V-NEXT: vse8.v v8, (a1)
 ; V-NEXT: ret
 ;
 ; ZVE32F-LABEL: vnsrl_8_undef_i8:
 ; ZVE32F: # %bb.0: # %entry
 ; ZVE32F-NEXT: li a2, 32
 ; ZVE32F-NEXT: vsetvli zero, a2, e8, m1, ta, ma
 ; ZVE32F-NEXT: vle8.v v8, (a0)
 ; ZVE32F-NEXT: li a0, 2
[... 10 unchanged lines not shown ...]
 entry:
   %0 = load <32 x i8>, ptr %in, align 1
   %1 = shufflevector <32 x i8> %0, <32 x i8> poison, <4 x i32> <i32 5, i32 13, i32 undef, i32 undef>
   store <4 x i8> %1, ptr %out, align 1
   ret void
 }
 
 define void @vnsrl_2_undef_i16(ptr %in, ptr %out) {
-; CHECK-LABEL: vnsrl_2_undef_i16:
-; CHECK: # %bb.0: # %entry
-; CHECK-NEXT: vsetivli zero, 16, e16, m1, ta, ma
-; CHECK-NEXT: vle16.v v8, (a0)
-; CHECK-NEXT: vsetivli zero, 8, e16, mf2, ta, ma
-; CHECK-NEXT: vid.v v9
-; CHECK-NEXT: vadd.vv v9, v9, v9
-; CHECK-NEXT: vadd.vi v9, v9, 1
-; CHECK-NEXT: vrgather.vv v10, v8, v9
-; CHECK-NEXT: vse16.v v10, (a1)
-; CHECK-NEXT: ret
+; V-LABEL: vnsrl_2_undef_i16:
+; V: # %bb.0: # %entry
+; V-NEXT: vsetivli zero, 16, e16, m1, ta, ma
+; V-NEXT: vle16.v v8, (a0)
+; V-NEXT: vsetivli zero, 4, e16, mf4, ta, ma
+; V-NEXT: vnsrl.wi v8, v8, 16
+; V-NEXT: vsetivli zero, 8, e16, mf2, ta, ma
+; V-NEXT: vse16.v v8, (a1)
+; V-NEXT: ret
+;
+; ZVE32F-LABEL: vnsrl_2_undef_i16:
+; ZVE32F: # %bb.0: # %entry
+; ZVE32F-NEXT: vsetivli zero, 16, e16, m1, ta, ma
+; ZVE32F-NEXT: vle16.v v8, (a0)
+; ZVE32F-NEXT: vsetivli zero, 4, e16, mf2, ta, ma
+; ZVE32F-NEXT: vnsrl.wi v8, v8, 16
+; ZVE32F-NEXT: vsetivli zero, 8, e16, mf2, ta, ma
+; ZVE32F-NEXT: vse16.v v8, (a1)
+; ZVE32F-NEXT: ret
 entry:
   %0 = load <16 x i16>, ptr %in, align 2
   %1 = shufflevector <16 x i16> %0, <16 x i16> poison, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
   store <8 x i16> %1, ptr %out, align 2
   ret void
 }
 
 define void @vnsrl_4_undef_i16(ptr %in, ptr %out) {
 ; V-LABEL: vnsrl_4_undef_i16:
 ; V: # %bb.0: # %entry
 ; V-NEXT: vsetivli zero, 16, e16, m1, ta, ma
 ; V-NEXT: vle16.v v8, (a0)
-; V-NEXT: vsetivli zero, 8, e16, mf2, ta, ma
-; V-NEXT: vid.v v9
-; V-NEXT: vsll.vi v9, v9, 2
-; V-NEXT: vadd.vi v9, v9, 1
-; V-NEXT: vrgather.vv v10, v8, v9
-; V-NEXT: li a0, 4
-; V-NEXT: vmv.s.x v0, a0
-; V-NEXT: vsetivli zero, 8, e16, m1, ta, ma
-; V-NEXT: vslidedown.vi v8, v8, 8
-; V-NEXT: vsetivli zero, 8, e16, mf2, ta, mu
-; V-NEXT: vrgather.vi v10, v8, 1, v0.t
-; V-NEXT: vsetivli zero, 4, e16, mf4, ta, ma
-; V-NEXT: vse16.v v10, (a1)
+; V-NEXT: vsetivli zero, 4, e32, mf2, ta, ma
+; V-NEXT: vnsrl.wi v8, v8, 0
+; V-NEXT: vsetvli zero, zero, e16, mf4, ta, ma
+; V-NEXT: vnsrl.wi v8, v8, 16
+; V-NEXT: vse16.v v8, (a1)
 ; V-NEXT: ret
 ;
 ; ZVE32F-LABEL: vnsrl_4_undef_i16:
 ; ZVE32F: # %bb.0: # %entry
 ; ZVE32F-NEXT: vsetivli zero, 16, e16, m1, ta, ma
 ; ZVE32F-NEXT: vle16.v v8, (a0)
 ; ZVE32F-NEXT: vsetivli zero, 8, e16, mf2, ta, ma
	; ZVE32F-NEXT: vid.v v9			; ZVE32F-NEXT: vid.v v9
	entry:			entry:
	%0 = load <16 x i16>, ptr %in, align 2			%0 = load <16 x i16>, ptr %in, align 2
	%1 = shufflevector <16 x i16> %0, <16 x i16> poison, <4 x i32> <i32 1, i32 5, i32 9, i32 undef>			%1 = shufflevector <16 x i16> %0, <16 x i16> poison, <4 x i32> <i32 1, i32 5, i32 9, i32 undef>
	store <4 x i16> %1, ptr %out, align 2			store <4 x i16> %1, ptr %out, align 2
	ret void			ret void
	}			}

	define void @vnsrl_2_undef_i32(ptr %in, ptr %out) {			define void @vnsrl_2_undef_i32(ptr %in, ptr %out) {
	; CHECK-LABEL: vnsrl_2_undef_i32:			; V-LABEL: vnsrl_2_undef_i32:
	; CHECK: # %bb.0: # %entry			; V: # %bb.0: # %entry
	; CHECK-NEXT: vsetivli zero, 16, e32, m2, ta, ma			; V-NEXT: vsetivli zero, 16, e32, m2, ta, ma
	; CHECK-NEXT: vle32.v v8, (a0)			; V-NEXT: vle32.v v8, (a0)
	; CHECK-NEXT: vsetivli zero, 8, e32, m1, ta, ma			; V-NEXT: li a0, 32
	; CHECK-NEXT: vid.v v10			; V-NEXT: vsetivli zero, 4, e32, mf2, ta, ma
	; CHECK-NEXT: vadd.vv v10, v10, v10			; V-NEXT: vnsrl.wx v8, v8, a0
	; CHECK-NEXT: vadd.vi v10, v10, 1			; V-NEXT: vsetivli zero, 8, e32, m1, ta, ma
	; CHECK-NEXT: vrgather.vv v11, v8, v10			; V-NEXT: vse32.v v8, (a1)
	; CHECK-NEXT: vse32.v v11, (a1)			; V-NEXT: ret
	; CHECK-NEXT: ret			;
				; ZVE32F-LABEL: vnsrl_2_undef_i32:
				; ZVE32F: # %bb.0: # %entry
				; ZVE32F-NEXT: vsetivli zero, 16, e32, m2, ta, ma
				; ZVE32F-NEXT: vle32.v v8, (a0)
				; ZVE32F-NEXT: vsetivli zero, 8, e32, m1, ta, ma
				; ZVE32F-NEXT: vid.v v10
				; ZVE32F-NEXT: vadd.vv v10, v10, v10
				; ZVE32F-NEXT: vadd.vi v10, v10, 1
				; ZVE32F-NEXT: vrgather.vv v11, v8, v10
				; ZVE32F-NEXT: vse32.v v11, (a1)
				; ZVE32F-NEXT: ret
	entry:			entry:
	%0 = load <16 x i32>, ptr %in, align 4			%0 = load <16 x i32>, ptr %in, align 4
	%1 = shufflevector <16 x i32> %0, <16 x i32> poison, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>			%1 = shufflevector <16 x i32> %0, <16 x i32> poison, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
	store <8 x i32> %1, ptr %out, align 4			store <8 x i32> %1, ptr %out, align 4
	ret void			ret void
	}			}

	define void @vnsrl_2_undef_half(ptr %in, ptr %out) {			define void @vnsrl_2_undef_half(ptr %in, ptr %out) {
	; CHECK-LABEL: vnsrl_2_undef_half:			; CHECK-LABEL: vnsrl_2_undef_half:
	; CHECK: # %bb.0: # %entry			; CHECK: # %bb.0: # %entry
	; CHECK-NEXT: vsetivli zero, 16, e16, m1, ta, ma			; CHECK-NEXT: vsetivli zero, 16, e16, m1, ta, ma
	; CHECK-NEXT: vle16.v v8, (a0)			; CHECK-NEXT: vle16.v v8, (a0)
	; CHECK-NEXT: vsetivli zero, 8, e16, mf2, ta, ma			; CHECK-NEXT: vsetivli zero, 8, e16, mf2, ta, ma
	; CHECK-NEXT: vid.v v9			; CHECK-NEXT: vnsrl.wi v8, v8, 16
	; CHECK-NEXT: vadd.vv v9, v9, v9			; CHECK-NEXT: vse16.v v8, (a1)
	; CHECK-NEXT: vadd.vi v10, v9, 1
	; CHECK-NEXT: vrgather.vv v11, v8, v10
	; CHECK-NEXT: li a0, 112
	; CHECK-NEXT: vmv.s.x v0, a0
	; CHECK-NEXT: vsetivli zero, 8, e16, m1, ta, ma
	; CHECK-NEXT: vslidedown.vi v8, v8, 8
	; CHECK-NEXT: vsetivli zero, 8, e16, mf2, ta, mu
	; CHECK-NEXT: vadd.vi v9, v9, -7
	; CHECK-NEXT: vrgather.vv v11, v8, v9, v0.t
	; CHECK-NEXT: vse16.v v11, (a1)
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%0 = load <16 x half>, ptr %in, align 2			%0 = load <16 x half>, ptr %in, align 2
	%1 = shufflevector <16 x half> %0, <16 x half> poison, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 undef>			%1 = shufflevector <16 x half> %0, <16 x half> poison, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 undef>
	store <8 x half> %1, ptr %out, align 2			store <8 x half> %1, ptr %out, align 2
	ret void			ret void
	}			}

	entry:			entry:
	%0 = load <16 x half>, ptr %in, align 2			%0 = load <16 x half>, ptr %in, align 2
	%1 = shufflevector <16 x half> %0, <16 x half> poison, <4 x i32> <i32 2, i32 6, i32 undef, i32 undef>			%1 = shufflevector <16 x half> %0, <16 x half> poison, <4 x i32> <i32 2, i32 6, i32 undef, i32 undef>
	store <4 x half> %1, ptr %out, align 2			store <4 x half> %1, ptr %out, align 2
	ret void			ret void
	}			}

	define void @vnsrl_2_undef_float(ptr %in, ptr %out) {			define void @vnsrl_2_undef_float(ptr %in, ptr %out) {
	; CHECK-LABEL: vnsrl_2_undef_float:			; V-LABEL: vnsrl_2_undef_float:
	; CHECK: # %bb.0: # %entry			; V: # %bb.0: # %entry
	; CHECK-NEXT: vsetivli zero, 16, e32, m2, ta, ma			; V-NEXT: vsetivli zero, 16, e32, m2, ta, ma
	; CHECK-NEXT: vle32.v v8, (a0)			; V-NEXT: vle32.v v8, (a0)
	; CHECK-NEXT: vsetivli zero, 8, e32, m1, ta, ma			; V-NEXT: li a0, 32
	; CHECK-NEXT: vid.v v10			; V-NEXT: vsetivli zero, 4, e32, mf2, ta, ma
	; CHECK-NEXT: vadd.vv v10, v10, v10			; V-NEXT: vnsrl.wx v8, v8, a0
	; CHECK-NEXT: vadd.vi v10, v10, 1			; V-NEXT: vsetivli zero, 8, e32, m1, ta, ma
	; CHECK-NEXT: vrgather.vv v11, v8, v10			; V-NEXT: vse32.v v8, (a1)
	; CHECK-NEXT: vse32.v v11, (a1)			; V-NEXT: ret
	; CHECK-NEXT: ret			;
				; ZVE32F-LABEL: vnsrl_2_undef_float:
				; ZVE32F: # %bb.0: # %entry
				; ZVE32F-NEXT: vsetivli zero, 16, e32, m2, ta, ma
				; ZVE32F-NEXT: vle32.v v8, (a0)
				; ZVE32F-NEXT: vsetivli zero, 8, e32, m1, ta, ma
				; ZVE32F-NEXT: vid.v v10
				; ZVE32F-NEXT: vadd.vv v10, v10, v10
				; ZVE32F-NEXT: vadd.vi v10, v10, 1
				; ZVE32F-NEXT: vrgather.vv v11, v8, v10
				; ZVE32F-NEXT: vse32.v v11, (a1)
				; ZVE32F-NEXT: ret
	entry:			entry:
	%0 = load <16 x float>, ptr %in, align 4			%0 = load <16 x float>, ptr %in, align 4
	%1 = shufflevector <16 x float> %0, <16 x float> poison, <8 x i32> <i32 1, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			%1 = shufflevector <16 x float> %0, <16 x float> poison, <8 x i32> <i32 1, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	store <8 x float> %1, ptr %out, align 4			store <8 x float> %1, ptr %out, align 4
	ret void			ret void
	}			}
