This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Make lowerVECTOR_SHUFFLEAsVNSRL support more vnsrl shuffle pattern.
Needs ReviewPublic

Authored by HanKuanChen on Nov 9 2022, 2:15 AM.

Details

Summary

Current lowerVECTOR_SHUFFLEAsVNSRL cannot support trailing undef and
other vnsrl pattern (e.g., an arithmetic sequence with 4 difference).
This commit will support power of 2 difference and use an iterative way
to split vector_shuffle into a series of vnsrl.

Some pattern (e.g., vnsrl_4_undef_i8) does not use vector_shuffle
because general DAG combiner cannot make build_vector into
vector_shuffle.

Also, isVnsrlShuffle is also provided. I expect isShuffleMaskLegal can
call isVnsrlShuffle in the future.

Diff Detail

Event Timeline

HanKuanChen created this revision.Nov 9 2022, 2:15 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 9 2022, 2:15 AM
HanKuanChen requested review of this revision.Nov 9 2022, 2:16 AM

Fix pre-merge checks.

craig.topper added inline comments.Nov 10 2022, 10:22 PM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
2761

What if Mask[0] is -1 and Mask[1] is 1. The different will be 2. I don't think these checks block that.

I guess maybe it's handled by the std::any_of later?

I'd feel better if we checked Mask[0] and Mask[1] are >= 0 before the subtract.

2774

You can use auto here.

2775

Mask.end() instead of std::end

2778

Mask.begin instead of std::begin

HanKuanChen added inline comments.Nov 10 2022, 10:26 PM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
2761

What if Mask[0] is -1 and Mask[1] is 1. The different will be 2. I don't think these checks block that.

I guess maybe it's handled by the std::any_of later?

I'd feel better if we checked Mask[0] and Mask[1] are >= 0 before the subtract.

How about we move find(Mask, -1) and std::any_of to front?

Apply Craig's comment.

craig.topper added inline comments.Nov 13 2022, 4:37 PM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
2943

Why did you change the order of bitcast and conversion?

HanKuanChen added inline comments.Nov 13 2022, 8:22 PM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
2943

Why did you change the order of bitcast and conversion?

Example: Src is v4i32.
If we do bitcast then conversion, we get v2i64 and nxv1i64.
If we do conversion then bitcast, we only get nxv1i32 because nxv1i32 and nxv1i64 do not have same size.

This test crashes

define void @vnsrl_2_undef_float(ptr %in, ptr %out) {
entry:
  %0 = load <32 x float>, ptr %in, align 4
  %1 = shufflevector <32 x float> %0, <32 x float> poison, <16 x i32> <i32 1, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
  store <16 x float> %1, ptr %out, align 4
  ret void
}
craig.topper added inline comments.Nov 13 2022, 9:08 PM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
2873

Was this check intentionally removed from the new code?

HanKuanChen added inline comments.Nov 13 2022, 9:09 PM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
2873

Was this check intentionally removed from the new code?

Yes.

This test crashes

define void @vnsrl_2_undef_float(ptr %in, ptr %out) {
entry:
  %0 = load <32 x float>, ptr %in, align 4
  %1 = shufflevector <32 x float> %0, <32 x float> poison, <16 x i32> <i32 1, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
  store <16 x float> %1, ptr %out, align 4
  ret void
}

I will merge and close https://reviews.llvm.org/D137904 to solve this test.

This test crashes

define void @vnsrl_2_undef_float(ptr %in, ptr %out) {
entry:
  %0 = load <32 x float>, ptr %in, align 4
  %1 = shufflevector <32 x float> %0, <32 x float> poison, <16 x i32> <i32 1, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
  store <16 x float> %1, ptr %out, align 4
  ret void
}

I will merge and close https://reviews.llvm.org/D137904 to solve this test.

How does D137904 solve this? That patch is marked NFC

This test crashes

define void @vnsrl_2_undef_float(ptr %in, ptr %out) {
entry:
  %0 = load <32 x float>, ptr %in, align 4
  %1 = shufflevector <32 x float> %0, <32 x float> poison, <16 x i32> <i32 1, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
  store <16 x float> %1, ptr %out, align 4
  ret void
}

I will merge and close https://reviews.llvm.org/D137904 to solve this test.

How does D137904 solve this? That patch is marked NFC

t7: v32f32,ch = load<(load (s1024) from %ir.in, align 4)> t0, t2, undef:i64
t9: v16f32 = extract_subvector t7, Constant:i64<0>
t11: v16f32 = vector_shuffle<1,3,u,u,u,u,u,u,u,u,u,u,u,u,u,u> t9, undef:v16f32

I expect we use getSingleShuffleSource to get t7 instead of t9. Then we check whether the type of source and destination can match vnsrl rule.
This test has a 2 Difference, and source is v32f32 and destination is v16f32, which is valid.

This test crashes

define void @vnsrl_2_undef_float(ptr %in, ptr %out) {
entry:
  %0 = load <32 x float>, ptr %in, align 4
  %1 = shufflevector <32 x float> %0, <32 x float> poison, <16 x i32> <i32 1, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
  store <16 x float> %1, ptr %out, align 4
  ret void
}

I will merge and close https://reviews.llvm.org/D137904 to solve this test.

How does D137904 solve this? That patch is marked NFC

t7: v32f32,ch = load<(load (s1024) from %ir.in, align 4)> t0, t2, undef:i64
t9: v16f32 = extract_subvector t7, Constant:i64<0>
t11: v16f32 = vector_shuffle<1,3,u,u,u,u,u,u,u,u,u,u,u,u,u,u> t9, undef:v16f32

I expect we use getSingleShuffleSource to get t7 instead of t9. Then we check whether the type of source and destination can match vnsrl rule.
This test has a 2 Difference, and source is v32f32 and destination is v16f32, which is valid.

I don't think that will fix it. The same test crashes if you change the shuffle mask to <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>. That cases should have two extract_subvectors.

Am I correct in thinking that all of the tests are using LMUL=1 or fractional LMUL?

Am I correct in thinking that all of the tests are using LMUL=1 or fractional LMUL?

Yes.

HanKuanChen added a comment.EditedNov 13 2022, 11:21 PM

This test crashes

define void @vnsrl_2_undef_float(ptr %in, ptr %out) {
entry:
  %0 = load <32 x float>, ptr %in, align 4
  %1 = shufflevector <32 x float> %0, <32 x float> poison, <16 x i32> <i32 1, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
  store <16 x float> %1, ptr %out, align 4
  ret void
}

I will merge and close https://reviews.llvm.org/D137904 to solve this test.

How does D137904 solve this? That patch is marked NFC

t7: v32f32,ch = load<(load (s1024) from %ir.in, align 4)> t0, t2, undef:i64
t9: v16f32 = extract_subvector t7, Constant:i64<0>
t11: v16f32 = vector_shuffle<1,3,u,u,u,u,u,u,u,u,u,u,u,u,u,u> t9, undef:v16f32

I expect we use getSingleShuffleSource to get t7 instead of t9. Then we check whether the type of source and destination can match vnsrl rule.
This test has a 2 Difference, and source is v32f32 and destination is v16f32, which is valid.

I don't think that will fix it. The same test crashes if you change the shuffle mask to <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>. That cases should have two extract_subvectors.

May you provide the test?
I try <16 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef> and <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>, but I don't get the crash if getSingleShuffleSource is used (and check the source and destination type).

This test crashes

define void @vnsrl_2_undef_float(ptr %in, ptr %out) {
entry:
  %0 = load <32 x float>, ptr %in, align 4
  %1 = shufflevector <32 x float> %0, <32 x float> poison, <16 x i32> <i32 1, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
  store <16 x float> %1, ptr %out, align 4
  ret void
}

I will merge and close https://reviews.llvm.org/D137904 to solve this test.

How does D137904 solve this? That patch is marked NFC

t7: v32f32,ch = load<(load (s1024) from %ir.in, align 4)> t0, t2, undef:i64
t9: v16f32 = extract_subvector t7, Constant:i64<0>
t11: v16f32 = vector_shuffle<1,3,u,u,u,u,u,u,u,u,u,u,u,u,u,u> t9, undef:v16f32

I expect we use getSingleShuffleSource to get t7 instead of t9. Then we check whether the type of source and destination can match vnsrl rule.
This test has a 2 Difference, and source is v32f32 and destination is v16f32, which is valid.

I don't think that will fix it. The same test crashes if you change the shuffle mask to <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>. That cases should have two extract_subvectors.

May you provide the test?
I try <16 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef> and <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>, but I don't get the crash if getSingleShuffleSource is used (and check the source and destination type).

I was using Zvl128b not Zvl256b. Does that make a difference?

This test crashes

define void @vnsrl_2_undef_float(ptr %in, ptr %out) {
entry:
  %0 = load <32 x float>, ptr %in, align 4
  %1 = shufflevector <32 x float> %0, <32 x float> poison, <16 x i32> <i32 1, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
  store <16 x float> %1, ptr %out, align 4
  ret void
}

I will merge and close https://reviews.llvm.org/D137904 to solve this test.

How does D137904 solve this? That patch is marked NFC

t7: v32f32,ch = load<(load (s1024) from %ir.in, align 4)> t0, t2, undef:i64
t9: v16f32 = extract_subvector t7, Constant:i64<0>
t11: v16f32 = vector_shuffle<1,3,u,u,u,u,u,u,u,u,u,u,u,u,u,u> t9, undef:v16f32

I expect we use getSingleShuffleSource to get t7 instead of t9. Then we check whether the type of source and destination can match vnsrl rule.
This test has a 2 Difference, and source is v32f32 and destination is v16f32, which is valid.

I don't think that will fix it. The same test crashes if you change the shuffle mask to <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>. That cases should have two extract_subvectors.

May you provide the test?
I try <16 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef> and <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>, but I don't get the crash if getSingleShuffleSource is used (and check the source and destination type).

I was using Zvl128b not Zvl256b. Does that make a difference?

No. But let me submit the new version so that we can test it.

Fix vector_shuffle source may be extract_subvector and undef.

Also provide getSingleShuffleSource (D137904).