This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Use v(f)slide1up for shuffle+insert idiom
ClosedPublic

Authored by reames on May 25 2023, 9:58 AM.

Details

Summary

This is pretty straight forward in the basic form. I did need to move the slideup matching earlier, but that looks generally profitable on it's own.

As follow ups, I plan to explore the v(f)slide1down variants, and see what I can do to canonicalize the shuffle then insert pattern (see _inverse tests at the end of the vslide1up.ll test).

Diff Detail

Event Timeline

reames created this revision.May 25 2023, 9:58 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 25 2023, 9:58 AM
reames requested review of this revision.May 25 2023, 9:58 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 25 2023, 9:58 AM
reames added inline comments.May 25 2023, 9:59 AM
llvm/test/CodeGen/RISCV/rvv/fixed-vector-shuffle-vslide1up.ll
45

I do need to figure out why these aren't matching. My guess is that we're canonicalizing to a bitcast somewhere, need to track that down. The delta is an improvement even without the vslide1up match, so I think this can be a separate change.

All of the tests are for two-element vectors. Is it worth adding further testing?

I played around with some other examples; should all of these also be lowered to this pattern?

; This one matches
%vb = insertelement <4 x i8> poison, i8 %b, i64 0
%v1 = shufflevector <4 x i8> %vb, <4 x i8> poison, <4 x i32> zeroinitializer
%v2 = shufflevector <4 x i8> %v1, <4 x i8> %v, <4 x i32> <i32 0, i32 4, i32 5, i32 6>
; I wouldn't have thought the splat index should matter
%vb = insertelement <4 x i8> poison, i8 %b, i64 0
%v1 = shufflevector <4 x i8> %vb, <4 x i8> poison, <4 x i32> zeroinitializer
%v2 = shufflevector <4 x i8> %v1, <4 x i8> %v, <4 x i32> <i32 1, i32 4, i32 5, i32 6>
; Should be the same as the first example, but doesn't match
%vb = insertelement <4 x i8> poison, i8 %b, i64 0
%v2 = shufflevector <4 x i8> %vb, <4 x i8> %v, <4 x i32> <i32 0, i32 4, i32 5, i32 6>

I'm admittedly a bit out of the loop and the shuffle stuff is the first to go, so I may be missing something obvious here, sorry.

reames updated this revision to Diff 526075.May 26 2023, 8:30 AM

Rebase over corrected tests, and fix one bug. I'd forgotten to check that the entire tail was being written. We could handle this via the pass through argument, but since we'd have to materialize the splat source it isn't profitable. Instead, just limit the transform to avoid that case. (see neg1 test)

All of the tests are for two-element vectors. Is it worth adding further testing?

Actually, there were test for 4 x E vectors, I'd just gotten the shuffle mask wrong on every single one of them. :( Fixed, and rebased, so the tests should make a lot more sense now.

I played around with some other examples; should all of these also be lowered to this pattern?

; This one matches
%vb = insertelement <4 x i8> poison, i8 %b, i64 0
%v1 = shufflevector <4 x i8> %vb, <4 x i8> poison, <4 x i32> zeroinitializer
%v2 = shufflevector <4 x i8> %v1, <4 x i8> %v, <4 x i32> <i32 0, i32 4, i32 5, i32 6>
; I wouldn't have thought the splat index should matter
%vb = insertelement <4 x i8> poison, i8 %b, i64 0
%v1 = shufflevector <4 x i8> %vb, <4 x i8> poison, <4 x i32> zeroinitializer
%v2 = shufflevector <4 x i8> %v1, <4 x i8> %v, <4 x i32> <i32 1, i32 4, i32 5, i32 6>

This is an artifact of where I put the matching code. This isn't a sub-vector insert, and I'd placed the vslide1up match below that check. Good catch. If you're okay with it, I'd like to land this as is, and then handle this in a separate patch. I need to write dedicated matching for the vslide1down case anyways, and I'd like to have the test coverage in tree already.

; Should be the same as the first example, but doesn't match
%vb = insertelement <4 x i8> poison, i8 %b, i64 0
%v2 = shufflevector <4 x i8> %vb, <4 x i8> %v, <4 x i32> <i32 0, i32 4, i32 5, i32 6>

This looks identical to the new vslide1up_4xi8_swapped test. It does match on RV64, are you maybe testing this on RV32? There's something a bit odd going on with that one; we're widening the op type on the insert. I have not looked into that one yet.

frasercrmck accepted this revision.May 28 2023, 2:43 AM

LGTM with a minor typo

; This one matches
%vb = insertelement <4 x i8> poison, i8 %b, i64 0
%v1 = shufflevector <4 x i8> %vb, <4 x i8> poison, <4 x i32> zeroinitializer
%v2 = shufflevector <4 x i8> %v1, <4 x i8> %v, <4 x i32> <i32 0, i32 4, i32 5, i32 6>
; I wouldn't have thought the splat index should matter
%vb = insertelement <4 x i8> poison, i8 %b, i64 0
%v1 = shufflevector <4 x i8> %vb, <4 x i8> poison, <4 x i32> zeroinitializer
%v2 = shufflevector <4 x i8> %v1, <4 x i8> %v, <4 x i32> <i32 1, i32 4, i32 5, i32 6>

This is an artifact of where I put the matching code. This isn't a sub-vector insert, and I'd placed the vslide1up match below that check. Good catch. If you're okay with it, I'd like to land this as is, and then handle this in a separate patch. I need to write dedicated matching for the vslide1down case anyways, and I'd like to have the test coverage in tree already.

Yes that sounds good.

; Should be the same as the first example, but doesn't match
%vb = insertelement <4 x i8> poison, i8 %b, i64 0
%v2 = shufflevector <4 x i8> %vb, <4 x i8> %v, <4 x i32> <i32 0, i32 4, i32 5, i32 6>

This looks identical to the new vslide1up_4xi8_swapped test. It does match on RV64, are you maybe testing this on RV32? There's something a bit odd going on with that one; we're widening the op type on the insert. I have not looked into that one yet.

Yes good spot - I was testing only with RV32, which I don't normally do when playing around with things.

llvm/test/CodeGen/RISCV/rvv/fixed-vector-shuffle-vslide1up.ll
273

typo: materialize

This revision is now accepted and ready to land.May 28 2023, 2:43 AM
This revision was landed with ongoing or failed builds.May 30 2023, 7:37 AM
This revision was automatically updated to reflect the committed changes.