Page MenuHomePhabricator

[RISCV] Lower insert subvector shuffles as vslideups

Authored by luke on Mar 23 2023, 4:13 AM.



A shuffle with an insert subvector mask is functionally equivalent to:

(insert_subvector v0, (extract_subvector v1, len), index)

We can emulate by doing a vslideup on v1 into the right index, and carefully selecting VL so that we don't overwrite any more destination elements than what we have to.
This avoids the need for a select with a mask.

Diff Detail

Event Timeline

luke created this revision.Mar 23 2023, 4:13 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 23 2023, 4:13 AM
luke requested review of this revision.Mar 23 2023, 4:13 AM

I think you're going one step too far with this patch. Not in the sense that this patch isn't justified - only in the sense you should do a simpler case first.

As highlighted in your test diffs, if you have a suffix of the result vector coming from the second source operand, you can do the operation with a single slideup.

I think you should match this case explicitly above the generic gather/select lowering. Doing so simplifies the test deltas to only the cases where the select is actually needed.

The other option is to add a DAG combine which removes the select afterwards, but I think matching up front is probably the right answer here.


This can be done with a single slide up. You don't need the select here at all.

reames requested changes to this revision.Mar 23 2023, 1:56 PM
This revision now requires changes to proceed.Mar 23 2023, 1:56 PM
reames added inline comments.Mar 23 2023, 2:14 PM

I don't have a concrete example, but thinking about it, this seems suspect. You're computing the select condition vector using the elements at the original offsets, but actually selecting between elements which have been slide up or down. That doesn't seem right, we should be having to adjust the select mask too.

luke updated this revision to Diff 508055.Mar 24 2023, 5:09 AM

Rework to only handle inserts of subvectors, and lower them into single vslideups

luke retitled this revision from [RISCV] Lower shuffles as vmerge + vslide{up,down} where possible to [RISCV] Lower insert subvector shuffles as vslideups.Mar 24 2023, 5:10 AM
luke edited the summary of this revision. (Show Details)
luke added inline comments.Mar 24 2023, 6:54 AM

So rd=x0, rs1=x0 means to keep the existing VL. Could that mean these two vsetivli's could be merged into one vsetivli zero, 2, e32, m1, tu, ma, seeing as the second tu trumps the former ta?

luke marked an inline comment as done.Mar 24 2023, 6:54 AM
reames accepted this revision.Mar 24 2023, 7:53 AM

LGTM w/minor comments.

I'm explicitly okay with the TU/TA issue being handled in a follow up commit.


white space on the ", tu," bit


In the case where you're inserting a suffix into the InPlace vector, you don't need the TU here and can use TA instead. (e.g. you slide up into the last 4 elements of a 8 long vector)

We may already catch that via a DAG combine or during vsetvli insertion, not sure.


Please land the test rename separately.

I'd suggest something more generic like concat_4xi8 to avoid the strategy used needing a test rename.


This is the TA case I'd mentioned above, looks like we don't catch this.


Yes, though we can also just do better and use a single slidedown to form the shuffle here.

Seems like a reasonable follow up patch.

This revision is now accepted and ready to land.Mar 24 2023, 7:53 AM
luke updated this revision to Diff 508115.Mar 24 2023, 9:01 AM

Use a tail agnostic policy if we're inserting at a suffix

luke updated this revision to Diff 508118.Mar 24 2023, 9:03 AM

Remove test rename diff

Looking at some code which benefits from this change, I noticed an opportunity for a follow up.

vsetivli        zero, 4, e8, mf8, ta, ma
vle8.v  v8, (a0)
vle8.v  v10, (a4)
vsetivli        zero, 8, e8, mf2, tu, ma     v8, v10, 4

In the snippet above, the does not need to be tail undisturbed. All of the tail efforts are undef.

Not sure if this makes sense to handle during lowering, or as a general dag combine on when the destination has undef elements. If we do it as a DAG combine, moving the suffix logic from this patch into the same place probably makes sense.