This is an archive of the discontinued LLVM Phabricator instance.

[VectorCombine] Fold shuffle select pattern
ClosedPublic

Authored by dmgreen on Apr 17 2022, 5:54 AM.

Details

Summary

This patch adds a combine to attempt to reduce the costs of certain
select-shuffle patterns. The form of code it attempts to detect is:

%x = shuffle ...
%y = shuffle ...
%a = binop %x, %y
%b = binop %x, %y
shuffle %a, %b, selectmask

A classic select-mask will pick items from each lane of a or b. These
do not always have a great lowering on many architectures. This patch
attempt to pack a and b into the lower elements, creating a different
shuffle for reconstructing the orignal which may be better than the
select mask. This can be better for performance, especially if less
elements of a and b need to be computed and the input shuffles are
cheaper.

Because select-masks are just one form of shuffle, we generalize to any
mask. So long as the backend has decent costmodel for the shuffles, this
can generally improve things when they come up. For more basic cost
models, the folds do not appear to be profitable, not getting past the
cost checks.

Diff Detail

Event Timeline

dmgreen created this revision.Apr 17 2022, 5:54 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 17 2022, 5:54 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
dmgreen requested review of this revision.Apr 17 2022, 5:54 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 17 2022, 5:54 AM

Are there remarks for the VectorCombine pass?

dmgreen updated this revision to Diff 423836.Apr 20 2022, 1:30 AM

Update formatting and fix instruction flags TODO.

dmgreen added a comment.EditedApr 20 2022, 1:30 AM

Are there remarks for the VectorCombine pass?

There does not appear to be any, no. The pass doesn't currently seem to know anything about remarks.

samtebbs added inline comments.Apr 28 2022, 2:33 AM
llvm/lib/Transforms/Vectorize/VectorCombine.cpp
1238

nit of -> if

1279–1281

I'm not exactly sure what is meant by "we try to sort the first vector elements to the beginning, and the second array elements to the end". Does it mean sorting e.g. shuffle <9, 4, 11, 12, 3> to shuffle<3, 4, 9, 11, 12>?

How does that then allow us to only use half of the binops? The number of binary operations in the output seems to remain the same. This is just for my own understanding, not because I think it's wrong.

dmgreen marked an inline comment as done.May 3 2022, 7:21 AM
dmgreen added inline comments.
llvm/lib/Transforms/Vectorize/VectorCombine.cpp
1279–1281

Yeah I don't feel like that was written very well. I've tried to update it a little.

The idea is to take a shuffle of the form shuffle A, B <0, 8, 2, 3, 12, 13, 6, 15> and turn that into a shuffle that only uses the first 4 lanes (0,1,2,3) from A and the first 4 lanes from B (8,9,10,11). We need to recreate the original, so we create a reconstruction mask of shuffle A', B' <0, 8, 1, 2, 9, 10, 3, 11>. The shuffles into A and B are altered to keep the lanes valid, and the whole thing is costed to make sure the total cost of the new shuffles is lower than the originals.
If A and B are <8 x i32>, for example, then we only need the first <4 x i32> from each, cutting the number of operations down from 2 to 1 for each of the binops. Depending on the cost of the shuffles, this can be better overall.

dmgreen updated this revision to Diff 426697.May 3 2022, 7:22 AM

Update some of the comments, and adjust the removal of nodes to fix some sanitizer issues this was having.

samtebbs accepted this revision.May 3 2022, 8:13 AM

LGTM

llvm/lib/Transforms/Vectorize/VectorCombine.cpp
1279–1281

That clears it all up, thanks :)

This revision is now accepted and ready to land.May 3 2022, 8:13 AM

OK Thanks.

I've tried some examples on a few different architectures, and it seems to either not do anything or produce better code (or at least smaller code). It can be a bit dependant on a precise costmodel though, so let me know if any issues come up.

This revision was landed with ongoing or failed builds.May 6 2022, 12:13 AM
This revision was automatically updated to reflect the committed changes.