This is an archive of the discontinued LLVM Phabricator instance.

[X86][SSE] Enable target shuffle combining to combine multiple shuffle inputs.
ClosedPublic

Authored by RKSimon on Jul 27 2016, 6:23 AM.

Details

Summary

We currently only support combining target shuffles that consist of a single source input (plus elements known to be undef/zero).

This patch generalizes the recursive combining of the target shuffle to collect all the inputs, merging any duplicates along the way, into a full set of src ops and its shuffle mask.

We uncover a number of cases where we have failed to combine a unary shuffle because the input has been duplicated and separated during lowering.

This will allow us to combine to 2-input shuffles in a future patch.

Diff Detail

Repository
rL LLVM

Event Timeline

RKSimon updated this revision to Diff 65719.Jul 27 2016, 6:23 AM
RKSimon retitled this revision from to [X86][SSE] Enable target shuffle combining to combine multiple shuffle inputs..
RKSimon updated this object.
RKSimon added reviewers: chandlerc, delena, ab, spatel, andreadb.
RKSimon set the repository for this revision to rL LLVM.
RKSimon added a subscriber: llvm-commits.
andreadb edited edge metadata.Aug 3 2016, 4:07 AM

Hi Simon,

Overall the patch lgtm.
My major concern was related to the complexity of your new algorithm. 'combineX86ShufflesRecursively' is now called for every "used" shuffle operand. Have you noticed any particular cases where this function is now expensive?
However, given that we already limit the maximum recursion depth to 8, I don't expect this to be much of an issue.

-Andrea

andreadb accepted this revision.Aug 3 2016, 4:07 AM
andreadb edited edge metadata.
This revision is now accepted and ready to land.Aug 3 2016, 4:07 AM

Hi Simon,

Overall the patch lgtm.
My major concern was related to the complexity of your new algorithm. 'combineX86ShufflesRecursively' is now called for every "used" shuffle operand. Have you noticed any particular cases where this function is now expensive?
However, given that we already limit the maximum recursion depth to 8, I don't expect this to be much of an issue.

-Andrea

Thanks Andrea.

I haven't found anything to indicate that the search has become notably more costly to builds - between the limits on depth and hasOneUse(), many shuffles already early out. If anybody can create a repro then I can investigate ways to manipulate the max search depth but this hasn't proved to be necessary yet.

This revision was automatically updated to reflect the committed changes.