[X86][AVX] Add lowerVectorShuffleAsLanePermuteAndPermute for v4f64 shuffles (PR39161)

Authored by RKSimon on Oct 11 2018, 10:24 AM.



Add shuffle lowering for the case where we can shuffle the lanes into place followed by an in-lane permute.

This is mainly for cases where we can have non-repeating permutes in each lane, but for now I've just enabled it for v4f64 unary shuffles to fix PR39161. There is not much test coverage for other shuffles that might benefit yet.

We now have several cross-lane shuffle lowering methods that all do something similar - I've looked at merging some of these (notably by making the repeated mask mechanism in lowerVectorShuffleByMerging128BitLanes optional), but there are a lot of assertions/assumptions in the way that make this tricky, so I ended up adding yet another relatively simple method instead.
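
To illustrate the "shuffle the lanes into place, then permute in-lane" idea from the summary, here is a minimal standalone sketch. It is not the code from this patch: the function name `splitIntoLanePermuteAndPermute`, the `kUndef` sentinel, and the use of `std::vector` are all assumptions made for illustration, and it only handles unary shuffles. It checks that every destination lane draws from a single source lane and, if so, produces a whole-lane mask and an in-lane permute mask.

```cpp
#include <cassert>
#include <vector>

// -1 marks an undefined ("don't care") mask element (illustrative sentinel).
constexpr int kUndef = -1;

// Hypothetical helper, not the code from this patch.
// Tries to split a unary shuffle mask into
//   laneMask: moves whole lanes into place, then
//   permMask: permutes elements within each lane.
// Returns false if some destination lane needs elements from two source lanes.
bool splitIntoLanePermuteAndPermute(const std::vector<int> &mask, int numLanes,
                                    std::vector<int> &laneMask,
                                    std::vector<int> &permMask) {
  const int numElts = static_cast<int>(mask.size());
  assert(numLanes > 0 && numElts % numLanes == 0 && "lanes must divide mask");
  const int eltsPerLane = numElts / numLanes;

  std::vector<int> srcLaneOf(numLanes, kUndef);
  laneMask.assign(numElts, kUndef);
  permMask.assign(numElts, kUndef);

  for (int i = 0; i < numElts; ++i) {
    const int m = mask[i];
    if (m < 0)
      continue; // an undefined element constrains nothing
    assert(m < numElts && "sketch handles unary shuffles only");

    // Each destination lane may read from exactly one source lane.
    const int srcLane = m / eltsPerLane;
    const int dstLane = i / eltsPerLane;
    if (srcLaneOf[dstLane] != kUndef && srcLaneOf[dstLane] != srcLane)
      return false;
    srcLaneOf[dstLane] = srcLane;

    // After the lane move, the wanted element sits in the destination lane
    // at the same in-lane offset it had in the source lane.
    permMask[i] = dstLane * eltsPerLane + m % eltsPerLane;
  }

  // Move complete lanes so the in-lane permute never reads an undefined slot.
  for (int lane = 0; lane < numLanes; ++lane) {
    if (srcLaneOf[lane] == kUndef)
      continue;
    for (int j = 0; j < eltsPerLane; ++j)
      laneMask[lane * eltsPerLane + j] = srcLaneOf[lane] * eltsPerLane + j;
  }
  return true;
}
```

For a v4f64-style mask `{3, 2, 0, 1}` with two lanes, this yields a lane mask of `{2, 3, 0, 1}` (swap the 128-bit halves) and a permute mask of `{1, 0, 2, 3}` (swap the elements in the low lane only), which is the kind of non-repeating in-lane permute the summary mentions. A mask such as `{0, 2, 1, 3}` is rejected because each destination lane would need elements from both source lanes.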

Event Timeline

RKSimon created this revision. Oct 11 2018, 10:24 AM
craig.topper added inline comments. Oct 12 2018, 12:53 PM

Why do LaneMask and PermMask have the same actual size, but different "small" size?

RKSimon added inline comments. Oct 12 2018, 1:26 PM

copy+paste has failed me again....

RKSimon updated this revision to Diff 169484. Oct 12 2018, 1:29 PM

Fixed SmallVector<> LaneMask

This revision is now accepted and ready to land. Oct 12 2018, 1:35 PM
This revision was automatically updated to reflect the committed changes.