This is an archive of the discontinued LLVM Phabricator instance.

[VectorCombine] Improve shuffle select shuffle-of-shuffles
ClosedPublic

Authored by dmgreen on Jun 28 2022, 7:13 AM.

Details

Summary

This in an extension to the code added in D123911 which added vector combine folding of shuffle-select patterns, attempting to reduce the total amount of shuffling required in patterns like:

%x = shuffle %i1, %i2
%y = shuffle %i1, %i2
%a = binop %x, %y
%b = binop %x, %y
shuffle %a, %b, selectmask

This patch extends the handing of shuffles that are dependent on one another, which can arise from the SLP vectorizer, as-in:

%x = shuffle %i1, %i2
%y = shuffle %x

The input shuffles can also be emitted, in which case they are treated like identity shuffles. This patch also attempts to calculate a better ordering of input shuffles, which can help getting lower cost input shuffles, pushing complex shuffles further down the tree.

Diff Detail

Event Timeline

dmgreen created this revision.Jun 28 2022, 7:13 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 28 2022, 7:13 AM
dmgreen requested review of this revision.Jun 28 2022, 7:13 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 28 2022, 7:13 AM
samtebbs added inline comments.Jul 1 2022, 6:24 AM
llvm/lib/Transforms/Vectorize/VectorCombine.cpp
1421–1429

Do you think it would be worth making an anonymous function for this similar behaviour?

1501–1523

This has the same structure as GetBaseMaskValue, so I wonder if GetMaskValue can use this. If it makes it messier then having both is OK.

llvm/test/Transforms/VectorCombine/AArch64/select-shuffle.ll
578

Looks like more instructions are added here. Is that fine because it gets lowered to something better or of lower cost?

dmgreen updated this revision to Diff 442017.Jul 4 2022, 12:15 AM
dmgreen marked an inline comment as done.

Updates for review comments.

llvm/lib/Transforms/Vectorize/VectorCombine.cpp
1501–1523

Hmm. Good point, but I'm not sure how easy it is for them to share logic, unfortunately. They are each returning quite different values, and the parameters are different between the two methods.

llvm/test/Transforms/VectorCombine/AArch64/select-shuffle.ll
578

Yeah - More instructions is better in this case, as the shuffles are each simpler. It can be hard to see where this is an improvement and where it isn't from the tests. In general it can be a little difficult to be very precise, but most of the changes are improvements (and this one especially is much better as the shuffles are a lot easier to materialize).

samtebbs accepted this revision.Jul 4 2022, 2:44 AM

LGTM now, cheers.

llvm/lib/Transforms/Vectorize/VectorCombine.cpp
1501–1523

Yeah looks like it would be messier to do that. Looks good to me as it is then.

This revision is now accepted and ready to land.Jul 4 2022, 2:44 AM
This revision was landed with ongoing or failed builds.Jul 4 2022, 5:39 AM
This revision was automatically updated to reflect the committed changes.

Hi!

We get a memory corruption issue in our downstream testing with this patch. I've attached a reproducer, which can be run with opt -passes=vector-combine reduced.ll:

The issue occurs in X86TTIImpl::getShuffleCost:

unsigned E = *NumOfDests.getValue();
unsigned NormalizedVF =
    LegalVT.getVectorNumElements() * std::max(NumOfSrcs, E);
unsigned NumOfSrcRegs = NormalizedVF / LegalVT.getVectorNumElements();
unsigned NumOfDestRegs = NormalizedVF / LegalVT.getVectorNumElements();
SmallVector<int> NormalizedMask(NormalizedVF, UndefMaskElem);
copy(Mask, NormalizedMask.begin());

Mask has a size of 64, but NormalizedVF is only 16, so the copy tramples over the heap.

Thanks for the reproducer - I'll take a look now.

There is hopefully a fix in rG4b7913c35733. Let me know if anything else shows up.