This is an archive of the discontinued LLVM Phabricator instance.

[SLP]Improve reordering of clustered reused scalars.
ClosedPublic

Authored by ABataev on Sep 8 2022, 2:02 PM.

Details

Summary

If the reused scalars are clustered, i.e. each part of the reused mask
contains all elements of the original scalars exactly once, we can
reorder those clusters to improve the whole ordering of the clustered
vectors.

Diff Detail

Event Timeline

ABataev created this revision.Sep 8 2022, 2:02 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 8 2022, 2:02 PM
ABataev requested review of this revision.Sep 8 2022, 2:02 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 8 2022, 2:02 PM
RKSimon edited the summary of this revision. (Show Details)Sep 9 2022, 5:25 AM
RKSimon added inline comments.Sep 9 2022, 5:33 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3690

Please can you pull the UsedIndices mask check out into a helper (isOneUseSingleSourceMask()?) - I'd like to get this moved into ShuffleVectorInst so its with all the other static shuffle mask kind matchers as this has potential uses elsewhere. Its up to you whether you put it there yourself in this patch, or just hoist it out as a helper in SLP for now.

I'm intending to add proper unit tests for the shuffle mask matches soon so I can do it then if necessary

BTW - are there any other shuffle mask matchers or manipulation patterns you think we could pull from SLP and move to somewhere more generic?

BTW - are there any other shuffle mask matchers or manipulation patterns you think we could pull from SLP and move to somewhere more generic?

I think yes, but need some time to find all those patterns and move them to a common place.

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3690

Ok, will do.

ABataev updated this revision to Diff 459059.Sep 9 2022, 7:30 AM

Address comments

RKSimon added inline comments.Sep 9 2022, 9:35 AM
llvm/lib/IR/Instructions.cpp
2574

Is it going to be a problem to refactor this back to the old bitmask approach to allow VF == 1? I can think of cases where we're going to want to match this to <1,0,2,3> style permutes without any clustering.

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3891

Pull the ternary operator out - this should make the code easier to grok.

vdmitrie added inline comments.Sep 9 2022, 9:38 AM
llvm/lib/IR/Instructions.cpp
2577

I wonder why you departed from BitVector approach you used in the previous revision?
The above statement is true. But what you are doing here is applying the opposite statement
(if sum is VF * (VF-1)/2 then each index is used once in each submask).

If we have mask "0222 0123" and VF==4 the sum will be 6 in both submasks.

ABataev added inline comments.Sep 9 2022, 9:48 AM
llvm/lib/IR/Instructions.cpp
2577

Yeah, you're right, will use bitmask.

ABataev updated this revision to Diff 459122.Sep 9 2022, 10:38 AM

Address comments.

vdmitrie accepted this revision.Sep 9 2022, 1:04 PM

Looks good with a nit.

llvm/lib/IR/Instructions.cpp
2578

The comment can be removed.

This revision is now accepted and ready to land.Sep 9 2022, 1:04 PM
This revision was automatically updated to reflect the committed changes.