If the reused scalars are clustered, i.e. each part of the reused mask
contains all elements of the original scalars exactly once, we can
reorder those clusters to improve the overall ordering of the clustered
vectors.
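As a rough illustration of the "clustered" condition described above, here is a minimal sketch using only standard containers. The function name `isClusteredPermutationMask` and the use of `std::vector` are assumptions made for this example, not the code in the patch, and undefined mask elements are not handled.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper for illustration only (not the code under review).
// Returns true if Mask splits into VF-sized clusters and every cluster
// uses each index in [0, VF) exactly once.
bool isClusteredPermutationMask(const std::vector<int> &Mask, std::size_t VF) {
  if (VF == 0 || Mask.empty() || Mask.size() % VF != 0)
    return false;
  for (std::size_t Start = 0; Start < Mask.size(); Start += VF) {
    // One flag per index; a repeated or out-of-range index fails the check.
    std::vector<bool> Seen(VF, false);
    for (std::size_t I = 0; I < VF; ++I) {
      int Idx = Mask[Start + I];
      if (Idx < 0 || static_cast<std::size_t>(Idx) >= VF ||
          Seen[static_cast<std::size_t>(Idx)])
        return false;
      Seen[static_cast<std::size_t>(Idx)] = true;
    }
  }
  return true;
}
```

Because each cluster holds exactly VF elements, rejecting duplicates and out-of-range indices is enough to guarantee that every index appears exactly once.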
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp | ||
---|---|---|
3690 | Please can you pull the UsedIndices mask check out into a helper (isOneUseSingleSourceMask()?) - I'd like to get this moved into ShuffleVectorInst so it's with all the other static shuffle mask kind matchers, as this has potential uses elsewhere. It's up to you whether you put it there yourself in this patch, or just hoist it out as a helper in SLP for now. I'm intending to add proper unit tests for the shuffle mask matchers soon, so I can do it then if necessary. |
BTW - are there any other shuffle mask matchers or manipulation patterns you think we could pull from SLP and move to somewhere more generic?
I think so, but I need some time to find all those patterns and move them to a common place.
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp | ||
---|---|---|
3690 | Ok, will do. |
llvm/lib/IR/Instructions.cpp | ||
---|---|---|
2573 | Is it going to be a problem to refactor this back to the old bitmask approach to allow VF == 1? I can think of cases where we're going to want to match this to <1,0,2,3> style permutes without any clustering. | |
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp | ||
3892 | Pull the ternary operator out - this should make the code easier to grok. |
llvm/lib/IR/Instructions.cpp | ||
---|---|---|
2576 | I wonder why you departed from the BitVector approach you used in the previous revision? If we have the mask "0222 0123" and VF == 4, the sum will be 6 in both submasks. |
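The counterexample in this comment can be checked with a small sketch. Both functions below are hypothetical reconstructions written for illustration (the names `clusterSumsMatch` and `clusterBitsAllSet` are assumptions, and neither is the code from either revision): one compares each cluster's index sum to 0 + 1 + ... + (VF - 1), the other records which indices were seen.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical sum-based check: accepts a cluster whenever its indices
// add up to 0 + 1 + ... + (VF - 1), which duplicates can also satisfy.
bool clusterSumsMatch(const std::vector<int> &Mask, std::size_t VF) {
  if (VF == 0 || Mask.size() % VF != 0)
    return false;
  const int Expected = static_cast<int>(VF * (VF - 1) / 2);
  for (std::size_t Start = 0; Start < Mask.size(); Start += VF) {
    auto First = Mask.begin() + static_cast<std::ptrdiff_t>(Start);
    auto Last = First + static_cast<std::ptrdiff_t>(VF);
    if (std::accumulate(First, Last, 0) != Expected)
      return false;
  }
  return true;
}

// Hypothetical bit-per-index check: a cluster passes only if every
// index in [0, VF) was actually seen.
bool clusterBitsAllSet(const std::vector<int> &Mask, std::size_t VF) {
  if (VF == 0 || Mask.size() % VF != 0)
    return false;
  for (std::size_t Start = 0; Start < Mask.size(); Start += VF) {
    std::vector<bool> Used(VF, false);
    for (std::size_t I = 0; I < VF; ++I) {
      int Idx = Mask[Start + I];
      if (Idx < 0 || static_cast<std::size_t>(Idx) >= VF)
        return false;
      Used[static_cast<std::size_t>(Idx)] = true;
    }
    for (bool Bit : Used)
      if (!Bit)
        return false;
  }
  return true;
}
```

For the mask 0222 0123 with VF == 4, the sum-based check accepts both submasks (0 + 2 + 2 + 2 = 6 and 0 + 1 + 2 + 3 = 6), while the bit-per-index check rejects the first one because indices 1 and 3 are never used.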
llvm/lib/IR/Instructions.cpp | ||
---|---|---|
2576 | Yeah, you're right, I will use the bitmask. |