Try using vectors of 2 * MaxElts elements for store vectorization. This commit
is motivated by the effect of the bug fix in reviews.llvm.org/D93192 and tries
to compensate for it.
There can be cases where, for instance, the cost of vectorizing as a pair of
<4 x float> is zero, but vectorizing as <8 x float> is nevertheless beneficial.
An LLVM vector with 2 * MaxElts elements cannot be lowered to one register, of course;
it is split across two registers.
We check 2 * MaxElts only after MaxElts so as not to interfere with ordinary vectorization,
which may already be accepted as beneficial on its own.
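The ordering described above can be sketched roughly as follows. This is only an illustration, not the patch itself: `chooseStoreVF` and `CostFn` are hypothetical names, and the sketch assumes that a negative cost means "profitable" while a cost of zero is rejected.

```cpp
#include <functional>
#include <optional>

// Hypothetical cost callback: estimated cost of vectorizing a store chain
// with the given number of elements. Assumption: negative means profitable.
using CostFn = std::function<int(unsigned NumElts)>;

// Try the ordinary width (MaxElts) first. Only if that is not profitable,
// try the doubled width (2 * MaxElts), which would be split across two
// registers when lowered.
std::optional<unsigned> chooseStoreVF(unsigned MaxElts, const CostFn &Cost) {
  if (Cost(MaxElts) < 0)
    return MaxElts;      // ordinary vectorization is accepted as-is
  if (Cost(2 * MaxElts) < 0)
    return 2 * MaxElts;  // doubled width pays off where MaxElts did not
  return std::nullopt;   // neither width is profitable
}
```

With a cost function that returns zero for 4 elements and a negative value for 8, `chooseStoreVF(4, Cost)` would pick 8, which corresponds to the <4 x float> versus <8 x float> case mentioned in the summary.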
Details
- Reviewers
RKSimon ABataev dtemirbulatov
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Time | Test |
---|---|
70 ms | x64 windows > LLVM.CodeGen/XCore::threads.ll |
Event Timeline
llvm/test/Transforms/SLPVectorizer/X86/arith-add-ssat.ll | ||
---|---|---|
174 | Please can you clean up all these checks? |
174 | Fixed this line in the test. Or did you mean to precommit the check prefixes? |
174 | We seem to have both AVX and AVX1 check prefixes now. Go back and replace check-prefixes=AVX with check-prefixes=AVX1 (not sure if we can have a common AVX prefix for AVX1 + AVX2)? |
174 | Oh, I see. Done. |
llvm/test/Transforms/SLPVectorizer/X86/arith-mul.ll | ||
---|---|---|
183 | Hmm, yes, you're right, it is strange to generate <4 x i64> when the preferred width is 128. But we can't check this at the abstract LLVM level. In general we don't know the target constraints, so this patch looks too tricky for such cases. |
Given the above, I'm going to abandon this change. It looks like over-optimization that breaks the LLVM IR middle-end abstraction.