Try vectors of 2 * MaxElts elements when vectorizing stores. This commit
is motivated by the effect of the bug fix in reviews.llvm.org/D93192 and tries
to compensate for it.
For instance, the cost of vectorizing a pair of <4 x float> may be zero,
while vectorizing a single <8 x float> is still beneficial.
An LLVM vector of 2 * MaxElts elements cannot be lowered to one register, of course; it is split
across two registers.
We check 2 * MaxElts only after MaxElts so as not to interfere with ordinary vectorization,
which may already be accepted as beneficial on its own.
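
The search order described above can be sketched roughly as follows. This is a minimal illustration, not the patch itself: `pickStoreVF`, `CostFn`, and the `chainLen` parameter are hypothetical names, the cost callback stands in for the SLP cost model, and "profitable" is assumed to mean a strictly negative cost.

```cpp
#include <cassert>
#include <functional>
#include <optional>

// Hypothetical cost callback: estimated cost delta of vectorizing the store
// chain with the given number of elements (negative is assumed profitable).
using CostFn = std::function<int(unsigned)>;

// Returns the chosen element count, or std::nullopt if nothing is profitable.
// minElts/maxElts are the smallest and largest register-sized element counts;
// chainLen is the number of consecutive stores available.
std::optional<unsigned> pickStoreVF(unsigned minElts, unsigned maxElts,
                                    unsigned chainLen, const CostFn &cost) {
  assert(minElts >= 2 && minElts <= maxElts && "invalid element-count bounds");

  // Ordinary search first: widest register-sized vector, then halve.
  for (unsigned vf = maxElts; vf >= minElts; vf /= 2) {
    if (vf <= chainLen && cost(vf) < 0)
      return vf;
  }

  // Only if the ordinary search found nothing, try 2 * MaxElts, which the
  // backend would have to split across two registers.
  unsigned wide = 2 * maxElts;
  if (wide <= chainLen && cost(wide) < 0)
    return wide;

  return std::nullopt;
}
```

With `maxElts = 4`, eight consecutive stores, and a cost model that reports zero for 4 elements but a negative cost for 8, `pickStoreVF` falls through to the wide case and picks 8, which corresponds to the <4 x float> pair example. If 4 elements are already profitable, the wide case is never consulted.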
Details
- Reviewers
- RKSimon
- ABataev
- dtemirbulatov
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
| Time | Test | |
|---|---|---|
| 110 ms | x64 windows > LLVM.CodeGen/XCore::threads.ll | 
Event Timeline
| llvm/test/Transforms/SLPVectorizer/X86/arith-add-ssat.ll | ||
|---|---|---|
| 136–146 | Please can you clean up all these checks? | |
| llvm/test/Transforms/SLPVectorizer/X86/arith-add-ssat.ll | ||
|---|---|---|
| 136–146 | I fixed this line in the test, or did you mean that I should precommit the check prefixes? | |
| llvm/test/Transforms/SLPVectorizer/X86/arith-add-ssat.ll | ||
|---|---|---|
| 136–146 | We seem to have both AVX and AVX1 check prefixes now. Go back and replace check-prefixes=AVX with check-prefixes=AVX1 (not sure if we can have a common AVX prefix for AVX1 + AVX2)? | |
| llvm/test/Transforms/SLPVectorizer/X86/arith-add-ssat.ll | ||
|---|---|---|
| 136–146 | Oh, I see. Done. | |
| llvm/test/Transforms/SLPVectorizer/X86/arith-mul.ll | ||
|---|---|---|
| 183 | Hmm, yes, you're right, it's strange to generate <4 x i64> when the preferred width is 128. But we can't check this at the abstract LLVM level. Generally we don't know the target constraints, so this patch of mine looks too tricky for such cases. | |
Given the above, I'm going to abandon this change. It looks like over-optimization that breaks the LLVM IR middle-end abstraction.