Stores are currently vectorized with a maximum vectorization factor of 16. This patch
tries to improve the situation by using the maximal vectorization factor.
Repository: rL LLVM
Event Timeline
Someone more familiar with SLP should have a look at the diffs, but we need to address the compile-time question.
The artificial limit is only there to guard against excessive compile-time cost, so do you have data to show that difference? Or does this patch solve the (potential) problem in another way?
More discussion here:
https://bugs.llvm.org/show_bug.cgi?id=17170
lib/Transforms/Vectorize/SLPVectorizer.cpp | ||
---|---|---|
4598–4601 | Is the std::max() intentional rather than a typo? If so, it might warrant a comment. |
Just some theoretical thoughts: the complexity of the original implementation is O(n), while the complexity of the updated patch is O(n * log n).
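To make the O(n) versus O(n * log n) remark concrete, here is a rough counting sketch. It is not the SLP vectorizer's code: `visitsFixedVF` and `visitsHalvingVF` are hypothetical helpers, and the model assumes the worst case, where every attempt fails and the whole chain of n stores is rescanned for each power-of-two vectorization factor.

```cpp
#include <cstddef>

// Hypothetical cost model (an assumption, not LLVM code): counts how many
// store slots are examined for a chain of n consecutive stores.

// Fixed vectorization factor: one pass over non-overlapping groups of vf stores.
std::size_t visitsFixedVF(std::size_t n, std::size_t vf) {
  std::size_t visits = 0;
  for (std::size_t i = 0; i + vf <= n; i += vf)
    visits += vf; // every store in the group is looked at once
  return visits;
}

// Halving strategy: start at the largest power of two that fits in n, then
// retry with vf / 2 down to 2, rescanning the chain at each level.
std::size_t visitsHalvingVF(std::size_t n) {
  std::size_t maxVF = 1;
  while (maxVF * 2 <= n)
    maxVF *= 2;
  std::size_t visits = 0;
  for (std::size_t vf = maxVF; vf >= 2; vf /= 2)
    visits += visitsFixedVF(n, vf);
  return visits;
}
```

Under these assumptions, a chain of 1024 stores costs 1024 visits with a fixed factor of 16 and 10 * 1024 visits with the halving strategy, one full pass per power of two. Whether that matters in practice is the compile-time question raised above, which only measurements can answer.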
lib/Transforms/Vectorize/SLPVectorizer.cpp | ||
---|---|---|
4598–4601 | Yes, it is a typo, must be min, thanks! |
lib/Transforms/Vectorize/SLPVectorizer.cpp | ||
---|---|---|
4598–4601 | Then you probably want to add a test :) |
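The code at lines 4598–4601 is not quoted in this thread, so the following is only a generic illustration of why `std::min` is the expected operation when capping a vectorization factor. The helper `clampVF` and its parameter names are hypothetical and do not appear in the patch.

```cpp
#include <algorithm>

// Hypothetical helper (not from the patch): caps a requested vectorization
// factor at an upper limit. std::min keeps the result within the limit;
// std::max would instead let the request exceed it.
unsigned clampVF(unsigned requestedVF, unsigned maxVF) {
  return std::min(requestedVF, maxVF);
}
```

With a limit of 16, a request of 32 is reduced to 16 and a request of 8 passes through unchanged. A regression test for this kind of typo would need an input where the two operations give different results, for example a store chain longer than the limit.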
test/Transforms/SLPVectorizer/X86/arith-add-ssat.ll | ||
---|---|---|
68 ↗ | (On Diff #221573) | This change is surprising, given how slow v2i64 ADD/SUB are on SLM |
Could you please rebase to see if my SLM cost changes fix the saturated arithmetic changes?
test/Transforms/SLPVectorizer/X86/cast.ll | ||
---|---|---|
22 | Please check this - it looks superfluous (and is under an unused prefix) |
test/Transforms/SLPVectorizer/X86/pr35497.ll | ||
---|---|---|
2 ↗ | (On Diff #221937) | Why the duplicated -slp-vectorizer? |
test/Transforms/SLPVectorizer/X86/pr35497.ll | ||
---|---|---|
2 ↗ | (On Diff #221937) | Need to run the pass twice to get the code in pr35497 fully vectorized. |
Would it be better to use SmallBitVector or APInt?