Stores are currently vectorized with a maximum vectorization factor of 16. This
patch tries to improve the situation by using the maximal vectorization factor.
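To make the idea concrete, here is a minimal sketch of how candidate vectorization factors could be enumerated for a chain of consecutive stores. It is not the code from this patch: `candidateVFs`, `floorPowerOf2`, and the `MinVF`/`MaxVF` parameters are hypothetical names, and the sketch assumes that factors are powers of two and are tried widest first.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Largest power of two that is <= N. N must be non-zero.
unsigned floorPowerOf2(unsigned N) {
  assert(N != 0 && "expected a non-empty chain");
  unsigned P = 1;
  while (P <= N / 2)
    P *= 2;
  return P;
}

// Hypothetical helper (not part of SLPVectorizer.cpp): returns the
// vectorization factors to try for NumStores consecutive stores, widest
// first. MaxVF is assumed to be a power of two, e.g. derived from the
// target's vector register width; MinVF is the smallest width worth trying.
std::vector<unsigned> candidateVFs(unsigned NumStores, unsigned MinVF,
                                   unsigned MaxVF) {
  assert(MinVF >= 1 && "a zero MinVF would never terminate the loop");
  std::vector<unsigned> VFs;
  if (NumStores < MinVF)
    return VFs;
  // Clamp to the smaller of the target limit and the chain length.
  unsigned VF = std::min(MaxVF, floorPowerOf2(NumStores));
  for (; VF >= MinVF; VF /= 2)
    VFs.push_back(VF);
  return VFs;
}
```

Under these assumptions, a chain of 40 stores with a limit of 16 yields the factors 16, 8, 4, 2, while raising the limit to 32 also allows a 32-wide attempt.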
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
Someone more familiar with SLP should have a look at the diffs, but we need to address the compile-time question.
The artificial limit is only there to guard against excessive compile-time cost, so do you have data to show that difference? Or does this patch solve the (potential) problem in another way?
More discussion here:
https://bugs.llvm.org/show_bug.cgi?id=17170
lib/Transforms/Vectorize/SLPVectorizer.cpp | ||
---|---|---|
5328–5329 | The std::max() is not a typo? Might warrant a comment. |
Just some theoretical thoughts. The complexity of the original implementation is O(n), while the complexity of the updated patch is O(n * log n).
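One way to read this estimate is with a toy cost model, sketched below. It is an illustration only, not an analysis of the actual patch: it assumes that a single-width scheme makes one linear pass over the n stores, and that the new scheme makes one linear pass per power-of-two width, of which there are about log n. The function names are hypothetical.

```cpp
#include <cassert>

// Toy model: one linear pass over the chain at a single fixed width.
unsigned long visitsSingleWidth(unsigned long NumStores) { return NumStores; }

// Toy model: one linear pass per candidate width, where the widths are the
// powers of two from the largest one that fits the chain down to MinVF.
unsigned long visitsAllWidths(unsigned long NumStores, unsigned long MinVF) {
  assert(MinVF >= 1 && "a zero MinVF would never terminate the loop");
  // Largest power of two that is <= NumStores (1 if the chain is empty).
  unsigned long VF = 1;
  while (VF <= NumStores / 2)
    VF *= 2;
  unsigned long Visits = 0;
  // Each width contributes NumStores visits; there are O(log n) widths.
  for (; VF >= MinVF && VF <= NumStores; VF /= 2)
    Visits += NumStores;
  return Visits;
}
```

In this model, 1024 stores with a minimum width of 2 give 10 candidate widths, so 10240 visits against 1024 for a single pass. Whether that factor matters in practice is the compile-time question raised earlier in the review.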
lib/Transforms/Vectorize/SLPVectorizer.cpp | ||
---|---|---|
5328–5329 | Yes, it is a typo, must be min, thanks! |
lib/Transforms/Vectorize/SLPVectorizer.cpp | ||
---|---|---|
5328–5329 | Then you probably want to add a test :) |
test/Transforms/SLPVectorizer/X86/arith-add-ssat.ll | ||
---|---|---|
68 | This change is surprising, given how slow v2i64 ADD/SUB are on SLM |
Please can you rebase to see if my SLM cost changes fix the saturated arithmetic changes?
test/Transforms/SLPVectorizer/X86/cast.ll | ||
---|---|---|
22 | Please check this - it looks superfluous (and is under an unused prefix) |
test/Transforms/SLPVectorizer/X86/pr35497.ll | ||
---|---|---|
2 | Why the duplicated -slp-vectorizer? |
test/Transforms/SLPVectorizer/X86/pr35497.ll | ||
---|---|---|
2 | Need to run the pass twice to get the code in pr35497 fully vectorized. |