For the simple copy loop (see the test case), the vectorizer selects a VF of 32 even though the loop is known to have only 17 iterations. This makes no sense to me, since such a vector loop will never be executed. The only case where we may want to select a VF larger than the TC is masked vectorization, so I haven't touched that case.
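To make the intended behavior concrete, here is a minimal standalone sketch of the clamping rule described above. It is not the patch itself: `powerOf2Floor` is a hand-rolled stand-in for LLVM's `PowerOf2Floor`, `clampMaxVF` is a hypothetical helper name, and treating a trip count of 0 as "unknown" is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Largest power of two that does not exceed Value (0 when Value is 0).
// Hand-rolled stand-in for LLVM's PowerOf2Floor so the sketch is standalone.
static uint64_t powerOf2Floor(uint64_t Value) {
  uint64_t Result = 0;
  for (uint64_t Bit = 1; Bit != 0 && Bit <= Value; Bit <<= 1)
    Result = Bit;
  return Result;
}

// Hypothetical helper: pick the largest VF worth considering.
// Assumption: ConstTripCount == 0 means the trip count is not known.
static uint64_t clampMaxVF(uint64_t MaxVF, uint64_t ConstTripCount,
                           bool FoldTailByMasking) {
  assert(MaxVF != 0 && "MaxVF must be at least 1");
  // A vector loop with VF > TC never runs a full iteration, so clamp to the
  // largest power of two not exceeding TC. The masked (tail-folded) case is
  // left untouched, as in the description above.
  if (ConstTripCount != 0 && ConstTripCount < MaxVF && !FoldTailByMasking)
    return powerOf2Floor(ConstTripCount);
  return MaxVF;
}
```

With the numbers from the test case, `clampMaxVF(32, 17, false)` yields 16 rather than 32, while the same call with `FoldTailByMasking` set to true leaves the VF at 32.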
Repository: rG LLVM Github Monorepo
Event Timeline
Please note that we still vectorize the epilogue with VF 8 even though the remaining number of iterations is known to be 1. That should be addressed separately, though.
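The arithmetic behind this remark can be sketched as follows, assuming the main loop ends up with VF 16 (the largest power of two not exceeding the trip count of 17). `LoopSplit` and `splitTripCount` are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// How a known trip count divides between the vector loop and what is left.
struct LoopSplit {
  uint64_t VectorIterations; // full iterations of the vector loop
  uint64_t Remainder;        // scalar iterations left for the epilogue
};

// Hypothetical helper: split TripCount for a given vectorization factor.
static LoopSplit splitTripCount(uint64_t TripCount, uint64_t VF) {
  assert(VF != 0 && "VF must be at least 1");
  return {TripCount / VF, TripCount % VF};
}
```

For a trip count of 17 and VF 16 this gives one vector iteration and a remainder of 1. A remainder of 1 is smaller than 8, so an epilogue vector loop with VF 8 would never execute a full iteration either.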
llvm/test/Transforms/LoopVectorize/X86/known_TC.ll | ||
---|---|---|
6–7 | Yep, missed that... |
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
6085 | Shouldn't this be PowerOf2Floor(ConstTripCount)? |
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
6085 | I think this is saying "the trip count is ConstTripCount, so we are rounding that down to a power of 2" (with the rounding left as an exercise to the reader). It may be better to be clearer about the value it will actually use, though. Having the original ConstTripCount printed somewhere does sound useful. |
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
6085 |
While I agree this is a valid interpretation, I did mean to print the resulting value. So printing PowerOf2Floor(ConstTripCount) here sounds reasonable to me. Do you think we should change the wording?
The original ConstTripCount is printed out at the beginning of computeFeasibleMaxVF. |
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
6085 |
I think the wording is fine when printing ClampedConstTripCount. As you said, the original trip count is printed earlier. It could also be included in the message (known trip count: ...) or something similar, but I am not sure if it adds a lot of additional info. |
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
6085 | IIUC there was some consensus to use ClampedConstTripCount in the message and maybe include ConstTripCount in addition to that. But it looks like the committed version has not been updated accordingly and just prints the ConstTripCount. Did I miss anything? |
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
6085 | You didn't... that was my oversight. Will fix shortly. Thanks! |
foldTailByMasking -> FoldTailByMasking