LoopVectorizationCostModel::calculateRegisterUsage() is used to estimate the register usage for specific VFs. However, it takes into account many instructions that won't be vectorized, such as induction variables, GetElementPtr instruction, etc.. This makes the loop vectorizer too conservative when choosing VF. In this patch, the induction variables that won't be vectorized plus GetElementPtr instruction will be added to ValuesToIgnore set so that their register usage won't be considered any more.
Details
- Reviewers
spatel davidxl jmolloy hfinkel - Commits
- rGa73ffa220611: [LoopVectorizer] Refine loop vectorizer's register usage calculator by ignoring…
rGe6a210f50b67: [LoopVectorizer] Refine loop vectorizer's register usage calculator by ignoring…
rG7f8b43d42478: [LoopVectorizer] Refine loop vectorizer's register usage calculator by ignoring…
rL255691: [LoopVectorizer] Refine loop vectorizer's register usage calculator by…
rL255460: [LoopVectorizer] Refine loop vectorizer's register usage calculator by…
rL255454: [LoopVectorizer] Refine loop vectorizer's register usage calculator by…
Diff Detail
Event Timeline
Hi Cong,
Thanks for doing this. I have a bunch of comments below.
Cheers,
James
lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
1513 | Everything else has a (pointless) comment - let's add a (pointless) comment here for consistency's sake. | |
5172 | Why this change? | |
5658 | This is not the right way to determine which user (if any) updates the PHI. In fact, it could be a chain of users, and they could be in any order. The right way is to check the incoming value of the induction PHI from the loop latch. | |
5670 | Redefining this variable is confusing - it should be given a unique name. | |
5683 | I can't help but feel this is all a bit long-winded. Something like this should work and is a lot shorter (if you don't suffer from STL allergies): for (auto &Induction : *Legal->getInductionVars()) { auto *PN = KV.first; auto *UpdateV = PN->getIncomingValueForBlock(L->getLoopLatch()); // Check that the PHI is only used by the induction increment (UpdateV) or // by GEPs. // Then, check that UpdateV is only used by a compare instruction or the loop // header PHI. if (std::all_of(PN->user_begin(), PN->user_end(), [&](const User *U) { return U == UpdateV || isa<GetElementPtrInst>(U); }) && std::all_of(UpdateV->user_begin(), UpdateV->user_end(), [&](const User *U) { return U == PN || isa<ICmpInst>(U); })) { ValuesToIgnore.insert(PN); ValuesToIgnore.insert(UpdateV); } } | |
5690 | Surely this isn't right - GEPs can be vectorized if they have pointer type, right? as in scatter gather? | |
test/Transforms/LoopVectorize/X86/vector_max_bandwidth.ll | ||
19 | I don't understand why your changes should affect the VF being selected. It should affect the UF only, right? |
Thank you very much for the review, James! Please check my inline replies.
lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
1513 | OK. | |
5172 | This is actually fixing a previous bug: we need to update OpenIntervals by removing instructions that end at the current location, even when we ignore values in ValuesToIgnore. Those instructions to remove won't be considered later. | |
5683 | Thank you very much for the suggested code. I love STL. I have applied your code in this patch. | |
5690 | I have added the check and now we only ignore this instruction if its last operand is an induction var. Do you have better implementation suggestion here? BTW, does LLVM support scatter/gather now? It seems that vectorizeBlockInLoop doesn't vectorize this operation. | |
test/Transforms/LoopVectorize/X86/vector_max_bandwidth.ll | ||
19 | With the patch http://reviews.llvm.org/D8943, VF is also determined by register usage when -vectorizer-maximize-bandwidth is turned on (as is in this test file). |
lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
5636 | You need to have a different behavior here for VF == 1 vs. VF > 1. The VF == 1 case is important because it is used to control the interleave factor (calculateRegisterUsage is called from LoopVectorizationCostModel::selectInterleaveCount). For the VF == 1 case, a number of these exclusions don't apply. | |
5649 | Aren't these also used by add or sub by a constant? | |
5677 | Typo: inductoin | |
5680 | dyn_cast -> cast (it is known not to fail here) | |
5682 | I suspect you want to use getGEPInductionOperand (from include/llvm/Analysis/VectorUtils.h) to pick the operand index so that you ignore trailing zero operands. |
lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
5636 | Good catch! I have created a new set to store values to ignore when VF > 1 and only add my newly added values to ignore to this new set. This set is then used to calculate register usages for VF > 1. | |
5649 | Yes, induction variables can participate in arithmetic operations. But I still have to follow the def-use chain and check if the value that is defined directly/indirectly is stored (it seems that the reduction with an induction variable cannot be vectorized now). Do you have any idea except doing it in a DFS manner? | |
5682 | I think getGEPInductionOperand is what I want here. I have updated this part. Thanks! |
lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
5645 | Shouldn't this be VecValuesToIgnore? These casts are not vectorized, but they are scalar instructions that maintain a live value in a register. (Is this the only contributor to ValuesToIgnore?) |
lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
5645 | OK. This is actually from previous code that I didn't change, which is used in all cases no matter what VF is. Updated. |
lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
5645 | Now the only contributor are ephemeral values that are added above. |
Everything else has a (pointless) comment - let's add a (pointless) comment here for consistency's sake.