Vectorizing loops with "escaping" IVs has been disabled since it was discovered to not work correctly (PR17179).
This patch re-enables it, with support for external use of both "post-increment" and "pre-increment" IVs (that is, the values from the last and second-to-last iteration, respectively).
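To make the two kinds of escaping value concrete, here is a minimal sketch in plain C++ (not LLVM code). The names `EscapedIV`, `runScalarLoop` and `computeEscapedIV` are hypothetical, and the closed form assumes a loop that executes at least once.

```cpp
#include <cassert>
#include <cstdint>

// Values of an induction variable that remain visible after the loop.
// Illustrative type only; it does not exist in LLVM.
struct EscapedIV {
  int64_t PreInc;  // value the IV held during the final iteration
  int64_t PostInc; // value after the final increment
};

// Reference behaviour: run the scalar loop and record both values.
EscapedIV runScalarLoop(int64_t Start, int64_t Step, uint64_t TripCount) {
  int64_t IV = Start, Pre = Start;
  for (uint64_t T = 0; T < TripCount; ++T) {
    Pre = IV;   // what an external user of the IV phi would observe
    IV += Step; // what an external user of the increment would observe
  }
  return {Pre, IV};
}

// Closed form, assuming TripCount >= 1: no loop has to run to produce
// the values that out-of-loop users need.
EscapedIV computeEscapedIV(int64_t Start, int64_t Step, uint64_t TripCount) {
  int64_t EndValue = Start + Step * static_cast<int64_t>(TripCount);
  return {EndValue - Step, EndValue};
}
```

For `Start = 3`, `Step = 2` and eight iterations, both functions yield a post-increment value of 19 and a pre-increment value of 17.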
Repository: rL LLVM
lib/Transforms/Vectorize/LoopVectorize.cpp | | |
---|---|---|
3301 | (On Diff #59811) | I think you're right. |
4806 | (On Diff #59811) | Yes, of course, I didn't notice I removed the only false path. |
An alternative, which I'm sure you thought of, would be to fix or clean up such external users of IVs in a preparatory step (SimplifyIndVar?), eliminating them from the loop before starting to vectorize it. This may be a good thing to do early, for other "uses".
It may be somewhat more efficient to traverse the LCSSA phis at the single exit block that are fed by allowed-to-exit IVs in order to fix or clean them up, instead of traversing mostly irrelevant internal uses in search of out-of-loop ones.
llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp | |
---|---|
3299–3300 | need[s]. "Simplest" in that it employs II.transform, which takes care of pointers as well; one could argue that computing EndValue - Step is simpler. |
Yes, in fact, that's what I started with, but I abandoned that direction.
This is a question of cost modeling. SimplifyIndVar will already clean this up if it considers generating the end value cheap enough. And it seems like this decision should not depend on whether it expects vectorization in the future or not.
> It may be somewhat more efficient to traverse the LCSSA phis at the single exit block that are fed by allowed-to-exit IVs in order to fix or clean them up, instead of traversing mostly irrelevant internal uses in search of out-of-loop ones.
I'm not sure it's much better. If the LCSSA phi uses the IV phi directly, it is. If it uses the value feeding into the IV phi, then we still need to find the IV this value belongs to. So, either have additional book-keeping, or go over the value's uses to find the phi.
If you think it may be significantly better, I can implement it, and see how it looks.
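The trade-off between the two traversal orders can be sketched on a toy use-def graph. This is a simplified model in plain C++, not the LLVM API: `Node`, `Induction`, `ExitPhi` and both functions are hypothetical stand-ins, and the `Owner` map plays the role of the "additional book-keeping" mentioned above.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

// Toy stand-in for an IR value; not an LLVM class.
struct Node {
  bool InLoop = false;
  std::vector<Node *> Users;
};

// An induction variable: its phi and the increment feeding the phi.
struct Induction {
  Node *Phi;    // pre-increment value
  Node *Update; // post-increment value
};

// An LCSSA-style phi in the exit block with a single incoming value.
struct ExitPhi {
  Node *Self;
  Node *Incoming;
};

// Option 1: walk all users of each IV value, keep the out-of-loop ones.
std::vector<Node *> externalUsersByUseWalk(const std::vector<Induction> &IVs) {
  std::vector<Node *> Result;
  for (const Induction &IV : IVs)
    for (Node *V : {IV.Phi, IV.Update})
      for (Node *U : V->Users)
        if (!U->InLoop)
          Result.push_back(U);
  return Result;
}

// Option 2: walk the exit phis and map each incoming value back to its
// IV. The map is needed because an exit phi may be fed by the increment
// rather than by the IV phi itself.
std::vector<Node *> externalUsersByExitPhis(const std::vector<Induction> &IVs,
                                            const std::vector<ExitPhi> &Exits) {
  std::unordered_map<const Node *, const Induction *> Owner;
  for (const Induction &IV : IVs) {
    Owner[IV.Phi] = &IV;
    Owner[IV.Update] = &IV;
  }
  std::vector<Node *> Result;
  for (const ExitPhi &P : Exits)
    if (Owner.count(P.Incoming))
      Result.push_back(P.Self);
  return Result;
}
```

Both functions find the same exit phis in this model; they differ only in whether the cost scales with the number of IV uses or with the number of exit phis plus the size of the map.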
llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp | |
---|---|
3299–3300 | Yes, I want this to fire for pointer IVs as well, just enabling it step-by-step. |
It could potentially help other uses too, but OK, they'd be hard to anticipate. The alternative primarily referred to cleaning this up SimplifyIndVar-style after we decide to vectorize the loop and before we start creating an empty loop, etc.
> It may be somewhat more efficient to traverse the LCSSA phis at the single exit block that are fed by allowed-to-exit IVs in order to fix or clean them up, instead of traversing mostly irrelevant internal uses in search of out-of-loop ones.

> I'm not sure it's much better. If the LCSSA phi uses the IV phi directly, it is. If it uses the value feeding into the IV phi, then we still need to find the IV this value belongs to. So, either have additional book-keeping, or go over the value's uses to find the phi.
or find the phi by looking at the defs feeding this value.
> If you think it may be significantly better, I can implement it, and see how it looks.
Ah, I would expect this to have negligible effect if any. Just noted to keep in mind if one does go back to implement the SimplifyIndVar alternative.
llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp | |
---|---|
3299–3300 | BTW, another alternative is to extract the last element from the vectorized IV, or the element before last. But that is less amenable to further passes than the scalar computation. |
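As a rough illustration of why the two approaches agree, the following plain C++ sketch models the vector IV on the final vector iteration and compares lane extraction against the scalar `EndValue - Step` computation. The vectorization factor `VF = 4`, the helper names, and the assumption that the vector trip count is a non-zero multiple of `VF` are all illustrative and not taken from the patch.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr unsigned VF = 4; // assumed vectorization factor
using VecIV = std::array<int64_t, VF>;

// Lanes of the vector IV during the final vector iteration. Assumes
// VectorTripCount is a non-zero multiple of VF.
VecIV lastVectorIV(int64_t Start, int64_t Step, uint64_t VectorTripCount) {
  VecIV V{};
  int64_t Base = Start + Step * static_cast<int64_t>(VectorTripCount - VF);
  for (unsigned Lane = 0; Lane < VF; ++Lane)
    V[Lane] = Base + Step * static_cast<int64_t>(Lane);
  return V;
}

// Alternative: extract the last lane of the vector IV.
int64_t preIncByExtract(int64_t Start, int64_t Step, uint64_t VectorTripCount) {
  return lastVectorIV(Start, Step, VectorTripCount)[VF - 1];
}

// Scalar computation: EndValue - Step, with no dependence on the vector.
int64_t preIncByScalar(int64_t Start, int64_t Step, uint64_t VectorTripCount) {
  int64_t EndValue = Start + Step * static_cast<int64_t>(VectorTripCount);
  return EndValue - Step;
}
```

The scalar form depends only on the start value, the step and the trip count, which is one way to read the remark that it is more amenable to later passes than a lane extraction.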