This is an archive of the discontinued LLVM Phabricator instance.

[LV] Enable vectorization of loops where the IV has an external use
ClosedPublic

Authored by mkuper on Jun 6 2016, 5:32 PM.

Details

Summary

Vectorizing loops with "escaping" IVs has been disabled since it was discovered to not work correctly (PR17179).
This patch re-enables it, with support for external use of both "pre-increment" and "post-increment" (that is, last and second-to-last iteration) IVs.

Diff Detail

Repository
rL LLVM

Event Timeline

mkuper updated this revision to Diff 59811.Jun 6 2016, 5:32 PM
mkuper retitled this revision from to [LV] Enable vectorization of loops where the IV has an external use.
mkuper updated this object.
mkuper added reviewers: delena, hfinkel, jmolloy, wmi, mssimpso.
mkuper added subscribers: nadav, danielcdh, davidxl, llvm-commits.
wmi added inline comments.Jun 14 2016, 11:36 AM
lib/Transforms/Vectorize/LoopVectorize.cpp
3301 ↗(On Diff #59811)

Why PrevValue is necessary? In which case OrigPhi->users() can have more than one use outside loop?

4806 ↗(On Diff #59811)

addInductionPhi will return true anyway now. So maybe change its return val to void and remove the if?

mkuper added inline comments.Jun 14 2016, 12:38 PM
lib/Transforms/Vectorize/LoopVectorize.cpp
3301 ↗(On Diff #59811)

I think you're right.
LCSSA should canonicalize this to one phi per exit block, and we only vectorize loops with a single exit block right now. I'll change it, thanks!

4806 ↗(On Diff #59811)

Yes, of course, I didn't notice I removed the only false path.

mkuper updated this revision to Diff 60731.Jun 14 2016, 1:04 PM

Updated per Wei's comments.

wmi accepted this revision.Jun 14 2016, 1:53 PM
wmi edited edge metadata.
This revision is now accepted and ready to land.Jun 14 2016, 1:53 PM
This revision was automatically updated to reflect the committed changes.
Ayal added a subscriber: Ayal.Jun 15 2016, 3:26 PM

An alternative which I'm sure you thought of would be to fix/clean up such external users of IV's as a preparatory step (SimplifyIndVar?), eliminating them from the loop before starting to vectorize it. This may be a good thing to do early, for other "uses".

It may be somewhat more efficient to traverse the LCSSA phi's at the single exit block that are fed by allowed-to-exit IV's in order to fix/clean them up, instead of traversing mostly irrelevant internal uses in search for out-of-loop ones.

llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp
3299–3300

need[s]

"simplest" as it employs II.transform, which takes care of pointers as well; one could argue that doing EndValue - Step is simpler..

In D21048#459394, @Ayal wrote:

An alternative which I'm sure you thought of would be to fix/clean up such external users of IV's as a preparatory step (SimplifyIndVar?), eliminating them from the loop before starting to vectorize it. This may be a good thing to do early, for other "uses".

Yes, in fact, that's what I've started with, but abandoned that direction.
This is a question of cost modeling. SimplifyIndVar will already clean this up if it considers generating the end value cheap enough. And it seems like this decision should not depend on whether it expects vectorization in the future or not.

It may be somewhat more efficient to traverse the LCSSA phi's at the single exit block that are fed by allowed-to-exit IV's in order to fix/clean them up, instead of traversing mostly irrelevant internal uses in search for out-of-loop ones.

I'm not sure it's much better. If the LCSSA phi uses the IV phi directly, it is. If it uses the value feeding into the IV phi, then we still need to find the IV this value belongs to. So, either have additional book-keeping, or go over the value's uses to find the phi.
If you think it may be significantly better, I can implement it, and see how it looks.

llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp
3299–3300

Yes, I want this to fire for pointer IVs as well, just enabling it step-by-step.

Ayal added a comment.Jun 15 2016, 4:50 PM
In D21048#459394, @Ayal wrote:

An alternative which I'm sure you thought of would be to fix/clean up such external users of IV's as a preparatory step (SimplifyIndVar?), eliminating them from the loop before starting to vectorize it. This may be a good thing to do early, for other "uses".

Yes, in fact, that's what I've started with, but abandoned that direction.
This is a question of cost modeling. SimplifyIndVar will already clean this up if it considers generating the end value cheap enough. And it seems like this decision should not depend on whether it expects vectorization in the future or not.

It could potentially help other uses as well, but ok, they'd be hard to anticipate as well. The alternative was firstly referring to cleaning this up SimplifyIndVar-style after we decide to vectorize the loop and before we start creating an empty loop etc.

It may be somewhat more efficient to traverse the LCSSA phi's at the single exit block that are fed by allowed-to-exit IV's in order to fix/clean them up, instead of traversing mostly irrelevant internal uses in search for out-of-loop ones.

I'm not sure it's much better. If the LCSSA phi uses the IV phi directly, it is. If it uses the value feeding into the IV phi, then we still need to find the IV this value belongs to. So, either have additional book-keeping, or go over the value's uses to find the phi.

or find the phi by looking at the defs feeding this value.

If you think it may be significantly better, I can implement it, and see how it looks.

Ah, I would expect this to have negligible effect if any. Just noted to keep in mind if one does go back to implement the SimplifyIndVar alternative.

llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp
3299–3300

BTW, another alternative is to extract the last element from the vectorized IV; or the element before last. But that is less amenable to further passes than the scalar computation.