This is an archive of the discontinued LLVM Phabricator instance.

[LV] Sink casts to unravel first order recurrence
ClosedPublic

Authored by Ayal on May 10 2017, 12:40 PM.

Details

Summary

A loop-phi Phi that involves a first order recurrence can be vectorized only if the Previous instruction that feeds Phi from the previous iteration dominates all of Phi's users. This condition is needed to allow introducing instructions that use both Phi and Previous, combine their values, and feed the users of Phi and Previous.

When Phi has users of types different from its own, a mediating Cast instruction that uses Phi may appear above Previous, thereby breaking this condition. This patch tries to sink such a Cast past Previous, provided the users of Cast are dominated by Previous, thereby enabling vectorization of the loop.

Diff Detail

Repository
rL LLVM

Event Timeline

Ayal created this revision.May 10 2017, 12:40 PM
mssimpso edited edge metadata.May 10 2017, 2:41 PM

Hi Ayal,

Thanks for extending this. The current dominance requirement can be fairly restrictive. Is there anything special about casts? I'm wondering if it's worth extending this to handle other kinds of instructions/expressions.

In general I wonder if this is really the best place to do this. It would be nice if the loop was canonicalised to be in this form given how cheap it is to do. Perhaps LoopSimplify? Not blocking this change, but something to think about.

lib/Transforms/Utils/LoopUtils.cpp
556 ↗(On Diff #98502)

In what situation can this occur? Isn't this an output vector?

564 ↗(On Diff #98502)

Bug here, bitwise & used. Which is also telling me that there aren't sufficient tests for this, especially the case where Previous doesn't dominate the cast's user and so the cast can't be legally sunk past Previous.

test/Transforms/LoopVectorize/first-order-recurrence.ll
401 ↗(On Diff #98502)

Can we add a few more checks to verify the result of the vectorization is sane?

Ayal added a comment.May 11 2017, 7:34 AM

Hi Ayal,

Thanks for extending this. The current dominance requirement can be fairly restrictive. Is there anything special about casts? I'm wondering if it's worth extending this to handle other kinds of instructions/expressions.

The case of casts arises naturally when GVN::PerformLoadPRE() optimizes an a[i]+a[i+1] where "a" is an array e.g. of type short and the addition is of type int, as in the attached test-case. It is also possible though to apply this optimization manually to the source code, by doing:

short ai = a[0];
for (int i = 0; i < n; ++i) {
  short aiPlusOne = a[i+1];
  b[i] = ai * aiPlusOne;
  ai = aiPlusOne;
}

or manually add any instruction to use Phi before Previous, such as the following "+= 3":

int ai = a[0];
for (int i = 0; i < n; ++i) {
  ai += 3;
  int aiPlusOne = a[i+1];
  b[i] = ai * aiPlusOne;
  ai = aiPlusOne;
}

(Doing "b[i]=(a[i]+3)+a[i+1]" btw produces a cast followed by an add, both above Previous; a test-case @aemerson asked for.)

In general I wonder if this is really the best place to do this. It would be nice if the loop was canonicalised to be in this form given how cheap it is to do. Perhaps LoopSimplify? Not blocking this change, but something to think about.

The specific a[i]+a[i+1] case could indeed be handled before vectorization, namely by (LoopSimplify?) hoisting the cast along with the load. This applies in general to other instructions that operate symmetrically on both a[i] and a[i+1]. Instructions that only apply to a[i] need to be carefully placed inside the loop in order for it to be vectorized. It's probably better to have the vectorizer handle such motion; nothing keeps these instructions from returning back to their original positions.

Sounds reasonable?

lib/Transforms/Utils/LoopUtils.cpp
556 ↗(On Diff #98502)

In general, once instructions are allowed to move, their updated location and dominance info should be considered. Furthermore, if transitive motion is allowed, e.g., sink a after b and sink b after c, they need to execute in the right order.

In the scenarios encountered Previous is a load, so this does not occur, as sinking is restricted here to casts.

Note that Interleave Groups also move instructions, but w/o affecting such sinking of casts: loads move upwards (w/o breaking their original dominance info) and stores move downwards (having no users).

564 ↗(On Diff #98502)

Right, thanks!

test/Transforms/LoopVectorize/first-order-recurrence.ll
401 ↗(On Diff #98502)

Sure.

In D33058#752230, @Ayal wrote:

The specific a[i]+a[i+1] case could indeed be handled before vectorization, namely by (LoopSimplify?) hoisting the cast along with the load. This applies in general to other instructions that operate symmetrically on both a[i] and a[i+1]. Instructions that only apply to a[i] need to be carefully placed inside the loop in order for it to be vectorized. It's probably better to have the vectorizer handle such motion; nothing keeps these instructions from returning back to their original positions.

Sounds reasonable?

Yep, thanks.

Ayal updated this revision to Diff 102313.Jun 13 2017, 4:48 AM
Ayal marked an inline comment as done.

In D33058#751569, @mssimpso wrote:
I'm wondering if it's worth extending this to handle other kinds of instructions/expressions.

Updated version captures this wondering in a TODO, along with addressing the other review comments.

Ayal added inline comments.Jun 13 2017, 4:52 AM
lib/Transforms/Utils/LoopUtils.cpp
564 ↗(On Diff #98502)

Fixed. (Missing '&' fell off column 80 I guess ;-)

Added a test as mentioned above where Previous (the load) doesn't dominate the cast's user, so the cast cannot be legally sunk past Previous w/o sinking this user along with it.

LGTM, assuming Amara has no more comments. Thanks Ayal!

aemerson accepted this revision.Jun 26 2017, 3:09 AM

Sorry, this fell off my radar. LGTM.

This revision is now accepted and ready to land.Jun 26 2017, 3:09 AM
This revision was automatically updated to reflect the committed changes.