Page MenuHomePhabricator

[LV] Avoid vectorizing unsafe dependencies in uniform address

Authored by anna on Nov 14 2018, 12:43 PM.



Currently, when vectorizing stores to uniform addresses, the only
instance we prevent vectorization is if there are multiple stores to the
same uniform address.
This patch teaches LAA to avoid vectorizing loops that have an unsafe
cross-iteration dependency between a load and a store to the same uniform address.
We cannot vectorize loops that load from a uniform address where in the
previous iteration we stored to the same uniform address.

Two test cases are added to show safe and unsafe dependencies for
vectorization. Fixes PR39653.

Note: Initially tried setting CanVecMem to false and returning from the
LAA analysis, when we came across unsafe dependencies. However, this
broke some tests in LoopVersioningLICM. So, the patch changes the
HasMultipleStoresToInvariantAddress to
HasNonVectorizableStoresToInvariantAddress. This allows ORE to
state the reason we could not vectorize the loop.

Diff Detail


Event Timeline

anna created this revision.Nov 14 2018, 12:43 PM
efriedma added inline comments.Nov 14 2018, 1:14 PM
1927 ↗(On Diff #174086)

I'm not sure the dominance check is sufficient. Yes, if the store dominates the load, it's theoretically possible to vectorize the loop, but only if vectorizer has a special case to forward the stored values to the load. Otherwise, the load will get values from the wrong loop iteration.

Ayal added inline comments.Nov 14 2018, 1:41 PM
567 ↗(On Diff #174086)

Comment is pretty obvious; better explain when stores to an invariant address are considered non-vectorizable.

1878 ↗(On Diff #174086)

Is UniformStores still needed? Can check instead UniformStoreMap.count(Ptr) here (and below)?

1926 ↗(On Diff #174086)

Can check instead UniformStoreMap.count(Ptr)?

Note that checking !HasNonVectorizableStoresToLoopInvariantAddress is redundant, but could save time.

1927 ↗(On Diff #174086)

Ahh, right! No special forwarding provided. Only the last of the VF stores to same address will survive, and it will feed all VF loads.

As @anna originally intended: if an invariant address is both stored-to and loaded-from, inside the loop, bail out.

So can continue to use the ValueSet UniformStores;

555 ↗(On Diff #174086)

Good to mention this test is related to pr39653.

With -force-vector-width=4 the test can be simplified; it probably took the additional instructions to convince LV's cost-model to vectorize the original loop on its own, w/o -force-vector-width.

anna added inline comments.Nov 15 2018, 6:32 AM
1927 ↗(On Diff #174086)

oh yes, that's right. I'll resurrect the older one.

anna planned changes to this revision.Nov 15 2018, 6:50 AM

based on above comments.

anna updated this revision to Diff 174216.Nov 15 2018, 7:59 AM

addressed review comments

anna marked 2 inline comments as done.Nov 15 2018, 8:00 AM
anna added inline comments.
555 ↗(On Diff #174086)

I'd tried simplifying the test with force-vector-width, but it still avoided vectorization stating "unsafe memory operations".

Ayal accepted this revision.Nov 18 2018, 11:08 PM

LGTM, with minor optional comments. Would be good to add a test where the load is preceded by a store in the loop, indicating that such dependence is also non-vectorizable.

569 ↗(On Diff #174216)

Can simply state that if there's a memory dependence involving an invariant address, i.e., two stores or a store and a load, return true, else return false.

Could alternatively name it hasDependenceInvolvingLoopInvariantAddress

Note that a load from an invariant address that depends on store(s) to it in the loop, should ideally be promoted to use the temporary being stored; the temporary value gets expanded, unlike the store to the invariant address. The missed ORE message could indicate such potential resolution.

1919 ↗(On Diff #174216)

Can alternatively use if (UniformStores.count(Ptr))

819 ↗(On Diff #174216)

No need for {}

568 ↗(On Diff #174216)

(Note: load could be replaced by phi(%arg4, %tmp12), a potentially vectorizable 1st-order-recurrence.)

This revision is now accepted and ready to land.Nov 18 2018, 11:08 PM
anna marked 5 inline comments as done.Nov 19 2018, 7:17 AM
anna added inline comments.
569 ↗(On Diff #174216)

addressed first 2 comments. Regarding the third one, the entirety of stores to invariant address or store-followed-by-load to inv address is a missed optimization not handled before vectorization, i.e. scalar promotion and LICM.

568 ↗(On Diff #174216)

good point! I'll add that as a comment.

This revision was automatically updated to reflect the committed changes.