This is an archive of the discontinued LLVM Phabricator instance.

Fix maximum legal VF calculation
ClosedPublic

Authored by alonkom on Sep 6 2017, 12:34 AM.

Details

Summary

This patch fixes pr34283, which exposed that the computation of maximum legal width for vectorization was wrong, because it relied on MaxInterleaveFactor to obtain the maximum stride used in the loop, however not all strided accesses in the loop have an interleave-group associated with them. In pr34283 the interleave group gets invalidated because it is a store with gaps; as a result MaxInterleaveFactor is 1, and the computed maximum legal width for vectorization comes out larger than it should be.
Instead of recording the maximum stride in the loop, which can be over conservative (e.g. if the access with the maximum stride is not involved in the dependence limitation), this patch tracks the actual maximum legal width imposed by accesses that are involved in dependencies.

Diff Detail

Repository
rL LLVM

Event Timeline

alonkom created this revision.Sep 6 2017, 12:34 AM
dorit edited edge metadata.Sep 9 2017, 11:43 PM

Looks fine to me (with a couple very minor dbg reports improvements).
@Ayal/@hfinkel - what do you say?

lib/Transforms/Vectorize/LoopVectorize.cpp
6281 ↗(On Diff #113962)

Not strictly related to your patch, but since you are already adding a missing "Invalidate candidate interleave group" message above for the case that shows up in PR34283 (thanks!) , might as well also add a similar message that's missing here (where we invalidate a candidate interleave group due to a reverse access with gaps).

6378 ↗(On Diff #113962)

Another small nit not related to your patch, but while we're here: would be nice to make this message a bit clearer, e.g. "... Widest register safe to use is..."

Ayal added inline comments.Sep 10 2017, 12:40 AM
include/llvm/Analysis/LoopAccessAnalysis.h
205 ↗(On Diff #113962)

Should MaxSafeRegisterWidth replace MaxSafeDepDistBytes, or are both needed?

265 ↗(On Diff #113962)

that causes the worse[worst] restriction

>> that is most restrictive.

lib/Transforms/Vectorize/LoopVectorize.cpp
6226 ↗(On Diff #113962)

Can simplify: "LV: Invalidate candidate interleaved store group due to gaps.\n"

6363 ↗(On Diff #113962)

This comment about being fairly conservative (i.e., could be room for improvement) still holds, right?

6366 ↗(On Diff #113962)

So another way of fixing this could be to have Legal record the MaxStride, instead of asking it for the MaxInterleaveFactor here?

6367 ↗(On Diff #113962)

that incurs the largest restriction

that is most restrictive

test/Transforms/LoopVectorize/X86/pr34283-1.ll
1 ↗(On Diff #113962)

No need to require asserts here, right?

Better place the test say in memdep.ll, rather than opening a new file? Following TestingGuide.html#writing-new-regression-tests. In any case, place both tests in same file, using the PR number in the function names instead of foo.

25 ↗(On Diff #113962)

Can we also/instead check the runtime guard that is generated, perhaps where M is not a compile-time constant?

26 ↗(On Diff #113962)

min[i]mum

test/Transforms/LoopVectorize/X86/pr34283-2.ll
1 ↗(On Diff #113962)

The "2>&1" is not needed here.

25 ↗(On Diff #113962)

This is a bit fragile, as it's relying on the cost-model to choose exactly VF=4. Perhaps better to simply CHECK for a vector whose VF is in the range 2-7, and no vectors of VF 8 or larger?

zvi added a subscriber: zvi.Sep 10 2017, 12:44 AM
alonkom added inline comments.Sep 10 2017, 6:40 AM
include/llvm/Analysis/LoopAccessAnalysis.h
205 ↗(On Diff #113962)

I think we need both. MaxSafeDepDistBytes is used in a few other LAA functions, and is correct.

lib/Transforms/Vectorize/LoopVectorize.cpp
6363 ↗(On Diff #113962)

We are actually now dropping this over-conservativeness, which was also not conservative enough.

6366 ↗(On Diff #113962)

yes, this was one of the possible solutions we thought of, but it is less accurate.

Furthermore we can also drop getMaxInterleaveFactor() as it becomes dead.

test/Transforms/LoopVectorize/X86/pr34283-1.ll
1 ↗(On Diff #113962)

The asserts are needed for checking the debug output, but we can check the IR instead, as in the second test.

25 ↗(On Diff #113962)

This fix involves compile time distances that affect max VF; no runtime guards involved.

test/Transforms/LoopVectorize/X86/pr34283-2.ll
25 ↗(On Diff #113962)

Can force-vectorization-width=4 and CHECK that it vectorized, and force-vectorization-width=8 and CHECK that it did not.

alonkom updated this revision to Diff 114535.Sep 10 2017, 11:38 PM
Ayal edited edge metadata.Sep 11 2017, 12:14 AM

Only additional comment is whether the test needs to be X86/skx specific, or whether it can be placed in, say, memdep.ll

alonkom updated this revision to Diff 114571.Sep 11 2017, 5:13 AM

The test is now placed in memdep.ll

Ayal accepted this revision.Sep 12 2017, 9:05 AM
Ayal added inline comments.
lib/Transforms/Vectorize/LoopVectorize.cpp
1562 ↗(On Diff #114571)

Best assert that LAI is not null before dereferencing it.

6222 ↗(On Diff #114571)

While you're touching this, best use std::min.

test/Transforms/LoopVectorize/memdep.ll
230 ↗(On Diff #114571)

Better clarify:
; where the dependence distance here is 6 iterations, given by |N-(N-12)|/2.

This revision is now accepted and ready to land.Sep 12 2017, 9:05 AM
This revision was automatically updated to reflect the committed changes.