This is an archive of the discontinued LLVM Phabricator instance.

[LV] Consider non-consecutive vectorizable accesses in max VF selection
ClosedPublic

Authored by mssimpso on Feb 23 2017, 10:33 AM.

Details

Summary

When computing the smallest and largest types for selecting the maximum vectorization factor, we currently ignore loads and stores of pointer types if the memory access is non-consecutive. We do this because such accesses must be scalarized regardless of vectorization factor, and thus shouldn't be considered when determining the factor. This patch makes this check less aggressive by also considering non-consecutive accesses that may be vectorized, such as interleaved accesses. Because we don't know at the time of the check if an accesses will certainly be vectorized (this is a cost model decision given a particular VF), we consider all accesses that can potentially be vectorized.

Diff Detail

Repository
rL LLVM

Event Timeline

mssimpso created this revision.Feb 23 2017, 10:33 AM
mkuper added inline comments.Feb 28 2017, 2:46 PM
lib/Transforms/Vectorize/LoopVectorize.cpp
6341 ↗(On Diff #89536)

Why would this only apply to pointer types, though? What's special about them?
(It looks like it was a heuristic of some sort, but I'm not sure it makes sense anymore.)

mssimpso added inline comments.Feb 28 2017, 3:06 PM
lib/Transforms/Vectorize/LoopVectorize.cpp
6341 ↗(On Diff #89536)

I'm not sure it makes sense anymore either - I'm happy to remove it. It was added when we could only choose the VF based on the size of the largest type. Maybe the pointer size was just used in place of "a large scalar type size that will cause the max VF to be too small"?

I'm actually hoping we can enable -vectorizer-maximize-bandwidth at some point, though. For some context, once I commit D29466 and D29675, ARM/AArch64 should be prepared for the change. I discovered the bug here while testing those two patches (with -vectorizer-maximize-bandwidth=false). I was expecting these patches to be NFC, but for the loop in the test case, we were choosing a very large VF by mistake.

mkuper edited edge metadata.Feb 28 2017, 3:17 PM

LGTM

lib/Transforms/Vectorize/LoopVectorize.cpp
6341 ↗(On Diff #89536)

Hm, right, if we have a loop that mostly works on i8, but gathers the pointers, we'll have a bad time with the MaxVF.
ILet's just keep ignoring it for now... :-)

And I'd really like to enable maximize-bandwidth as well. I need to run our tests again and see whether we have any regressions on x86.

mssimpso added inline comments.Feb 28 2017, 3:25 PM
lib/Transforms/Vectorize/LoopVectorize.cpp
6341 ↗(On Diff #89536)

Sounds good. Thanks, Michael!

This revision was automatically updated to reflect the committed changes.