This is an archive of the discontinued LLVM Phabricator instance.

[LV] Add support for vectorizing predicated strided accesses using masked interleave-group
ClosedPublic

Authored by dorit on Oct 9 2018, 12:40 AM.

Details

Summary

The vectorizer currently does not attempt to create interleave-groups that contain predicated loads/stores; predicated strided accesses can currently be vectorized only using masked gather/scatter or scalarization. This patch makes predicated loads/stores candidates for forming interleave-groups during the Loop-Vectorizer's analysis, and adds the proper support for masked-interleave-groups to the Loop-Vectorizer's planning and transformation stages. The patch also extends the TTI API to allow querying the cost of masked interleave groups (which each target can control); Targets that support masked vector loads/stores may choose to enable this feature and allow vectorizing predicated strided loads/stores using masked wide loads/stores and shuffles.

Diff Detail

Event Timeline

dorit created this revision.Oct 9 2018, 12:40 AM
Ayal added inline comments.Oct 9 2018, 4:05 AM
include/llvm/Analysis/VectorUtils.h
131

"each of the elements in a vector of \p VF elements" >> “each element in a vector of \p VF elements”, or “each of the \p VF elements in a vector”.

Indicate “Interleaving” in the name of the method, given the InterleaveFactor argument?

lib/Analysis/VectorUtils.cpp
822

Below might look simpler using BlockA = A->getParent(); BlockB = B->getParent();

lib/Target/X86/X86TargetTransformInfo.cpp
2598

(Unrelated clang-format change)

lib/Transforms/Vectorize/LoopVectorize.cpp
416

" optionally" >> "optionally"

1985

May be slightly better to check (!BlockInMask || !Group->isReverse()); or fold under the "if (IsMaskRequired)" below.

2041

Above may look simpler by first setting

auto *Undefs = UndefValue::get(Mask->getType());
auto *TransposedMask = createTransposedMask(Builder, InterleaveFactor, VF);
2043

"wide.masked.load" >> "wide.masked.vec", to follow "wide.vec" more closely?

2120

See above simplification comment.

4304

Better check only if isLegalMaskedLoad : isLegalMaskedStore. The purpose of optimizing interleaved accesses is to improve upon the use of Gather/Scatter.

7108

Simpler to fold into

IAI.analyzeInterleaving(useMaskedInterleavedAccesses(*TTI));
dorit updated this revision to Diff 168784.Oct 9 2018, 5:30 AM

Addressed Ayal's comments. Thanks!

dorit marked 10 inline comments as done.Oct 9 2018, 5:34 AM
dorit added inline comments.
include/llvm/Analysis/VectorUtils.h
131

createInterleavedMask is already taken, but all we do is replicate each of the elements a given number of times, so I went with createReplicatedMask.

lib/Transforms/Vectorize/LoopVectorize.cpp
4304

Right! Thanks.

Ayal added inline comments.Oct 9 2018, 6:06 AM
lib/Transforms/Vectorize/LoopVectorize.cpp
2038

One could set Undefs and RepMask once outside the "for Part" loop, using Mask[0]->getType(), and also reuse them for interleaved store group. But doing so conditional on IsMaskRequired is a bit less appealing.

2042

Use UndefVec instead of UndefValue::get(VecTy)

dorit updated this revision to Diff 169154.Oct 10 2018, 11:34 PM
dorit marked 2 inline comments as done.
dorit marked an inline comment as done.Oct 10 2018, 11:37 PM

Comment addressed, thanks.

lib/Transforms/Vectorize/LoopVectorize.cpp
2038

Agreed. Left it as is.

Ayal accepted this revision.Oct 11 2018, 5:14 AM

LGTM; just a minor comment trying to keep unrelated NFC changes away.

lib/Transforms/Vectorize/LoopVectorize.cpp
7109

Better leave the enclosing {} [to a separate NFC patch if desired]

This revision is now accepted and ready to land.Oct 11 2018, 5:14 AM
This revision was automatically updated to reflect the committed changes.