The vectorizer currently does not attempt to create interleave-groups that contain predicated loads/stores; predicated strided accesses can currently be vectorized only using masked gather/scatter or scalarization. This patch makes predicated loads/stores candidates for forming interleave-groups during the Loop-Vectorizer's analysis, and adds the proper support for masked-interleave-groups to the Loop-Vectorizer's planning and transformation stages. The patch also extends the TTI API to allow querying the cost of masked interleave groups (which each target can control); Targets that support masked vector loads/stores may choose to enable this feature and allow vectorizing predicated strided loads/stores using masked wide loads/stores and shuffles.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
include/llvm/Analysis/VectorUtils.h | ||
---|---|---|
131 ↗ | (On Diff #168749) | "each of the elements in a vector of \p VF elements" >> “each element in a vector of \p VF elements”, or “each of the \p VF elements in a vector”. Indicate “Interleaving” in the name of the method, given the InterleaveFactor argument? |
lib/Analysis/VectorUtils.cpp | ||
822 ↗ | (On Diff #168749) | Below might look simpler using BlockA = A->getParent(); BlockB = B->getParent(); |
lib/Target/X86/X86TargetTransformInfo.cpp | ||
2598 ↗ | (On Diff #168749) | (Unrelated clang-format change) |
lib/Transforms/Vectorize/LoopVectorize.cpp | ||
416 ↗ | (On Diff #168749) | " optionally" >> "optionally" |
1985 ↗ | (On Diff #168749) | May be slightly better to check (!BlockInMask || !Group->isReverse()); or fold under the "if (IsMaskRequired)" below. |
2041 ↗ | (On Diff #168749) | Above may look simpler by first setting auto *Undefs = UndefValue::get(Mask->getType()); auto *TransposedMask = createTransposedMask(Builder, InterleaveFactor, VF); |
2043 ↗ | (On Diff #168749) | "wide.masked.load" >> "wide.masked.vec", to follow "wide.vec" more closely? |
2120 ↗ | (On Diff #168749) | See above simplification comment. |
4304 ↗ | (On Diff #168749) | Better check only if isLegalMaskedLoad : isLegalMaskedStore. The purpose of optimizing interleaved accesses is to improve upon the use of Gather/Scatter. |
7108 ↗ | (On Diff #168749) | Simpler to fold into IAI.analyzeInterleaving(useMaskedInterleavedAccesses(*TTI)); |
lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
2038 ↗ | (On Diff #168784) | One could set Undefs and RepMask once outside the "for Part" loop, using Mask[0]->getType(), and also reuse them for interleaved store group. But doing so conditional on IsMaskRequired is a bit less appealing. |
2042 ↗ | (On Diff #168784) | Use UndefVec instead of UndefValue::get(VecTy) |
Comment addressed, thanks.
lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
2038 ↗ | (On Diff #168784) | Agreed. Left it as is. |
LGTM; just a minor comment trying to keep unrelated NFC changes away.
lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
7104 ↗ | (On Diff #169154) | Better leave the enclosing {} [to a separate NFC patch if desired] |