Under Opt for Size, the vectorizer does not vectorize interleave-groups that have gaps at the end of the group (such as a loop that reads only the even elements: a[2*i]) because that implies that we'll require a scalar epilogue (which is not allowed under Opt for Size). This patch extends the support for masked-interleave-groups (introduced by D53011 for conditional accesses) to also cover the case of gaps in a group of loads; Targets that enable the masked-interleave-group feature don't have to invalidate interleave-groups of loads with gaps; they could now use masked wide-loads and shuffles (if that's what the cost model selects).
Details
Diff Detail
Event Timeline
Looking at tests next.
| include/llvm/Analysis/TargetTransformInfo.h | ||
|---|---|---|
| 833 | Maybe rename IsMasked into UseMaskForCond or IsConditional, to distinguish between the two Is/Use Masks. | |
| include/llvm/Analysis/VectorUtils.h | ||
| 129 | "masks away gaps" >> "filters the members" | |
| 139 | createBitMaskForGaps or createBinaryMaskForGaps ? | |
| 425 | Check if !EnabledMaskedInterleave before calling invalidateGroups...()? | |
| lib/Analysis/VectorUtils.cpp | ||
| 520 | Could first "peel" to build a mask for all members, then replicate it VF-1 times. Not sure it's any better. | |
| lib/Transforms/Vectorize/LoopVectorize.cpp | ||
| 1960 | More logical to reverse the condition? Admittedly this is only being moved here. | |
Also added a test with stride 3.
Thanks, one was indeed needed.
LGTM, with few minor additional optional suggestions.
| include/llvm/Analysis/VectorUtils.h | ||
|---|---|---|
| 423 | The "if there is no other means..." is now part of the "This can happen when". I.e., | |
| include/llvm/CodeGen/BasicTTIImpl.h | ||
| 831 | This scaling works just as well for gap-masked loads, right? | |
| 902 | Comment here that UseMaskForGaps alone does not add to Cost, because its mask is uniform. Unlike below where it adds the cost of And-ing the two masks. | |
| lib/Analysis/VectorUtils.cpp | ||
| 960–961 | "or masking ..." >> "and cannot be masked (not enabled)." | |
| lib/Transforms/Vectorize/LoopVectorize.cpp | ||
| 1243 | "Under optsize" and when the trip count is very small "we don't ..." | |
| 2077–2093 | Rename IsMaskRequired to [Is]MaskForCondRequired? | |
| 2084 | Rename ShuffledMask to MaskForCond? | |
| 4393 | "costModel" >> "cost model", or "CostModel" | |
| 4649 | It's indeed good to record in CostModel the constraint forbidding a scalar epilogue, instead of passing an overloaded and abused OptForSize parameter around. It should be recorded at the outset, say at the constructor of CostModel, to be consistent; and isScalarEpilogueAllowed() should be used inside the CostModel instead of OptForSize throughout. This NFC refactoring can be done separately, before/after this patch. | |
| test/Transforms/LoopVectorize/X86/x86-interleaved-accesses-masked-group.ll | ||
| 89 | Add checks for this scalarizing when enable-masked-interleaved-access is disabled, or comment that it is already checked above? | |
Maybe rename IsMasked into UseMaskForCond or IsConditional, to distinguish between the two Is/Use Masks.