Under Opt for Size, the vectorizer does not vectorize interleave-groups that have gaps at the end of the group (such as a loop that reads only the even elements: a[2*i]) because that implies that we'll require a scalar epilogue (which is not allowed under Opt for Size). This patch extends the support for masked-interleave-groups (introduced by D53011 for conditional accesses) to also cover the case of gaps in a group of loads; Targets that enable the masked-interleave-group feature don't have to invalidate interleave-groups of loads with gaps; they could now use masked wide-loads and shuffles (if that's what the cost model selects).
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
Looking at tests next.
include/llvm/Analysis/TargetTransformInfo.h | ||
---|---|---|
833 ↗ | (On Diff #170962) | Maybe rename IsMasked into UseMaskForCond or IsConditional, to distinguish between the two Is/Use Masks. |
include/llvm/Analysis/VectorUtils.h | ||
129 ↗ | (On Diff #170962) | "masks away gaps" >> "filters the members" |
139 ↗ | (On Diff #170962) | createBitMaskForGaps or createBinaryMaskForGaps ? |
425 ↗ | (On Diff #170962) | Check if !EnabledMaskedInterleave before calling invalidateGroups...()? |
lib/Analysis/VectorUtils.cpp | ||
520 ↗ | (On Diff #170962) | Could first "peel" to build a mask for all members, then replicate it VF-1 times. Not sure it's any better. |
lib/Transforms/Vectorize/LoopVectorize.cpp | ||
1960 ↗ | (On Diff #170962) | More logical to reverse the condition? Admittedly this is only being moved here. |
Comments addressed. Thanks!
lib/Analysis/VectorUtils.cpp | ||
---|---|---|
520 ↗ | (On Diff #170962) | agreed. left as is. |
Also added a test with stride 3.
Thanks, one was indeed needed.
LGTM, with few minor additional optional suggestions.
include/llvm/Analysis/VectorUtils.h | ||
---|---|---|
425 ↗ | (On Diff #171493) | The "if there is no other means..." is now part of the "This can happen when". I.e., |
include/llvm/CodeGen/BasicTTIImpl.h | ||
850 ↗ | (On Diff #171493) | This scaling works just as well for gap-masked loads, right? |
921 ↗ | (On Diff #171493) | Comment here that UseMaskForGaps alone does not add to Cost, because its mask is uniform. Unlike below where it adds the cost of And-ing the two masks. |
lib/Analysis/VectorUtils.cpp | ||
959 ↗ | (On Diff #171493) | "or masking ..." >> "and cannot be masked (not enabled)." |
lib/Transforms/Vectorize/LoopVectorize.cpp | ||
4649 ↗ | (On Diff #170962) | It's indeed good to record in CostModel the constraint forbidding a scalar epilogue, instead of passing an overloaded and abused OptForSize parameter around. It should be recorded at the outset, say at the constructor of CostModel, to be consistent; and isScalarEpilogueAllowed() should be used inside the CostModel instead of OptForSize throughout. This NFC refactoring can be done separately, before/after this patch. |
1243 ↗ | (On Diff #171493) | "Under optsize" and when the trip count is very small "we don't ..." |
2077 ↗ | (On Diff #171493) | Rename IsMaskRequired to [Is]MaskForCondRequired? |
2084 ↗ | (On Diff #171493) | Rename ShuffledMask to MaskForCond? |
4393 ↗ | (On Diff #171493) | "costModel" >> "cost model", or "CostModel" |
test/Transforms/LoopVectorize/X86/x86-interleaved-accesses-masked-group.ll | ||
89 ↗ | (On Diff #170962) | Add checks for this scalarizing when enable-masked-interleaved-access is disabled, or comment that it is already checked above? |