Under Opt for Size, the vectorizer does not vectorize interleave-groups that have gaps at the end of the group (such as a loop that reads only the even elements: a[2*i]), because such a group requires a scalar epilogue, which is not allowed under Opt for Size. This patch extends the support for masked-interleave-groups (introduced by D53011 for conditional accesses) to also cover the case of gaps in a group of loads. Targets that enable the masked-interleave-group feature no longer have to invalidate interleave-groups of loads with gaps; they can now use masked wide-loads and shuffles (if that's what the cost model selects).
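To make the gap case concrete, here is a minimal scalar sketch of the a[2*i] example. It is not code from the patch: `gapMask` and `sumEvenMasked` are hypothetical names, and the sketch assumes the trip count is a multiple of VF. It models a wide load of 2*VF elements whose odd lanes are masked off, so the last vector iteration never touches the element past the end of the array.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not the LLVM API): builds the per-lane mask for a wide
// load that covers `vf` iterations of an interleave group. present[j] tells
// whether member j of the group is actually accessed by the loop.
std::vector<bool> gapMask(std::size_t vf, const std::vector<bool> &present) {
  assert(vf > 0 && !present.empty());
  std::vector<bool> mask;
  mask.reserve(vf * present.size());
  for (std::size_t i = 0; i < vf; ++i)
    for (bool p : present)
      mask.push_back(p);
  return mask;
}

// Scalar model of "masked wide load + shuffle" for: sum += a[2*i], i in [0, n).
// Assumption of this sketch: n is a multiple of vf, so there is no tail.
int sumEvenMasked(const std::vector<int> &a, std::size_t n, std::size_t vf) {
  assert(n % vf == 0 && "sketch assumes the trip count is a multiple of VF");
  const std::vector<bool> mask = gapMask(vf, {true, false});
  int sum = 0;
  for (std::size_t i = 0; i < n; i += vf) {
    for (std::size_t lane = 0; lane < mask.size(); ++lane) {
      if (!mask[lane])
        continue;                // masked-off lane: memory is never read
      sum += a.at(2 * i + lane); // .at() throws if a lane reads out of bounds
    }
  }
  return sum;
}
```

With an array of 2*n - 1 elements, an unmasked wide load in the last vector iteration would read one element past the end; in this model that lane is masked off, so no scalar epilogue is needed to stay in bounds.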
Event Timeline
Looking at tests next.
include/llvm/Analysis/TargetTransformInfo.h | ||
---|---|---|
833–835 | Maybe rename IsMasked into UseMaskForCond or IsConditional, to distinguish between the two Is/Use Masks. | |
include/llvm/Analysis/VectorUtils.h | ||
129 | "masks away gaps" >> "filters the members" | |
139 | createBitMaskForGaps or createBinaryMaskForGaps ? | |
429 | Check if !EnabledMaskedInterleave before calling invalidateGroups...()? | |
lib/Analysis/VectorUtils.cpp | ||
520 | Could first "peel" to build a mask for all members, then replicate it VF-1 times. Not sure it's any better. | |
lib/Transforms/Vectorize/LoopVectorize.cpp | ||
1960 | More logical to reverse the condition? Admittedly this is only being moved here. |
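The "peel, then replicate" idea raised for lib/Analysis/VectorUtils.cpp can be sketched as follows. This is an illustration only, using plain `std::vector<bool>` instead of LLVM constants; `gapMaskPerLane` and `gapMaskReplicated` are hypothetical names, not functions from the patch. Both variants produce the same mask, which matches the reviewer's remark that the alternative may not be any better.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Variant 1: visit every lane of every iteration and test group membership.
std::vector<bool> gapMaskPerLane(std::size_t vf,
                                 const std::vector<bool> &present) {
  assert(vf > 0 && !present.empty());
  std::vector<bool> mask;
  for (std::size_t i = 0; i < vf; ++i)
    for (std::size_t j = 0; j < present.size(); ++j)
      mask.push_back(present[j]);
  return mask;
}

// Variant 2: "peel" the mask for one full group, then append it VF-1 times.
std::vector<bool> gapMaskReplicated(std::size_t vf,
                                    const std::vector<bool> &present) {
  assert(vf > 0 && !present.empty());
  std::vector<bool> mask(present); // peeled first copy
  for (std::size_t i = 1; i < vf; ++i)
    mask.insert(mask.end(), present.begin(), present.end());
  return mask;
}
```

For a stride-3 group that reads only members 0 and 1, both variants return the pattern 1,1,0 repeated VF times.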
Also added a test with stride 3.
Thanks, one was indeed needed.
LGTM, with a few minor additional optional suggestions.
include/llvm/Analysis/VectorUtils.h | ||
---|---|---|
425 | The "if there is no other means..." is now part of the "This can happen when". I.e., | |
include/llvm/CodeGen/BasicTTIImpl.h | ||
850 | This scaling works just as well for gap-masked loads, right? | |
921 | Comment here that UseMaskForGaps alone does not add to Cost, because its mask is uniform. Unlike below where it adds the cost of And-ing the two masks. | |
lib/Analysis/VectorUtils.cpp | ||
959 | "or masking ..." >> "and cannot be masked (not enabled)." | |
lib/Transforms/Vectorize/LoopVectorize.cpp | ||
1243 | "Under optsize" and when the trip count is very small "we don't ..." | |
2077 | Rename IsMaskRequired to [Is]MaskForCondRequired? | |
2084 | Rename ShuffledMask to MaskForCond? | |
4393 | "costModel" >> "cost model", or "CostModel" | |
4649 | It's indeed good to record in CostModel the constraint forbidding a scalar epilogue, instead of passing an overloaded and abused OptForSize parameter around. It should be recorded at the outset, say at the constructor of CostModel, to be consistent; and isScalarEpilogueAllowed() should be used inside the CostModel instead of OptForSize throughout. This NFC refactoring can be done separately, before/after this patch. | |
test/Transforms/LoopVectorize/X86/x86-interleaved-accesses-masked-group.ll | ||
89 | Add checks for this scalarizing when enable-masked-interleaved-access is disabled, or comment that it is already checked above? |