Under Opt for Size, the vectorizer does not vectorize interleave-groups that have gaps at the end of the group (such as a loop that reads only the even elements: a[2*i]) because that implies that we'll require a scalar epilogue (which is not allowed under Opt for Size). This patch extends the support for masked-interleave-groups (introduced by D53011 for conditional accesses) to also cover the case of gaps in a group of loads; Targets that enable the masked-interleave-group feature don't have to invalidate interleave-groups of loads with gaps; they could now use masked wide-loads and shuffles (if that's what the cost model selects).
Looking at tests next.
|833 ↗||(On Diff #170962)|
Maybe rename IsMasked into UseMaskForCond or IsConditional, to distinguish between the two Is/Use Masks.
|129 ↗||(On Diff #170962)|
"masks away gaps" >> "filters the members"
|139 ↗||(On Diff #170962)|
createBitMaskForGaps or createBinaryMaskForGaps ?
|425 ↗||(On Diff #170962)|
Check if !EnabledMaskedInterleave before calling invalidateGroups...()?
|520 ↗||(On Diff #170962)|
Could first "peel" to build a mask for all members, then replicate it VF-1 times. Not sure it's any better.
|1960 ↗||(On Diff #170962)|
More logical to reverse the condition? Admittedly this is only being moved here.
Also added a test with stride 3.
Thanks, one was indeed needed.
LGTM, with few minor additional optional suggestions.
|425 ↗||(On Diff #171493)|
The "if there is no other means..." is now part of the "This can happen when". I.e.,
|850 ↗||(On Diff #171493)|
This scaling works just as well for gap-masked loads, right?
|921 ↗||(On Diff #171493)|
Comment here that UseMaskForGaps alone does not add to Cost, because its mask is uniform. Unlike below where it adds the cost of And-ing the two masks.
|959 ↗||(On Diff #171493)|
"or masking ..." >> "and cannot be masked (not enabled)."
|4649 ↗||(On Diff #170962)|
It's indeed good to record in CostModel the constraint forbidding a scalar epilogue, instead of passing an overloaded and abused OptForSize parameter around. It should be recorded at the outset, say at the constructor of CostModel, to be consistent; and isScalarEpilogueAllowed() should be used inside the CostModel instead of OptForSize throughout. This NFC refactoring can be done separately, before/after this patch.
|1243 ↗||(On Diff #171493)|
"Under optsize" and when the trip count is very small "we don't ..."
|2077 ↗||(On Diff #171493)|
Rename IsMaskRequired to [Is]MaskForCondRequired?
|2084 ↗||(On Diff #171493)|
Rename ShuffledMask to MaskForCond?
|4393 ↗||(On Diff #171493)|
"costModel" >> "cost model", or "CostModel"
|89 ↗||(On Diff #170962)|
Add checks for this scalarizing when enable-masked-interleaved-access is disabled, or comment that it is already checked above?