Page MenuHomePhabricator

[LV] Support vectorization of interleave-groups that require an epilog under optsize using masked wide loads
ClosedPublic

Authored by dorit on Oct 24 2018, 1:29 PM.

Details

Summary

Under Opt for Size, the vectorizer does not vectorize interleave-groups that have gaps at the end of the group (such as a loop that reads only the even elements: a[2*i]) because that implies that we'll require a scalar epilogue (which is not allowed under Opt for Size). This patch extends the support for masked-interleave-groups (introduced by D53011 for conditional accesses) to also cover the case of gaps in a group of loads; Targets that enable the masked-interleave-group feature don't have to invalidate interleave-groups of loads with gaps; they could now use masked wide-loads and shuffles (if that's what the cost model selects).

Diff Detail

Repository
rL LLVM

Event Timeline

dorit created this revision.Oct 24 2018, 1:29 PM
Ayal added a comment.Oct 29 2018, 3:11 AM

Looking at tests next.

include/llvm/Analysis/TargetTransformInfo.h
833 ↗(On Diff #170962)

Maybe rename IsMasked into UseMaskForCond or IsConditional, to distinguish between the two Is/Use Masks.

include/llvm/Analysis/VectorUtils.h
129 ↗(On Diff #170962)

"masks away gaps" >> "filters the members"

139 ↗(On Diff #170962)

createBitMaskForGaps or createBinaryMaskForGaps ?

425 ↗(On Diff #170962)

Check if !EnabledMaskedInterleave before calling invalidateGroups...()?

lib/Analysis/VectorUtils.cpp
520 ↗(On Diff #170962)

Could first "peel" to build a mask for all members, then replicate it VF-1 times. Not sure it's any better.

lib/Transforms/Vectorize/LoopVectorize.cpp
1960 ↗(On Diff #170962)

More logical to reverse the condition? Admittedly this is only being moved here.

dorit updated this revision to Diff 171493.Oct 29 2018, 7:14 AM

Addressed comments.
Also added a test with stride 3.

dorit marked 5 inline comments as done.Oct 29 2018, 7:17 AM

Comments addressed. Thanks!

lib/Analysis/VectorUtils.cpp
520 ↗(On Diff #170962)

agreed. left as is.

Ayal accepted this revision.Oct 30 2018, 1:57 AM

Also added a test with stride 3.

Thanks, one was indeed needed.
LGTM, with few minor additional optional suggestions.

include/llvm/Analysis/VectorUtils.h
425 ↗(On Diff #171493)

The "if there is no other means..." is now part of the "This can happen when". I.e.,
"This can happen when optimizing for size forbids a scalar epilogue, and the gap cannot be filtered by masking the load/store."

include/llvm/CodeGen/BasicTTIImpl.h
850 ↗(On Diff #171493)

This scaling works just as well for gap-masked loads, right?
(Another place to remember extending when introducing support for gap-masked stores, btw)

921 ↗(On Diff #171493)

Comment here that UseMaskForGaps alone does not add to Cost, because its mask is uniform. Unlike below where it adds the cost of And-ing the two masks.

lib/Analysis/VectorUtils.cpp
959 ↗(On Diff #171493)

"or masking ..." >> "and cannot be masked (not enabled)."

lib/Transforms/Vectorize/LoopVectorize.cpp
4649 ↗(On Diff #170962)

It's indeed good to record in CostModel the constraint forbidding a scalar epilogue, instead of passing an overloaded and abused OptForSize parameter around. It should be recorded at the outset, say at the constructor of CostModel, to be consistent; and isScalarEpilogueAllowed() should be used inside the CostModel instead of OptForSize throughout. This NFC refactoring can be done separately, before/after this patch.

1243 ↗(On Diff #171493)

"Under optsize" and when the trip count is very small "we don't ..."

2077 ↗(On Diff #171493)

Rename IsMaskRequired to [Is]MaskForCondRequired?

2084 ↗(On Diff #171493)

Rename ShuffledMask to MaskForCond?

4393 ↗(On Diff #171493)

"costModel" >> "cost model", or "CostModel"

test/Transforms/LoopVectorize/X86/x86-interleaved-accesses-masked-group.ll
89 ↗(On Diff #170962)

Add checks for this scalarizing when enable-masked-interleaved-access is disabled, or comment that it is already checked above?

This revision is now accepted and ready to land.Oct 30 2018, 1:57 AM
This revision was automatically updated to reflect the committed changes.
dorit marked 6 inline comments as done.