This is an archive of the discontinued LLVM Phabricator instance.

[LV] Support vectorization of interleave-groups that require an epilog under optsize using masked wide loads
ClosedPublic

Authored by dorit on Oct 24 2018, 1:29 PM.

Details

Summary

Under Opt for Size, the vectorizer does not vectorize interleave-groups that have gaps at the end of the group (such as a loop that reads only the even elements: a[2*i]) because that implies that we'll require a scalar epilogue (which is not allowed under Opt for Size). This patch extends the support for masked-interleave-groups (introduced by D53011 for conditional accesses) to also cover the case of gaps in a group of loads; Targets that enable the masked-interleave-group feature don't have to invalidate interleave-groups of loads with gaps; they could now use masked wide-loads and shuffles (if that's what the cost model selects).

Diff Detail

Event Timeline

dorit created this revision.Oct 24 2018, 1:29 PM
Ayal added a comment.Oct 29 2018, 3:11 AM

Looking at tests next.

include/llvm/Analysis/TargetTransformInfo.h
833

Maybe rename IsMasked into UseMaskForCond or IsConditional, to distinguish between the two Is/Use Masks.

include/llvm/Analysis/VectorUtils.h
129

"masks away gaps" >> "filters the members"

139

createBitMaskForGaps or createBinaryMaskForGaps ?

425

Check if !EnabledMaskedInterleave before calling invalidateGroups...()?

lib/Analysis/VectorUtils.cpp
520

Could first "peel" to build a mask for all members, then replicate it VF-1 times. Not sure it's any better.

lib/Transforms/Vectorize/LoopVectorize.cpp
1960

More logical to reverse the condition? Admittedly this is only being moved here.

dorit updated this revision to Diff 171493.Oct 29 2018, 7:14 AM

Addressed comments.
Also added a test with stride 3.

dorit marked 5 inline comments as done.Oct 29 2018, 7:17 AM

Comments addressed. Thanks!

lib/Analysis/VectorUtils.cpp
520

agreed. left as is.

Ayal accepted this revision.Oct 30 2018, 1:57 AM

Also added a test with stride 3.

Thanks, one was indeed needed.
LGTM, with few minor additional optional suggestions.

include/llvm/Analysis/VectorUtils.h
423

The "if there is no other means..." is now part of the "This can happen when". I.e.,
"This can happen when optimizing for size forbids a scalar epilogue, and the gap cannot be filtered by masking the load/store."

include/llvm/CodeGen/BasicTTIImpl.h
831

This scaling works just as well for gap-masked loads, right?
(Another place to remember extending when introducing support for gap-masked stores, btw)

902

Comment here that UseMaskForGaps alone does not add to Cost, because its mask is uniform. Unlike below where it adds the cost of And-ing the two masks.

lib/Analysis/VectorUtils.cpp
960–961

"or masking ..." >> "and cannot be masked (not enabled)."

lib/Transforms/Vectorize/LoopVectorize.cpp
1243

"Under optsize" and when the trip count is very small "we don't ..."

2077–2093

Rename IsMaskRequired to [Is]MaskForCondRequired?

2084

Rename ShuffledMask to MaskForCond?

4393

"costModel" >> "cost model", or "CostModel"

4649

It's indeed good to record in CostModel the constraint forbidding a scalar epilogue, instead of passing an overloaded and abused OptForSize parameter around. It should be recorded at the outset, say at the constructor of CostModel, to be consistent; and isScalarEpilogueAllowed() should be used inside the CostModel instead of OptForSize throughout. This NFC refactoring can be done separately, before/after this patch.

test/Transforms/LoopVectorize/X86/x86-interleaved-accesses-masked-group.ll
89

Add checks for this scalarizing when enable-masked-interleaved-access is disabled, or comment that it is already checked above?

This revision is now accepted and ready to land.Oct 30 2018, 1:57 AM
This revision was automatically updated to reflect the committed changes.
dorit marked 6 inline comments as done.