Page MenuHomePhabricator

[TTI] Use shuffle cost in getInterleavedMemoryOpCost, if profitable.
Needs ReviewPublic

Authored by fhahn on Dec 11 2020, 8:29 AM.

Details

Summary

Currently BasicTTIImpl::getInterleavedMemoryOpCost computes the cost of
the shuffles to extract/pack sub-vectors for interleave groups as the
cost of inserting/extracting all elements individually. This
over-estimates the cost in cases where the cost of the actual shuffle is
lower.

This patch also computes the cost for the shuffle using getShuffleCost
and picks the minimum. The extract/insert combination can be cheaper
than shuffles, e.g. if there are many gaps in the group.

Diff Detail

Event Timeline

fhahn requested review of this revision.Dec 11 2020, 8:29 AM
fhahn created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptDec 11 2020, 8:29 AM
RKSimon added inline comments.Dec 12 2020, 8:35 AM
llvm/include/llvm/CodeGen/BasicTTIImpl.h
1171

Wouldn't trying getScalarizationOverhead be a better approach?

RKSimon added inline comments.Sat, Jan 2, 8:07 AM
llvm/test/Transforms/LoopVectorize/X86/interleaving.ll
6

--check-prefix=ATOM

fhahn updated this revision to Diff 314400.Mon, Jan 4, 9:51 AM

Rename SSE-ATOM check lines to ATOM

fhahn added inline comments.Mon, Jan 4, 9:55 AM
llvm/include/llvm/CodeGen/BasicTTIImpl.h
1171

I think we could use getScalarizationOverhead for the ExtractElement/InsertElement variants, but I am not sure if that would be more compact. AFAICT getScalarizationOverhead currently only considers the case when each element is either available in a scalar (insertion) or is used as a scalar (extraction).

The new case here is slightly different, because we break down a larger vector into a smaller one or vice-versa.