[TTI] Use shuffle cost in getInterleavedMemoryOpCost, if profitable.
Needs Review · Public

Authored by fhahn on Dec 11 2020, 8:29 AM.



Currently, BasicTTIImpl::getInterleavedMemoryOpCost computes the cost of
the shuffles to extract/pack sub-vectors for interleave groups as the
cost of inserting/extracting all elements individually. This
over-estimates the cost in cases where the cost of the actual shuffle is
lower.

This patch also computes the cost for the shuffle using getShuffleCost
and picks the minimum. The extract/insert combination can be cheaper
than shuffles, e.g. if there are many gaps in the group.
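
To make the "pick the minimum" idea concrete, here is a minimal standalone sketch. It does not use LLVM; `ToyCosts`, `perElementCost`, `wholeGroupShuffleCost` and `interleaveShuffleCost` are hypothetical names, and the unit costs are made up. It also assumes, purely for illustration, that the shuffle cost is paid for every member slot of the group while the insert/extract cost is only paid for members that are actually used.

```cpp
// Hypothetical toy cost model; these are NOT LLVM's costs or APIs.
struct ToyCosts {
  unsigned ExtractElt; // assumed cost of one extractelement
  unsigned InsertElt;  // assumed cost of one insertelement
  unsigned Shuffle;    // assumed cost of one sub-vector shuffle
};

// Element-by-element estimate: each used member needs NumSubElts extracts
// from the wide vector and NumSubElts inserts into its sub-vector.
constexpr unsigned perElementCost(const ToyCosts &C, unsigned NumSubElts,
                                  unsigned NumUsedMembers) {
  return NumUsedMembers * NumSubElts * (C.ExtractElt + C.InsertElt);
}

// Shuffle-based estimate. Assumption for this sketch only: the cost is
// charged per member slot (Factor), regardless of gaps in the group.
constexpr unsigned wholeGroupShuffleCost(const ToyCosts &C, unsigned Factor) {
  return Factor * C.Shuffle;
}

// Take the cheaper of the two estimates.
constexpr unsigned interleaveShuffleCost(const ToyCosts &C, unsigned Factor,
                                         unsigned NumSubElts,
                                         unsigned NumUsedMembers) {
  return perElementCost(C, NumSubElts, NumUsedMembers) <
                 wholeGroupShuffleCost(C, Factor)
             ? perElementCost(C, NumSubElts, NumUsedMembers)
             : wholeGroupShuffleCost(C, Factor);
}
```

Under these assumed numbers, a fully used factor-4 group with 4-element sub-vectors is cheaper via shuffles, while the same group with only one used member is cheaper via inserts/extracts.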

Event Timeline

fhahn requested review of this revision. Dec 11 2020, 8:29 AM
fhahn created this revision.
Herald added a project: Restricted Project. Dec 11 2020, 8:29 AM
RKSimon added inline comments. Dec 12 2020, 8:35 AM

Wouldn't trying getScalarizationOverhead be a better approach?

RKSimon added inline comments. Sat, Jan 2, 8:07 AM


fhahn updated this revision to Diff 314400. Mon, Jan 4, 9:51 AM

Rename SSE-ATOM check lines to ATOM

fhahn added inline comments. Mon, Jan 4, 9:55 AM

I think we could use getScalarizationOverhead for the ExtractElement/InsertElement variants, but I am not sure if that would be more compact. AFAICT getScalarizationOverhead currently only considers the case when each element is either available in a scalar (insertion) or is used as a scalar (extraction).

The new case here is slightly different, because we break down a larger vector into a smaller one or vice-versa.
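
A rough standalone illustration of that difference, under stated assumptions: `toyScalarizationOverhead` is a hypothetical stand-in, not LLVM's getScalarizationOverhead, and its shape (a demanded-elements bitmask plus insert/extract flags on a single vector) is only a loose approximation. The unit costs are made up. De-interleaving one member touches two vector types, so in this sketch it takes two such queries: extracts at strided positions of the wide vector, and inserts into every lane of the narrower sub-vector.

```cpp
#include <cstdint>

// Hypothetical unit costs, not taken from any real target.
constexpr unsigned kInsertCost = 1;
constexpr unsigned kExtractCost = 1;

// Number of set bits in a demanded-elements mask.
constexpr unsigned countDemanded(std::uint64_t Mask) {
  return Mask == 0 ? 0
                   : static_cast<unsigned>(Mask & 1) + countDemanded(Mask >> 1);
}

// Toy per-element overhead on ONE vector: every demanded element is
// inserted from a scalar and/or extracted to a scalar.
constexpr unsigned toyScalarizationOverhead(std::uint64_t DemandedElts,
                                            bool Insert, bool Extract) {
  return countDemanded(DemandedElts) *
         ((Insert ? kInsertCost : 0) + (Extract ? kExtractCost : 0));
}

// Mask selecting elements Index, Index + Factor, ... of the wide vector.
constexpr std::uint64_t stridedMask(unsigned Index, unsigned Factor,
                                    unsigned NumSubElts) {
  return NumSubElts == 0
             ? 0
             : (std::uint64_t{1} << Index) |
                   stridedMask(Index + Factor, Factor, NumSubElts - 1);
}

// One de-interleaved member: extracts on the wide vector plus inserts on
// the narrow sub-vector, i.e. two queries on two different types.
constexpr unsigned toyMemberCost(unsigned Index, unsigned Factor,
                                 unsigned NumSubElts) {
  return toyScalarizationOverhead(stridedMask(Index, Factor, NumSubElts),
                                  /*Insert=*/false, /*Extract=*/true) +
         toyScalarizationOverhead((std::uint64_t{1} << NumSubElts) - 1,
                                  /*Insert=*/true, /*Extract=*/false);
}
```

Whether expressing the real cost this way would be more compact than the explicit insert/extract loops is exactly the open question above; the sketch only shows why a single query on one vector type does not cover the sub-vector case.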