Page MenuHomePhabricator

[TTI] Add DemandedElts to getScalarizationOverhead
ClosedPublic

Authored by RKSimon on Apr 15 2020, 9:01 AM.

Details

Summary

The improvements to the x86 vector insert/extract element costs in D74976 resulted in the estimated costs for vector initialization and scalarization increasing higher than should be expected. This is particularly noticeable on pre-SSE4 targets where the available of legal INSERT_VECTOR_ELT ops is more limited.

This patch does 2 things:
1 - it implements X86TTIImpl::getScalarizationOverhead to more accurately represent the typical costs of a ISD::BUILD_VECTOR pattern.
2 - it adds a DemandedElts mask to getScalarizationOverhead to permit the SLP's BoUpSLP::getGatherCost to be rewritten to use it directly instead of accumulating raw vector insertion costs.

This fixes PR45418 where a v4i8 (zext'd to v4i32) was no longer vectorizing.

A future patch should extend X86TTIImpl::getScalarizationOverhead to tweak the EXTRACT_VECTOR_ELT scalarization costs as well.

Diff Detail

Event Timeline

RKSimon created this revision.Apr 15 2020, 9:01 AM
Herald added a project: Restricted Project. · View Herald Transcript
Herald added a subscriber: hiraditya. · View Herald Transcript
craig.topper added inline comments.Apr 15 2020, 12:42 PM
llvm/lib/Target/X86/X86TargetTransformInfo.cpp
2888

Doesn't this over count the cost of inserting multiple elements into the upper half of a 256 bit vector? We'll cost a subvector insert for each element, but we should be able to share.

RKSimon updated this revision to Diff 258304.Apr 17 2020, 6:11 AM

Only cost for a subvector insertion once

craig.topper added inline comments.Apr 17 2020, 10:59 AM
llvm/lib/Target/X86/X86TargetTransformInfo.cpp
2894

What about the 0 cost for fp insertion into element 0 of each subvector?

RKSimon marked an inline comment as done.Apr 17 2020, 1:06 PM
RKSimon added inline comments.
llvm/lib/Target/X86/X86TargetTransformInfo.cpp
2894

Nice catch - I'll update the patch tomorrow

RKSimon updated this revision to Diff 258603.Apr 19 2020, 7:48 AM

Improve vXf32 scalarization costs by accounting for the insertion into the 0'th index element of each 128-bit vector is free.

X86 changes look good to me. I don't know much about the SLPVectorizer though.

craig.topper added inline comments.Apr 27 2020, 2:06 PM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3885

Can we annotate the bools with /**/ comments in all the call sites to make it readable?

craig.topper added inline comments.Apr 27 2020, 2:14 PM
llvm/lib/Target/X86/X86TargetTransformInfo.cpp
2922

Should this be a marked a proper FIXME/TODO?

vdmitrie added inline comments.Apr 27 2020, 3:12 PM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
7024

This sounds like a simple machinery task while it is not. I was trying to explain the underlying issue in D76956. The problem is that we have a bias in cost model toward favoring vectorization for longer vectors and if we fix just this place the bias will be much more severe. Since this patch effectively replaces 76956 I'm perfectly fine in a case if you decide to borrow/adopt my comment.

RKSimon updated this revision to Diff 260596.Apr 28 2020, 5:01 AM

Add argument comments to indicate Insert and Extract use.

Add explanation comments for SLP's findBuildAggregate from D76956

This revision is now accepted and ready to land.Apr 28 2020, 10:47 AM
This revision was automatically updated to reflect the committed changes.