This is an archive of the discontinued LLVM Phabricator instance.

[X86][Costmodel] Fix `X86TTIImpl::getGSScalarCost()`
ClosedPublic

Authored by lebedev.ri on Oct 6 2021, 5:55 AM.

Details

Summary

X86TTIImpl::getGSScalarCost() has (at least) two issues:

  • It naively computes the cost of a sequence of insertelement/extractelement instructions. If we are operating not on XMM but on YMM/ZMM registers, this wildly overestimates the cost of subvector insertions/extractions.
  • Gather/scatter takes a vector of pointers, and scalarization results in us performing a scalar memory operation for each of these pointers, but we never account for the cost of extracting these pointers out of the vector of pointers.
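
To make the two points above concrete, here is a rough, self-contained sketch of how a scalarized gather could be costed. It is not the code from this patch and does not use LLVM's API: `ToyCosts`, `toyScalarizationOverhead`, `toyGatherScalarCost` and all the unit costs are hypothetical names and numbers picked purely for illustration. The sketch charges one subvector insert/extract per 128-bit lane (rather than per element) when the vector is wider than XMM, and it counts the extraction of the pointers from the pointer vector as a separate term.

```cpp
#include <cassert>

// Hypothetical unit costs; none of these numbers come from LLVM.
struct ToyCosts {
  int ScalarMemOp = 1;     // one scalar load or store
  int ElementInsExt = 1;   // insert/extract one element within a 128-bit lane
  int SubvectorInsExt = 1; // insert/extract one 128-bit subvector
};

// Ceiling division helper.
static int divideCeil(int A, int B) { return (A + B - 1) / B; }

// Toy cost of moving every element between a vector register and scalars.
// Vectors wider than 128 bits pay once per 128-bit lane for the subvector
// insert/extract, not once per element.
static int toyScalarizationOverhead(int NumElts, int EltBits,
                                    const ToyCosts &C) {
  int VecBits = NumElts * EltBits;
  int NumLanes = divideCeil(VecBits, 128);
  int SubvecCost = NumLanes > 1 ? NumLanes * C.SubvectorInsExt : 0;
  return SubvecCost + NumElts * C.ElementInsExt;
}

// Toy cost of a fully scalarized gather:
//   scalar loads + extracting each pointer + inserting each loaded value.
int toyGatherScalarCost(int NumElts, int DataBits, int PtrBits,
                        const ToyCosts &C) {
  int MemOps = NumElts * C.ScalarMemOp;
  int PtrExtract = toyScalarizationOverhead(NumElts, PtrBits, C);
  int DataInsert = toyScalarizationOverhead(NumElts, DataBits, C);
  return MemOps + PtrExtract + DataInsert;
}
```

With these made-up unit costs, gathering 8 x i32 through 64-bit pointers pays for 8 loads, for extracting 8 pointers out of a 512-bit pointer vector (4 lanes), and for inserting 8 results into a 256-bit vector (2 lanes). Dropping the `PtrExtract` term reproduces the second issue described above.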

Event Timeline

lebedev.ri created this revision. Oct 6 2021, 5:55 AM
lebedev.ri requested review of this revision. Oct 6 2021, 5:55 AM
lebedev.ri edited the summary of this revision.
lebedev.ri updated this revision to Diff 377655. Oct 6 2021, 1:12 PM

Rebased, NFC.

Note that even with the proper LV fix (D111460), we still want this fix too.

This revision is now accepted and ready to land. Oct 13 2021, 12:21 PM

LGTM

Thank you for the review!
I'm going to land this; all these patches probably conflict with each other.

This revision was automatically updated to reflect the committed changes.

Do you think we should add gather/scatter test coverage for x86 triples with 32-bit pointers (i686 or gnux32 ?)

Does anyone actually care about 32-bit performance, especially vectorized?
We could extract the gather/scatter coverage out of llvm/test/Analysis/CostModel/X86/masked-intrinsic-cost.ll into a new file and either duplicate the file or just duplicate the RUN lines, I suppose.