This is an archive of the discontinued LLVM Phabricator instance.

[CostModel][X86] teach TTI calculate cost of chain of vector inserts/extracts more precisely and correctly:In each 128-lane, if there is at least one index is demanded and not all indices are demanded...
ClosedPublic

Authored by yubing on Oct 19 2020, 11:32 PM.

Details

Summary

In each 128-lane, if there is at least one index is demanded and not all
indices are demanded and this 128-lane is not the first 128-lane of the
legalized-vector, then this 128-lane needs a extracti128;
If in each 128-lane, there is at least one index is demanded, this 128-lane
needs a inserti128.

The following cases will help you build a better understanding:
Assume we insert several elements into a v8i32 vector in avx2,
Case#1: inserting into 1th index needs vpinsrd + inserti128
Case#2: inserting into 5th index needs extracti128 + vpinsrd +
inserti128
Case#3: inserting into 4,5,6,7 index needs 4*vpinsrd + inserti128.

Diff Detail

Event Timeline

yubing created this revision.Oct 19 2020, 11:32 PM
yubing requested review of this revision.Oct 19 2020, 11:32 PM
yubing retitled this revision from [CostModel][X86] teach TTI calculate cost of chain of vector inserts/extracts more precisely and correctly: In each 128-lane, if there is at least one index is demanded and not all indices are demanded and this 128-lane is not the first 128-lane... to [CostModel][X86] teach TTI calculate cost of chain of vector inserts/extracts more precisely and correctly:In each 128-lane, if there is at least one index is demanded and not allindices are demanded and this 128-lane is not the first 128-lane....Oct 19 2020, 11:33 PM
yubing edited the summary of this revision. (Show Details)
yubing retitled this revision from [CostModel][X86] teach TTI calculate cost of chain of vector inserts/extracts more precisely and correctly:In each 128-lane, if there is at least one index is demanded and not allindices are demanded and this 128-lane is not the first 128-lane... to [CostModel][X86] teach TTI calculate cost of chain of vector inserts/extracts more precisely and correctly:In each 128-lane, if there is at least one index is demanded and not all indices are demanded....Oct 19 2020, 11:45 PM
yubing edited the summary of this revision. (Show Details)
yubing updated this revision to Diff 299275.Oct 20 2020, 12:04 AM

Fix the warning raised by Lint

pengfei added inline comments.Oct 20 2020, 7:42 AM
llvm/lib/Target/X86/X86TargetTransformInfo.cpp
3088

Why inserting 5th in case 2 needs extracti128 but case 3 doesn't?

3089

Do we need to skip if LT.second.getSizeInBits() == 128?

yubing added inline comments.Oct 20 2020, 7:51 PM
llvm/lib/Target/X86/X86TargetTransformInfo.cpp
3088

in case#3, we don't need to retain other elements since all the indices are inserted, Thus, extracti128 is not needed.

3089

we've checked (LT.second.getSizeInBits() <= 128) in line3073

RKSimon added inline comments.Oct 22 2020, 2:11 AM
llvm/lib/Target/X86/X86TargetTransformInfo.cpp
3077–3109

"if at least one index is demanded but not all indices are demanded"

3080

", there is at least one demanded index, this"

3099

Isn't WidenedDemandedElts.getBitWidth() == NumElts at this stage?

3109

I'm not sure what we gain by having calculateCostOfExtInsr128Lanes() as lambda function?

yubing updated this revision to Diff 300169.Oct 22 2020, 11:10 PM

Modify some comments and remove unnecessary lambda function

yubing marked 4 inline comments as done.Oct 22 2020, 11:12 PM
pengfei accepted this revision.Oct 25 2020, 7:42 PM

LGTM. But I suggest you waiting for one or two days to see if @RKSimon or others object.

This revision is now accepted and ready to land.Oct 25 2020, 7:42 PM
This revision was landed with ongoing or failed builds.Oct 26 2020, 8:21 PM
This revision was automatically updated to reflect the committed changes.