Page MenuHomePhabricator

[X86][Costmodel] `getReplicationShuffleCost()`: promote 16 bit-wide elements to 32 bit when no AVX512BW
ClosedPublic

Authored by lebedev.ri on Wed, Nov 10, 1:16 PM.

Details

Summary

The basic idea is simple, if we don't have native shuffle for this element type,
then we must have native shuffle for wider element type,
so promote, replicate, demote.

I believe, asking getCastInstrCost(Instruction::Trunc is correct semantically,
case in point trunc <32 x i32> to <32 x i8> aka 2 * ZMM will naively result in
2 * XMM, that then will be packed into 1 * YMM,
and it should count the cost of said packing,
not just the truncations.

Diff Detail

Event Timeline

lebedev.ri created this revision.Wed, Nov 10, 1:16 PM
lebedev.ri requested review of this revision.Wed, Nov 10, 1:16 PM

(i cross-checked with my original sanity check from https://reviews.llvm.org/D113350?id=385758, and it is happy with the logic.)

lebedev.ri added a reviewer: spatel.

NFC, use Eff prefix for potentially-promoted types, not Shuf.

And make diff nicer by using recursion.

lebedev.ri edited the summary of this revision. (Show Details)Wed, Nov 10, 3:08 PM

Assuming the direction makes sense, all the remaining changes after this are trivial (specifying the working elt ty to promote to).

Can you give more examples of the bad cast costs? trunc <32 x i32> to <32 x i8> -> https://wide.godbolt.org/z/Y3rznPEnY seems to be OK

lebedev.ri added inline comments.Fri, Nov 12, 10:05 AM
llvm/test/Analysis/CostModel/X86/shuffle-replication-i16.ll
170–173

Worse than scalarization cost

351–355

Worse than scalarization cost

442–446

Worse than scalarization cost

533–537

Worse than scalarization cost

Can you give more examples of the bad cast costs? trunc <32 x i32> to <32 x i8> -> https://wide.godbolt.org/z/Y3rznPEnY seems to be OK

I wasn't actually saying that trunc <32 x i32> to <32 x i8> is bad, i was only describing that the cost should include not only the extensions/truncation, but vector extractions/insertions.

Can you give more examples of the bad cast costs? trunc <32 x i32> to <32 x i8> -> https://wide.godbolt.org/z/Y3rznPEnY seems to be OK

Do you want me to produce standalone snippets (just sext/trunc) from these examples, and update llvm/test/Analysis/CostModel/X86/extend.ll test coverage?
Do we consider these to be blockers? Note that in reality X86TTIImpl::getReplicationShuffleCost() will mostly only be asked about i1 elt type.

Can you give more examples of the bad cast costs? trunc <32 x i32> to <32 x i8> -> https://wide.godbolt.org/z/Y3rznPEnY seems to be OK

Do you want me to produce standalone snippets (just sext/trunc) from these examples, and update llvm/test/Analysis/CostModel/X86/extend.ll test coverage?
Do we consider these to be blockers? Note that in reality X86TTIImpl::getReplicationShuffleCost() will mostly only be asked about i1 elt type.

Fixed in D113842.

Rebased after D113842 - now all costs (at least the ones tested here :)) are improvements.
I believe this is now trivial.

lebedev.ri edited the summary of this revision. (Show Details)Sun, Nov 14, 7:46 AM
RKSimon added inline comments.Sun, Nov 14, 7:47 AM
llvm/lib/Target/X86/X86TargetTransformInfo.cpp
3666–3668

Could we avoid so much multi-line splill is we just renamed 'Replicated' to 'Dst'?

RKSimon added inline comments.Sun, Nov 14, 7:49 AM
llvm/lib/Target/X86/X86TargetTransformInfo.cpp
3645

I assume 'Eff' is short for effective? But we talk about it in terms of promotion - can you use a consistent term, I don't mind which.

lebedev.ri marked an inline comment as done.

s/Replicated/Dst/

llvm/lib/Target/X86/X86TargetTransformInfo.cpp
3666–3668

Looks a bit more compact, yes.

lebedev.ri marked an inline comment as done.

s/Eff/Prom/

llvm/lib/Target/X86/X86TargetTransformInfo.cpp
3645

Hmm, it does not seem any different to me, but okay.

RKSimon accepted this revision.Sun, Nov 14, 8:49 AM

LGTM - but please do the Replicated -> Dst rename as NFC pre-commit

This revision is now accepted and ready to land.Sun, Nov 14, 8:49 AM

LGTM - but please do the Replicated -> Dst rename as NFC pre-commit

Will do, thank you for the review!