
[CostModel][X86] Teach getArithmeticReductionCost to properly cost the shuffles needed for a PairWise reduction before we reach a legal type. And remove one shuffle from the end of the reduction.

Authored by craig.topper on Dec 7 2018, 12:35 PM.



The loop for handling illegal types queried the cost of an extract_subvector shuffle, which I think will always return 0 for an illegal type on X86. But that's not the correct shuffle. We really need two two-source permutes to extract the even and odd elements. For non-pairwise reductions we don't need any shuffle, so I've removed it entirely.

We were also counting 2 shuffles on the very last reduction step, where we need to add element 1 to element 0. But element 0 is already in place, so we only need to move element 1.
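For illustration only, the shuffle counting described above can be sketched as a small standalone model. This is not the code in the patch: `ShuffleCount`, `countReductionShuffles`, and the `LegalElts` parameter are hypothetical names, the model counts shuffles rather than querying real TTI costs, and it assumes power-of-two element counts and one single-source shuffle per non-pairwise step once the type is legal.

```cpp
// Hypothetical model of the shuffle counts discussed in this revision.
// It counts shuffles only; real costs would come from TTI queries.
struct ShuffleCount {
  unsigned TwoSrcPermutes = 0;    // counted while the type is wider than legal
  unsigned SingleSrcShuffles = 0; // counted once the type is legal
};

// NumElts:   element count of the vector being reduced (power of two).
// LegalElts: assumed element count of the widest legal vector type.
ShuffleCount countReductionShuffles(unsigned NumElts, unsigned LegalElts,
                                    bool IsPairwise) {
  ShuffleCount C;

  // Illegal-type phase: each step halves the vector.
  // Pairwise needs two two-source permutes (even and odd elements).
  // Non-pairwise just adds the two halves, so no shuffle is counted.
  while (NumElts > LegalElts) {
    if (IsPairwise)
      C.TwoSrcPermutes += 2;
    NumElts /= 2;
  }

  // Legal-type phase: keep halving down to a single element.
  while (NumElts > 1) {
    bool LastStep = (NumElts == 2);
    if (IsPairwise)
      // On the last step element 0 is already in place,
      // so only element 1 has to be moved.
      C.SingleSrcShuffles += LastStep ? 1 : 2;
    else
      C.SingleSrcShuffles += 1;
    NumElts /= 2;
  }
  return C;
}
```

Under these assumptions, a 16-element pairwise reduction with a 4-element legal type counts 4 two-source permutes before legalization and 3 shuffles afterwards, while the non-pairwise form counts 0 and 2.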


Event Timeline

craig.topper created this revision. Dec 7 2018, 12:35 PM

Use ShufLT.second.getVectorNumElements() to create the ShufTy.

I don't see a single test where you could run into the described problem. And I think this patch should be split into 2 parts: one for the first part with the types and another one for the extra cost.
Also, I think the incorrect cost for X86 is a problem of the X86 TTI implementation rather than of the generic solution.

@craig.topper Do we still need this?

craig.topper abandoned this revision. Thu, Mar 26, 10:28 AM