[WIP][X86][SSE] SimplifyDemandedVectorEltsForTargetNode - add general shuffle combining support
Changes Planned · Public

Authored by RKSimon on Aug 9 2019, 4:13 AM.

Details

Summary

This patch uses partial DemandedElts masks to further simplify target shuffle chains and finally starts making target shuffle combining part of SimplifyDemandedBits/SimplifyDemandedVectorElts.
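As a rough illustration of why a partial DemandedElts mask helps, here is a minimal standalone sketch. It is not LLVM code: the names `kUndef`, `applyDemandedElts`, `isIdentityMask` and `shuffleIsRedundant` are hypothetical, and it assumes a unary shuffle with at most 64 lanes. Lanes nobody demands become "don't care", and a mask that only permutes undemanded lanes then collapses to an identity, so the shuffle can be dropped.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Sentinel for a "don't care" mask element (hypothetical name; follows the
// common convention of using -1 for undef shuffle lanes).
constexpr int kUndef = -1;

// Replace every mask element whose result lane is not demanded with kUndef.
// Bit i of demandedElts set means result lane i is used by some consumer.
// Assumption: the mask has at most 64 lanes.
std::vector<int> applyDemandedElts(std::vector<int> mask,
                                   std::uint64_t demandedElts) {
  for (std::size_t i = 0; i < mask.size(); ++i)
    if (((demandedElts >> i) & 1u) == 0)
      mask[i] = kUndef;
  return mask;
}

// A unary mask is an identity if every defined lane i reads element i.
bool isIdentityMask(const std::vector<int>& mask) {
  for (std::size_t i = 0; i < mask.size(); ++i)
    if (mask[i] != kUndef && mask[i] != static_cast<int>(i))
      return false;
  return true;
}

// The shuffle can be replaced by its input when, restricted to the demanded
// lanes, it moves nothing.
bool shuffleIsRedundant(const std::vector<int>& mask,
                        std::uint64_t demandedElts) {
  return isIdentityMask(applyDemandedElts(mask, demandedElts));
}
```

For example, the mask {0, 1, 3, 2} is not an identity when all four lanes are demanded, but it is redundant if only the low two lanes are used.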

We already manage this for Depth == 0 cases, where combineX86ShuffleChain would early-out if the shuffle combined to the same op. This patch generalizes that by adjusting the depth handling of combineX86ShufflesRecursively: it is called with a new Depth = 0, and the maximum shuffle combine depth is reduced accordingly.
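The depth handling can be pictured with a toy model. This is only a sketch under stated assumptions, not the patch itself: `ShuffleNode`, `composeMasks`, `combineRecursively` and `combineFromDemandedElts` are hypothetical stand-ins, every shuffle is unary and of the same width, and "combining" just means composing masks along a chain. The point it shows is the restart: the combine begins again at depth 0, but the budget shrinks by however deep the demanded-elements walk already is.

```cpp
#include <cstddef>
#include <vector>

// Toy unary shuffle: result lane i reads lane mask[i] of `input`.
// -1 marks an undefined lane; input == nullptr marks the end of the chain.
struct ShuffleNode {
  std::vector<int> mask;
  const ShuffleNode* input;
};

// Fold an inner shuffle under an accumulated outer mask:
// result[i] = inner[outer[i]], with undef lanes staying undef.
std::vector<int> composeMasks(const std::vector<int>& outer,
                              const std::vector<int>& inner) {
  std::vector<int> result(outer.size(), -1);
  for (std::size_t i = 0; i < outer.size(); ++i)
    if (outer[i] >= 0)
      result[i] = inner[static_cast<std::size_t>(outer[i])];
  return result;
}

// Merge this node into rootMask, then keep walking up the chain until the
// chain ends or the depth budget (maxDepth nodes) is used up.
std::vector<int> combineRecursively(const ShuffleNode& node,
                                    const std::vector<int>& rootMask,
                                    int depth, int maxDepth) {
  std::vector<int> merged = composeMasks(rootMask, node.mask);
  if (node.input == nullptr || depth + 1 >= maxDepth)
    return merged;
  return combineRecursively(*node.input, merged, depth + 1, maxDepth);
}

// Entry point from a demanded-elements walk that is already currentDepth
// levels deep: restart at depth 0 with a correspondingly smaller budget.
std::vector<int> combineFromDemandedElts(const ShuffleNode& node,
                                         int currentDepth,
                                         int maxCombineDepth) {
  std::vector<int> identity(node.mask.size());
  for (std::size_t i = 0; i < identity.size(); ++i)
    identity[i] = static_cast<int>(i);
  int remaining = maxCombineDepth - currentDepth;
  if (remaining <= 0)
    return identity;  // no budget left, nothing is combined
  return combineRecursively(node, identity, /*depth=*/0, remaining);
}
```

Shrinking the budget keeps the total amount of work bounded no matter how deep in the simplification walk the combine is started.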

This is still WIP. There are a couple of regressions I'm still investigating, mainly around when we should or shouldn't be using INSERTPS for BUILD_VECTOR patterns (PR27854).

Diff Detail

Repository
rL LLVM

Event Timeline

RKSimon created this revision. Aug 9 2019, 4:13 AM
Herald added a project: Restricted Project. Aug 9 2019, 4:13 AM

Can this patch solve bad codegen for 'f5'?

https://godbolt.org/z/3YpVg-

I don't think any amount of shuffle combining is going to recover that. I think we need to look at lowerShuffleAsLanePermuteAndRepeatedMask

Never mind, that won't fix it. The two lanes have different controls for the shufpd in gcc's code. I think we need a new strategy.

GCC improved its codegen for the f1 and f2 cases, saving one instruction relative to LLVM’s codegen:

https://gcc.gnu.org/ml/gcc-patches/2019-08/msg01952.html

RKSimon planned changes to this revision. Sat, Oct 12, 8:18 AM

WIP - PR27854 and PR43024 need to be finished first.