[X86] combineMulToPMADDWD - handle any pow2 vector type and split to legal types

Authored by RKSimon on Oct 2 2021, 7:50 AM.



combineMulToPMADDWD is currently limited to legal types, but there's no reason why we can't handle any larger type that the existing SplitOpsAndApply code can use to split to legal X86ISD::VPMADDWD ops.
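For readers unfamiliar with the instruction: PMADDWD multiplies adjacent pairs of signed 16-bit elements and sums each pair into a 32-bit lane, which is why a vXi32 mul of small sign-extended inputs can be rewritten this way. A minimal scalar model of one result lane (the helper name pmaddwd_lane is illustrative, not code from the patch):

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of one X86ISD::VPMADDWD result lane: multiply two adjacent
// pairs of signed 16-bit elements and add the products into an i32.
// (The single wrapping corner case, all four inputs == INT16_MIN, is ignored.)
int32_t pmaddwd_lane(int16_t a0, int16_t a1, int16_t b0, int16_t b1) {
  return int32_t(a0) * int32_t(b0) + int32_t(a1) * int32_t(b1);
}
```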

This also exposed a missed opportunity for pre-SSE41 targets to handle SEXT ops from types smaller than vXi16 - without PMOVSX instructions these will always be expanded to unpack+shifts, so we can cheat and convert this into a ZEXT(SEXT()) sequence to make it a valid PMADDWD op.
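The ZEXT(SEXT()) trick can be sketched with scalar arithmetic: sign-extending i8 to i16 and then zero-extending to i32 leaves the high 16-bit half of each i32 lane zero, so a PMADDWD lane reproduces the exact signed product. A hedged sketch (mul_via_pmaddwd is a hypothetical name, not backend code):

```cpp
#include <cassert>
#include <cstdint>

// Illustrative model of the ZEXT(SEXT()) trick: view each i32 pmaddwd
// operand as two i16 halves; the lane computes lo_a*lo_b + hi_a*hi_b.
// Building the operands as zext of an i16 value zeroes the high halves,
// so the lane is exactly the signed 16-bit product.
int32_t mul_via_pmaddwd(int8_t a, int16_t b) {
  uint32_t opA = uint16_t(int16_t(a)); // zext(sext(i8 -> i16) -> i32)
  uint32_t opB = uint16_t(b);          // zext(i16 -> i32)
  int16_t loA = int16_t(opA & 0xFFFF), hiA = int16_t(opA >> 16);
  int16_t loB = int16_t(opB & 0xFFFF), hiB = int16_t(opB >> 16);
  return int32_t(loA) * loB + int32_t(hiA) * hiB; // one pmaddwd lane
}
```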

Event Timeline

RKSimon created this revision.Oct 2 2021, 7:50 AM
RKSimon requested review of this revision.Oct 2 2021, 7:50 AM
Herald added a project: Restricted Project.Oct 2 2021, 7:50 AM
RKSimon edited the summary of this revision. (Show Details)Oct 2 2021, 7:51 AM
pengfei added inline comments.Oct 2 2021, 8:21 AM

Nit: Should this be mul 2 directly? Then we just need to check 32 <= NumElts.
The 16 in the comparison looks more like a magic number.

RKSimon marked an inline comment as done.Oct 4 2021, 7:33 AM
RKSimon planned changes to this revision.Oct 13 2021, 1:42 AM

I haven't had much time to look at the regressions yet

RKSimon updated this revision to Diff 384542.Wed, Nov 3, 11:40 AM
RKSimon edited the summary of this revision. (Show Details)

Address pre-SSE41 regressions: fall back to pmullw/pmulhw for sign-extended vectors whose sources are already wider than the 128-bit maximum legal width.

pengfei added inline comments.Wed, Nov 3, 7:48 PM
588–590 (On Diff #384542)

Is this another regression?

RKSimon added inline comments.Thu, Nov 4, 4:35 AM
588–590 (On Diff #384542)

Yeah, we're still missing some SimplifyDemanded* handling for X86ISD::VPMADDWD; looking at this now.

RKSimon updated this revision to Diff 384720.Thu, Nov 4, 5:38 AM

Fix pre-SSE41 regressions: if both operands are sext from i8 (or smaller), pmullw/pmulhw are still better; if they are a sext/zext mix, we are better off using pmaddwd.
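The pmullw/pmulhw pair mentioned here computes the low and signed-high 16 bits of the full 32-bit product, so interleaving the two halves rebuilds the exact result. A rough scalar model under that reading (function names are illustrative, not the backend lowering):

```cpp
#include <cassert>
#include <cstdint>

// Scalar models of pmullw / pmulhw: the low and signed-high 16 bits of
// the full 32-bit product of two signed 16-bit inputs.
uint16_t pmullw(int16_t a, int16_t b) { return uint16_t(int32_t(a) * b); }
int16_t  pmulhw(int16_t a, int16_t b) { return int16_t((int32_t(a) * b) >> 16); }

// Interleaving the two halves rebuilds the exact i32 product, which is
// why the pmullw/pmulhw expansion loses nothing when both operands are
// sign-extended from i8 or smaller.
int32_t mul32(int16_t a, int16_t b) {
  uint32_t lo = pmullw(a, b);
  uint32_t hi = uint16_t(pmulhw(a, b));
  return int32_t((hi << 16) | lo);
}
```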

Any more comments? This ideally needs to land before I can address the regression in D113371.

pengfei added inline comments.Mon, Nov 8, 1:18 AM

The shuffle looks odd. What's its equivalent on the left?

RKSimon added inline comments.Mon, Nov 8, 2:24 AM

The (i32 sext(i8)) operand is now (i32 zext(i16 sext(i8))), which has allowed SimplifyDemandedVectorElts to fold the (i32 zext(i8)) operand to (i32 aext(i16 zext(i8))).

The aext was lowered to unpacklwd == shuffle(0,u,1,u,2,u,3,u), but as we only require the bottom 64 bits of the vector we can further simplify that to pshuflw == shuffle(0,u,1,u,u,u,u,u).
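A small sketch of why the two masks are interchangeable when only the low 64 bits are demanded (the shuffle helper and the 0-for-undef modeling are assumptions for illustration, not how the DAG represents undef):

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Sketch of the two word shuffles on an 8 x i16 vector; -1 marks an
// undef lane, modeled here as 0. unpacklwd interleaves the low words:
// {0,u,1,u,2,u,3,u}. pshuflw only permutes within the low 64 bits:
// {0,u,1,u,u,u,u,u}. If only the bottom four i16 lanes are demanded,
// the two are indistinguishable, so the cheaper pshuflw suffices.
using V8i16 = std::array<int16_t, 8>;

V8i16 shuffle(const V8i16 &src, const std::array<int, 8> &mask) {
  V8i16 out{};
  for (int i = 0; i < 8; ++i)
    out[i] = (mask[i] < 0) ? 0 : src[mask[i]];
  return out;
}
```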

I can't fully understand this patch without spending more time on it. But I don't have other comments.


< ?


Thank you. Just learnt we have aext. :)

RKSimon updated this revision to Diff 385458.Mon, Nov 8, 4:57 AM

Fix bitwidth check

RKSimon marked an inline comment as done.Mon, Nov 8, 4:57 AM

final ping?

Test changes seem good, but I'm not sure about the code.

pengfei accepted this revision.Tue, Nov 9, 6:27 AM

I don't think there's anything blocking it :)

This revision is now accepted and ready to land.Tue, Nov 9, 6:27 AM
This revision was landed with ongoing or failed builds.Tue, Nov 9, 7:21 AM
This revision was automatically updated to reflect the committed changes.