This is an archive of the discontinued LLVM Phabricator instance.

[X86] combineMulToPMADDWD - handle any pow2 vector type and split to legal types
ClosedPublic

Authored by RKSimon on Oct 2 2021, 7:50 AM.

Details

Summary

combineMulToPMADDWD is currently limited to legal types, but there's no reason why we can't handle any larger type that the existing SplitOpsAndApply code can use to split to legal X86ISD::VPMADDWD ops.

This also exposed a missed opportunity for pre-SSE41 targets to handle SEXT ops from types smaller than vXi16 - without PMOVSX instructions these will always be expanded to unpack+shifts, so we can cheat and convert this into a ZEXT(SEXT()) sequence to make it a valid PMADDWD op.

Diff Detail

Event Timeline

RKSimon created this revision.Oct 2 2021, 7:50 AM
RKSimon requested review of this revision.Oct 2 2021, 7:50 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 2 2021, 7:50 AM
RKSimon edited the summary of this revision. (Show Details)Oct 2 2021, 7:51 AM
pengfei added inline comments.Oct 2 2021, 8:21 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
44343

Nit: Should this be mul 2 directly? Then we just need to check 32 <= NumElts.
The 16 in comparison looks more like a magic number.

RKSimon marked an inline comment as done.Oct 4 2021, 7:33 AM
RKSimon planned changes to this revision.Oct 13 2021, 1:42 AM

I haven't had much time to look at the regressions yet

RKSimon updated this revision to Diff 384542.Nov 3 2021, 11:40 AM
RKSimon edited the summary of this revision. (Show Details)

Address pre-SSE41 regressions - fallback to pmullw/pmulhw for sign-extended vectors with sources that are already greater than 128-bits maximum legal width.

pengfei added inline comments.Nov 3 2021, 7:48 PM
llvm/test/CodeGen/X86/madd.ll
588–590 ↗(On Diff #384542)

Is this another regression?

RKSimon added inline comments.Nov 4 2021, 4:35 AM
llvm/test/CodeGen/X86/madd.ll
588–590 ↗(On Diff #384542)

Yeah - we're still missing some SimplifyDemanded* handling for X86ISD::VPMADDWD - looking at this now

RKSimon updated this revision to Diff 384720.Nov 4 2021, 5:38 AM

Fix pre-SSE41 regressions - if both operands are sext from i8 (or smaller) pmull/pmulh are still better - if they are a sext/zext mix we are better off using pmaddwd.

Any more comments? This ideally needs to land before I can address the regression in D113371

pengfei added inline comments.Nov 8 2021, 1:18 AM
llvm/test/CodeGen/X86/shrink_vmul.ll
992–997

The shuffle looks odd. What's its equivalent in the left?

RKSimon added inline comments.Nov 8 2021, 2:24 AM
llvm/test/CodeGen/X86/shrink_vmul.ll
992–997

The (i32 sext(i8))) operand is now (i32 zext(i16 sext(i8))), which has allowed SimplifyDemandedVectorElts to fold the (i32 zext(i8)) operand to (i32 aext(i16 zext(i8))).

The aext was lowered to unpacklwd == shuffle(0,u,1,u,2,u,3,u) but as we only require the bottom 64-bits of the vector we can further simplify that to the pshuflw == shuffle(0,u,1,u,u,u,u,u)

I don't fully understand this patch without spending much time. But I don't have other comments.

llvm/lib/Target/X86/X86ISelLowering.cpp
44521

< ?

llvm/test/CodeGen/X86/shrink_vmul.ll
992–997

Thank you. Just learnt we have aext. :)

RKSimon updated this revision to Diff 385458.Nov 8 2021, 4:57 AM

Fix bitwidth check

RKSimon marked an inline comment as done.Nov 8 2021, 4:57 AM

final ping?

Test changes seem good, but i'm not sure about the code.

pengfei accepted this revision.Nov 9 2021, 6:27 AM

I don't think there're anything blocking it :)

This revision is now accepted and ready to land.Nov 9 2021, 6:27 AM
This revision was landed with ongoing or failed builds.Nov 9 2021, 7:21 AM
This revision was automatically updated to reflect the committed changes.