This is an archive of the discontinued LLVM Phabricator instance.

[X86] combinePMULH - recognise 'cheap' trunctions via PACKS/PACKUS as well as SEXT/ZEXT
ClosedPublic

Authored by RKSimon on Nov 7 2021, 2:01 PM.

Details

Summary

combinePMULH currently only truncates vXi32/vXi64 multiplies to PMULHW/PMULUW if the source operands are SEXT/ZEXT instructions for a 'free' truncation.

But we can generalize this to any source operand with sufficient leading sign/zero bits that would allow PACKS/PACKUS to be used as a 'cheap' truncation.

This helps us avoid the wider multiplies, in exchange for truncation on both source operands instead of the result.

Diff Detail

Event Timeline

RKSimon created this revision.Nov 7 2021, 2:01 PM
RKSimon requested review of this revision.Nov 7 2021, 2:01 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 7 2021, 2:01 PM
lebedev.ri added inline comments.Nov 7 2021, 2:10 PM
llvm/lib/Target/X86/X86ISelLowering.cpp
48325–48327

Can you mirror rG61225c081858efe55dfc7051b338c797fab07cff and introduce DAG.ComputeMinSignedBits()?
Then this becomes an obvious return DAG.ComputeMinSignedBits() <= 16;

48331

return Known.countMaxActiveBits() <= 16;

RKSimon planned changes to this revision.Nov 8 2021, 1:05 AM

Still looking at the AVX2 regressions

RKSimon updated this revision to Diff 386210.Nov 10 2021, 9:53 AM
RKSimon retitled this revision from [X86] combinePMULH - recognise 'cheap' trunctions via PACKS/PACKUS as well as SEXT/ZEXT to [X86] combinePMULH - recognise 'cheap' trunctions via PACKS/PACKUS as well as SEXT/ZEXT (WIP).

Use ComputeMinSignedBits/countMaxActiveBits

Add SimplifyVBinOp to DAGCombiner::visitMULHS/U to fold MULH(shuffle(x,u,m),shuffle(y,u,m)) -> shuffle(MULH(x,y),u,m) - annoyingly its tricky to expose this with existing MULH generation :(

There's still a few AVX2 regressions to address.

RKSimon planned changes to this revision.Nov 10 2021, 9:53 AM
RKSimon updated this revision to Diff 390720.Nov 30 2021, 7:41 AM

Fix AVX2 regressions - we can use a similar trick to what we do for MADDWD - if we know that the upper bits of the extended input elements are zero, then the MULHU result for those (bitcasted) sub-elements will be zero as well - so we can perform the MULHU on the (bitcasted) inputs and then cheaply truncate/packus the results to vXi16.

RKSimon retitled this revision from [X86] combinePMULH - recognise 'cheap' trunctions via PACKS/PACKUS as well as SEXT/ZEXT (WIP) to [X86] combinePMULH - recognise 'cheap' trunctions via PACKS/PACKUS as well as SEXT/ZEXT.Nov 30 2021, 7:41 AM

Hm, i'm not sure i'm following, why is the truncation free if the dropped bits are zeros / why it won't be free otherwise?

Hm, i'm not sure i'm following, why is the truncation free if the dropped bits are zeros / why it won't be free otherwise?

Its not - IsTruncateFree is only true if we've extended both inputs, in which case truncating the inputs should be free, and we're better off doing that so we can perform the MULH on a smaller vector.

Hm, i'm not sure i'm following, why is the truncation free if the dropped bits are zeros / why it won't be free otherwise?

Its not - IsTruncateFree is only true if we've extended both inputs, in which case truncating the inputs should be free, and we're better off doing that so we can perform the MULH on a smaller vector.

Does the patch description need updating? In particular, i'm not sure what the second part of phrase this mean:

But we can generalize this to any source operand with sufficient leading sign/zero bits that would allow PACKS/PACKUS to be used as a 'cheap' truncation.

llvm/lib/Target/X86/X86ISelLowering.cpp
48588

What does "cheaply" mean here?

RKSimon updated this revision to Diff 390751.Nov 30 2021, 10:05 AM

Improved comments to explain the cheap packs/truncs which mean the mul->mulh are cost effective.

lebedev.ri accepted this revision.Nov 30 2021, 10:12 AM

LGTM, not sure if someone else should take a look.
Thank you.

This revision is now accepted and ready to land.Nov 30 2021, 10:12 AM
This revision was landed with ongoing or failed builds.Dec 1 2021, 8:38 AM
This revision was automatically updated to reflect the committed changes.