This is an archive of the discontinued LLVM Phabricator instance.

[X86] Improve SMULO/UMULO codegen for vXi8 vectors.
ClosedPublic

Authored by craig.topper on Feb 27 2021, 10:02 PM.

Details

Summary

The default expansion creates a MUL and either a MULHS/MULHU. Each
of those separately expand to sequences that use one or more
PMULLW instructions as well as additional instructions to
extend the types to vXi16. The MULHS/MULHU expansion computes the
whole 16-bit product, but only keeps the high part.

We can improve the lowering of SMULO/UMULO for some cases by using the MULHS/MULHU
expansion, but keep both the high and low parts. And we can use
those parts to calculate the overflow.

For AVX512 we might have vXi1 overflow outputs. We can improve those by using vpcmpeqw to produce a k register if AVX512BW is enabled. This is a little better than truncating the high result to use vpcmpeqb. If we don't have avx512bw we can extend up to v16i32 to use vpcmpeqd to produce a k register.

Diff Detail

Event Timeline

craig.topper created this revision.Feb 27 2021, 10:02 PM
craig.topper requested review of this revision.Feb 27 2021, 10:02 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 27 2021, 10:02 PM
RKSimon added inline comments.Feb 28 2021, 10:23 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
27406

Isn't this really ISD::SMUL_LOHI/UMUL_LOHI ? Could the legalizer be taught to try and use as well?

craig.topper added inline comments.Feb 28 2021, 10:51 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
27406

The legalizer will try to use SMUL_LOHI/UMUL_LOHI, but I think it tries MULHS/MULHU+MUL first. I suppose scalar was the main consideration there where targets don't usually have both so the order of checking them doesn't matter.

Maybe we could implement SMUL_LOHI/UMUL_LOHI for v16i8 and not MULHS/MULHU. Then fix LegalizeVectorOps to expand MULHS/MULHU to SMUL_LOHI/UMUL_LOHI? It doesn't now and would end up scalarizing if it sees a MULHS/MULHU.

Rebase on top of D98587. Merge in the ideas from D97650.
Only use the helper function for the UNPCK lowering. Incorporating the AVX512 mask improvements made the interface weird.

craig.topper edited the summary of this revision. (Show Details)Mar 14 2021, 4:03 PM
RKSimon added inline comments.Mar 29 2021, 3:40 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
936

(style) keep columns aligned

1339

(style) keep columns aligned

Rebase and fix spacing.

Harbormaster completed remote builds in B96365: Diff 334217.
RKSimon accepted this revision.Mar 31 2021, 4:14 AM

LGTM, I still think in the long run we should be trying to make this ISD::SMUL_LOHI/UMUL_LOHI, but if the legalizers aren't up to the job yet this is a good first step.

llvm/lib/Target/X86/X86ISelLowering.cpp
30199

Use the same indentation as the rest.

This revision is now accepted and ready to land.Mar 31 2021, 4:14 AM
This revision was landed with ongoing or failed builds.Mar 31 2021, 10:14 AM
This revision was automatically updated to reflect the committed changes.