This is an archive of the discontinued LLVM Phabricator instance.

[X86] Custom type legalize v2i32 smulo/umulo to use a single pmuldq/pmuludq.
Closed · Public

Authored by craig.topper on Jul 23 2022, 4:21 PM.

Details

Summary

With SSE4.1 and above we were using three multiply instructions.
Type legalization widened the operation to v4i32, so the low half
was computed with pmulld while the high half used two pmuldq/pmuludq.

Instead of that, we can use a single pmuludq/pmuldq to calculate
the full product at once, extract the high and low bits and compare
to check for overflow.
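
As a rough illustration of the check described above, here is a scalar C++ model of a single 32-bit lane. It is a sketch, not the code from this patch: the helper names `umulo32` and `smulo32` are hypothetical, and the 64-bit multiply stands in for what pmuludq/pmuldq compute per lane. It assumes the usual overflow test, where the unsigned case overflows when the high 32 bits are nonzero and the signed case overflows when the high 32 bits differ from the sign-extension of the low 32 bits.

```cpp
#include <cstdint>
#include <utility>

// Hypothetical helpers modelling one lane; names are not taken from the patch.

// Unsigned: form the full 64-bit product (the role pmuludq plays per lane),
// then split it. Overflow means the high 32 bits are not all zero.
std::pair<uint32_t, bool> umulo32(uint32_t a, uint32_t b) {
  uint64_t full = static_cast<uint64_t>(a) * b;
  uint32_t lo = static_cast<uint32_t>(full);
  uint32_t hi = static_cast<uint32_t>(full >> 32);
  return {lo, hi != 0};
}

// Signed: form the full 64-bit product (the role pmuldq plays per lane).
// Overflow means the high 32 bits are not the sign-extension of the low 32.
std::pair<int32_t, bool> smulo32(int32_t a, int32_t b) {
  int64_t full = static_cast<int64_t>(a) * b;
  uint64_t bits = static_cast<uint64_t>(full);
  uint32_t lo = static_cast<uint32_t>(bits);
  uint32_t hi = static_cast<uint32_t>(bits >> 32);
  uint32_t signFill = (lo & 0x80000000u) ? 0xFFFFFFFFu : 0u;
  return {static_cast<int32_t>(lo), hi != signFill};
}
```

In the vector form, the same split and compare would be applied to both 64-bit lanes of the single multiply result at once.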

I've restricted the SMULO handling to SSE4.1, since that is where pmuldq
becomes available. Earlier targets could probably use pmuludq plus a
fixup, but that's for another day.
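
The summary does not say which fixup was intended. One standard identity that could serve, shown here as a scalar C++ sketch for a single lane, recovers the signed high half from an unsigned multiply by subtracting the other operand whenever an operand is negative. The helper name `mulhs_via_mulhu` is hypothetical and is not from the patch.

```cpp
#include <cstdint>

// Hypothetical helper: signed high 32 bits computed from an unsigned
// 32x32->64 multiply (the kind of product pmuludq yields per lane).
// Reinterpreting a negative a as unsigned adds 2^32 to it, which adds
// b * 2^32 to the product; the same holds with the roles swapped.
// Subtracting those terms from the high half undoes the error.
uint32_t mulhs_via_mulhu(int32_t a, int32_t b) {
  uint32_t ua = static_cast<uint32_t>(a);
  uint32_t ub = static_cast<uint32_t>(b);
  uint32_t hi = static_cast<uint32_t>((static_cast<uint64_t>(ua) * ub) >> 32);
  if (a < 0) hi -= ub;  // wraps modulo 2^32 by design
  if (b < 0) hi -= ua;
  return hi;
}
```

The low 32 bits of the product are the same for signed and unsigned multiplication, so only the high half needs correcting before the overflow comparison.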

I was going through my git stash and found an early version of this patch
from a year or two ago so I went ahead and finished it.

Event Timeline

craig.topper created this revision. Jul 23 2022, 4:21 PM
Herald added a project: Restricted Project. Jul 23 2022, 4:21 PM
craig.topper requested review of this revision. Jul 23 2022, 4:21 PM
RKSimon accepted this revision. Jul 25 2022, 4:06 AM

LGTM

This revision is now accepted and ready to land. Jul 25 2022, 4:06 AM
This revision was landed with ongoing or failed builds. Jul 25 2022, 9:12 AM
This revision was automatically updated to reflect the committed changes.