This is an archive of the discontinued LLVM Phabricator instance.

[X86] Prefer reduced width multiplication over pmulld on Silvermont
Closed, Public

Authored by zvi on Nov 29 2016, 4:58 AM.

Details

Summary

When the operands of a 32-bit vector multiply are known to fit in 16 bits, prefer expansions such as pmullw, pmulhw, punpcklwd, punpckhwd over pmulld.
On Silvermont [source: Optimization Reference Manual]:
PMULLD has a throughput of 1/11 instructions per cycle (one every 11 cycles).
PMULHUW/PMULHW/PMULLW have a throughput of 1/2 instructions per cycle (one every 2 cycles).
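
The expansion works because, when both operands fit in 16 bits, PMULLW produces the low 16 bits and PMULHW the high 16 bits of each full 32-bit product, and PUNPCKLWD/PUNPCKHWD interleave those halves back into 32-bit lanes. The sketch below is a portable scalar model of that idea, not LLVM's lowering code: the helpers pmullw, pmulhw, punpcklwd, punpckhwd and mul_via_i16 are hypothetical functions that mimic the instructions lane by lane, and the model assumes signed (sign-extended) inputs. For zero-extended inputs, PMULHUW would take the place of PMULHW.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Scalar model of a 128-bit XMM register: eight 16-bit lanes or four 32-bit lanes.
using V8i16 = std::array<int16_t, 8>;
using V4i32 = std::array<int32_t, 4>;

// Models PMULLW: keep the low 16 bits of each signed 16x16 -> 32-bit product.
V8i16 pmullw(const V8i16& a, const V8i16& b) {
  V8i16 r{};
  for (int i = 0; i < 8; ++i) {
    uint32_t p = static_cast<uint32_t>(int32_t{a[i]} * int32_t{b[i]});
    r[i] = static_cast<int16_t>(static_cast<uint16_t>(p));
  }
  return r;
}

// Models PMULHW: keep the high 16 bits of each signed 16x16 -> 32-bit product.
V8i16 pmulhw(const V8i16& a, const V8i16& b) {
  V8i16 r{};
  for (int i = 0; i < 8; ++i) {
    uint32_t p = static_cast<uint32_t>(int32_t{a[i]} * int32_t{b[i]});
    r[i] = static_cast<int16_t>(static_cast<uint16_t>(p >> 16));
  }
  return r;
}

// Glue a low and a high 16-bit half into one 32-bit lane
// (little-endian: the low half occupies the low word).
int32_t join_halves(int16_t lo, int16_t hi) {
  uint32_t bits = (uint32_t{static_cast<uint16_t>(hi)} << 16) |
                  uint32_t{static_cast<uint16_t>(lo)};
  return static_cast<int32_t>(bits);
}

// Models PUNPCKLWD lo, hi: interleave words 0..3 into four 32-bit lanes.
V4i32 punpcklwd(const V8i16& lo, const V8i16& hi) {
  V4i32 r{};
  for (int i = 0; i < 4; ++i) r[i] = join_halves(lo[i], hi[i]);
  return r;
}

// Models PUNPCKHWD lo, hi: interleave words 4..7 into four 32-bit lanes.
V4i32 punpckhwd(const V8i16& lo, const V8i16& hi) {
  V4i32 r{};
  for (int i = 0; i < 4; ++i) r[i] = join_halves(lo[i + 4], hi[i + 4]);
  return r;
}

// Eight full 32-bit products from 16-bit operands, without any PMULLD.
std::array<int32_t, 8> mul_via_i16(const V8i16& a, const V8i16& b) {
  V8i16 lo = pmullw(a, b);
  V8i16 hi = pmulhw(a, b);
  V4i32 first = punpcklwd(lo, hi);
  V4i32 second = punpckhwd(lo, hi);
  return {first[0], first[1], first[2], first[3],
          second[0], second[1], second[2], second[3]};
}
```

With the throughputs quoted above, eight 32-bit products from 16-bit inputs need two PMULLDs (about 22 cycles of multiplier throughput) versus one PMULLW plus one PMULHW (about 4 cycles), before counting the narrowing of the operands and the two unpacks.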

Fixes pr31202.

Analysis of this issue was done by Fahana Aleen.

Diff Detail

Repository
rL LLVM

Event Timeline

zvi updated this revision to Diff 79519. Nov 29 2016, 4:58 AM
zvi retitled this revision to [X86] Prefer reduced width multiplication over pmulld on Silvermont.
zvi updated this object.
zvi added reviewers: mkuper, delena, wmi.
zvi set the repository for this revision to rL LLVM.
zvi added a subscriber: llvm-commits.
mkuper added inline comments. Nov 30 2016, 10:24 AM
lib/Target/X86/X86Subtarget.cpp
Line 232 (on Diff #79519)

(!isPMULLDSlow() || hasSSE41()) is good enough, logically speaking.

test/CodeGen/X86/slow-pmulld.ll
Line 1 (on Diff #79519)

Please add a check that we *do* generate a pmulld for non-slow targets.
(Or, if we already have such a test, merge this one into that).

zvi added inline comments. Dec 4 2016, 12:42 AM
lib/Target/X86/X86Subtarget.cpp
Line 232 (on Diff #79519)

Good catch

test/CodeGen/X86/slow-pmulld.ll
Line 1 (on Diff #79519)

Ok, I will add tests for SSE4.1 targets w/o the slowpmulld feature

zvi updated this revision to Diff 80199. Dec 4 2016, 12:44 AM

Fixes for Michael's comments.

delena accepted this revision. Dec 6 2016, 1:08 AM
delena edited edge metadata.
This revision is now accepted and ready to land. Dec 6 2016, 1:08 AM
zvi added a comment. Dec 6 2016, 1:16 AM

@mkuper Anything to add?

mkuper accepted this revision. Dec 6 2016, 10:42 AM
mkuper edited edge metadata.

Sorry, missed the update notification. LGTM.

This revision was automatically updated to reflect the committed changes.