[X86][SSE] Aggressively use PMADDWD for v4i32 multiplies with 17 or more leading zeros

Authored by RKSimon on Jan 18 2018, 11:44 AM.



As discussed in D41484, PMADDWD on 'zero extended' vXi32 multiplies is nearly always a better option than PMULLD, with two exceptions:
On SNB the resulting code is no faster, but no slower either, so we may as well keep it.
On KNL PMADDWD only has half the throughput, so I've disabled it there - ideally there'd be a better way than this.
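
For readers wondering why 17 leading zeros is the threshold, here is a minimal scalar sketch of one 32-bit lane. It is not code from this patch: the helper names (pmaddwdLane, pmulldLane, hasUpper17Zero, selfCheck) are made up for illustration. PMADDWD treats each 32-bit lane as two signed 16-bit halves, multiplies them pairwise and adds the two products. With 17 or more leading zeros the high half is zero and the low half is non-negative as a signed 16-bit value, so the result equals the low 32 bits of the plain multiply.

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of one 32-bit lane of PMADDWD: split each operand into two
// signed 16-bit halves, multiply the halves pairwise, and add the products.
static uint32_t pmaddwdLane(uint32_t a, uint32_t b) {
  int32_t aLo = static_cast<int16_t>(a & 0xFFFF);
  int32_t aHi = static_cast<int16_t>(a >> 16);
  int32_t bLo = static_cast<int16_t>(b & 0xFFFF);
  int32_t bHi = static_cast<int16_t>(b >> 16);
  // Sum in 64 bits and truncate so the model itself has no signed overflow.
  int64_t sum = static_cast<int64_t>(aLo) * bLo +
                static_cast<int64_t>(aHi) * bHi;
  return static_cast<uint32_t>(sum);
}

// Scalar model of one lane of PMULLD: low 32 bits of the 32x32 product.
static uint32_t pmulldLane(uint32_t a, uint32_t b) { return a * b; }

// Hypothetical precondition helper: true if the upper 17 bits are all zero,
// i.e. the value fits in 15 bits.
static bool hasUpper17Zero(uint32_t v) { return (v >> 15) == 0; }

// Small self-check of the claim above on a few hand-picked values.
static void selfCheck() {
  // Largest values that satisfy the precondition still agree.
  assert(hasUpper17Zero(0x7FFF));
  assert(pmaddwdLane(0x7FFF, 0x7FFF) == pmulldLane(0x7FFF, 0x7FFF));
  // With only 16 leading zeros the low half reads as negative and the
  // two instructions disagree.
  assert(!hasUpper17Zero(0x8000));
  assert(pmaddwdLane(0x8000, 2) != pmulldLane(0x8000, 2));
}
```

Calling selfCheck() from a test harness exercises both the agreeing case and the 16-leading-zeros counterexample.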

Event Timeline

RKSimon created this revision. Jan 18 2018, 11:44 AM
craig.topper added inline comments. Jan 18 2018, 7:47 PM
32606 (On Diff #130468)

Move the APInt inside the second if? Maybe combine the two ifs?

32607 (On Diff #130468)

Do you need an SSE2 check on the v4i32?

RKSimon updated this revision to Diff 130585. Jan 19 2018, 4:19 AM

Merged outer two ifs, added SSE2 check.

We also now have KNL slow-pmulld.ll tests.

This revision is now accepted and ready to land. Jan 24 2018, 9:55 AM
zvi accepted this revision. Jan 24 2018, 10:05 AM

LGTM with a minor request:

We added a special feature for Silvermont, SlowPMULLD. Can you please add a RUN: config with SlowPMULLD + SSE4.2 to the tests to represent this processor?

This revision was automatically updated to reflect the committed changes.

Does this make the equivalent code in LowerMUL dead?