As discussed in D41484, PMADDWD is nearly always a better option than PMULLD for 'zero extended' vXi32 multiplies:
On SNB (Sandy Bridge) the resulting code is no faster, but it is no slower either, so we may as well keep it.
On KNL (Knights Landing) PMADDWD has only half the throughput, so I've disabled the combine there; ideally there would be a better way to do this.
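To see why the 'zero extended' restriction matters, here is a scalar C++ model of one 32-bit lane of each instruction. This is a sketch, not the LLVM implementation. The helper names (`pmulldLane`, `pmaddwdLane`, `upper17BitsZero`) are hypothetical. The 17-bit condition is an assumption about what 'zero extended' must mean, given that PMADDWD treats its 16-bit inputs as signed.

```cpp
#include <cassert>
#include <cstdint>

// PMULLD lane: low 32 bits of the 32x32-bit product.
inline uint32_t pmulldLane(uint32_t a, uint32_t b) {
  return static_cast<uint32_t>(static_cast<uint64_t>(a) * b);
}

// PMADDWD lane: split each 32-bit lane into two *signed* 16-bit halves,
// multiply the matching halves, and add the two products (wrapping to 32 bits).
inline uint32_t pmaddwdLane(uint32_t a, uint32_t b) {
  int64_t aLo = static_cast<int16_t>(a & 0xFFFFu);
  int64_t aHi = static_cast<int16_t>(a >> 16);
  int64_t bLo = static_cast<int16_t>(b & 0xFFFFu);
  int64_t bHi = static_cast<int16_t>(b >> 16);
  return static_cast<uint32_t>(aLo * bLo + aHi * bHi);
}

// Assumed legality check (hypothetical helper): the top 17 bits are clear,
// i.e. the value is a non-negative i16 with a zero high half.
inline bool upper17BitsZero(uint32_t v) {
  return (v & 0xFFFF8000u) == 0;
}
```

When both operands pass `upper17BitsZero`, the high halves contribute nothing and the low halves are non-negative, so PMADDWD's sum collapses to the plain product that PMULLD returns. Clearing only the top 16 bits is not enough: for `a = 0x8000, b = 2`, PMULLD gives `0x00010000` but PMADDWD sign-extends `0x8000` to -32768 and gives `0xFFFF0000`.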
Diff Detail
- Repository: rL LLVM
Event Timeline
LGTM with a minor request:
We added a special subtarget feature, SlowPMULLD, for Silvermont. Can you please add a RUN: line to the tests with SlowPMULLD + SSE4.2 to represent this processor?
Move the APInt inside the second if? Maybe combine the two ifs?