This is an archive of the discontinued LLVM Phabricator instance.

[X86] Add WriteFMOVMSK/WriteVecMOVMSK/WriteMMXMOVMSK scheduler classes
Closed, Public

Authored by RKSimon on Mar 27 2018, 6:20 AM.

Details

Summary

Currently MOVMSK instructions use the WriteVecLogic class, which is a very poor choice given that MOVMSK involves an SSE->GPR transfer.

This introduces WriteFMOVMSK/WriteVecMOVMSK/WriteMMXMOVMSK scheduler classes instead.

SLM - Used Agner's values.

ZN - VPMOVMSKB/VPMOVMSKBY had some weird values that have been cleaned up based on advice from @GGanesh.
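
As a rough illustration of the change described above, the following TableGen sketch shows how new scheduler classes of this kind are typically declared and then given per-CPU costs. Only the three class names come from this revision; the `MyPort0` resource, the latency value, and the instruction comments are assumptions made for the example and are not taken from the actual diff.

```tablegen
// Sketch only: class names are from this revision, everything else is assumed.

// Target-independent declarations, placed next to the other X86 SchedWrite
// definitions.
def WriteFMOVMSK   : SchedWrite; // FP move-mask (e.g. MOVMSKPS/MOVMSKPD)
def WriteVecMOVMSK : SchedWrite; // Vector integer move-mask (e.g. PMOVMSKB)
def WriteMMXMOVMSK : SchedWrite; // MMX move-mask

// Per-CPU mapping, written inside a scheduling model's
// `let SchedModel = ... in { }` region. `MyPort0` is a placeholder
// ProcResource and the latency is illustrative, not a measured value.
def : WriteRes<WriteVecMOVMSK, [MyPort0]> {
  let Latency = 3;
  let NumMicroOps = 1;
}
```

Each CPU model would supply its own `WriteRes` entry per class, which is what lets the SSE->GPR transfer cost differ from that of ordinary vector logic operations.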

Diff Detail

Repository
rL LLVM

Event Timeline

RKSimon created this revision. Mar 27 2018, 6:20 AM
GGanesh added inline comments. Mar 27 2018, 6:36 AM
lib/Target/X86/X86ScheduleZnver1.td
222 ↗(On Diff #139909)

The original version is wrong for the obvious reasons!
The Ymm version takes two u-ops and has a two-cycle latency.

courbet added inline comments. Mar 27 2018, 6:49 AM
lib/Target/X86/X86SchedBroadwell.td
300 ↗(On Diff #139909)

I'm measuring a latency of 1 for MMX_PMOVMSKBrr; removing the override would break this.

RKSimon updated this revision to Diff 139915. Mar 27 2018, 6:53 AM
RKSimon edited the summary of this revision.

Thanks @GGanesh

RKSimon added inline comments. Mar 27 2018, 7:09 AM
lib/Target/X86/X86SchedBroadwell.td
300 ↗(On Diff #139909)

Should I put the override back for Haswell/SandyBridge?

courbet added inline comments. Mar 27 2018, 7:14 AM
lib/Target/X86/X86SchedBroadwell.td
300 ↗(On Diff #139909)

Yes, please put it back for HW/BD. As for SNB, it does not have an override for MMX_PMOVMSKBrr right now; I'm going to add one in a separate commit.

To summarize my measurements:
(I don't have an SKX at hand, but I expect it to be similar to SKL)

                  SNB   BDW/HSW   SKL
MMX_PMOVMSKBrr    1     1         3
(V)PMOVMSKBrr     2     3         3
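
For readers unfamiliar with the "override" being discussed, a per-instruction override in a scheduling model can be sketched roughly as follows. This is a hypothetical fragment, not the contents of X86SchedBroadwell.td: `MyPort0` and `MyWriteMMXMOVMSK` are placeholder names, and the latency of 1 simply mirrors the MMX_PMOVMSKBrr measurement reported above.

```tablegen
// Hypothetical override: `MyPort0` and `MyWriteMMXMOVMSK` are placeholders.
// A SchedWriteRes bundles a cost description that is local to one CPU model.
def MyWriteMMXMOVMSK : SchedWriteRes<[MyPort0]> {
  let Latency = 1;     // mirrors the latency measured for MMX_PMOVMSKBrr
  let NumMicroOps = 1; // assumed; no u-op count is given in the thread
}

// InstRW pins the named instruction to that cost, taking precedence over
// the generic scheduler class the instruction would otherwise use.
def : InstRW<[MyWriteMMXMOVMSK], (instrs MMX_PMOVMSKBrr)>;
```

Removing such an entry makes the instruction fall back to its generic class, which is why dropping the override could change a latency that is currently modelled correctly.
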
courbet added inline comments. Mar 27 2018, 7:23 AM
lib/Target/X86/X86SchedBroadwell.td
300 ↗(On Diff #139909)

> I'm going to add one in a separate commit.

D44933

RKSimon updated this revision to Diff 139925. Mar 27 2018, 7:29 AM
RKSimon retitled this revision from [X86] Add WriteFMOVMSK/WriteVecMOVMSK scheduler classes to [X86] Add WriteFMOVMSK/WriteVecMOVMSK/WriteMMXMOVMSK scheduler classes.
RKSimon edited the summary of this revision.

Add WriteMMXMOVMSK class - avoids an extra 2 scheduler classes by merging with IIC_MMX_MOVMSK

courbet accepted this revision. Mar 27 2018, 8:00 AM

SNB/HW/BW/SKL/SKX LGTM.

This revision is now accepted and ready to land. Mar 27 2018, 8:00 AM

Thanks. @craig.topper, do you have any comments about SLM before I commit?

SLM looks good. Agner's latency/throughput data agrees with the Intel Optimization manual.

This revision was automatically updated to reflect the committed changes.