Currently MOVMSK instructions use the WriteVecLogic class, which is a very poor choice given that MOVMSK involves an SSE->GPR transfer.
This patch introduces dedicated WriteFMOVMSK/WriteVecMOVMSK/WriteMMXMOVMSK scheduler classes instead.
SLM - Used Agner's values.
ZN - VPMOVMSKB/VPMOVMSKBY had some weird values that have been cleaned up based on advice from @GGanesh.
I'm measuring a latency of 1 for MMX_PMOVMSKBrr, so removing the override would break this.