Before this patch, WriteIMulH reported a latency value which is correct for
the RR variant of MULX, but not for the RM variant.
This patch fixes the issue by introducing a new WriteIMulHLd, which is meant
to be used only by the RM variant of MULX.
Please can you move this back up? I'm trying to reduce the diffs between this and Haswell at the moment.