It turns out that SchedWrite WriteIMulH was always assigned to the low half of the result of a MULX (rather than to the high half).
To avoid confusion, this patch swaps the two MULX writes in the tablegen definition of MULX32/64.
That way, the write names better describe what they actually refer to. This also avoids further complications if, in the future, we decide to reuse the same MulH writes to model other scalar integer multiply instructions.
I also had to swap the latency values for the two MULX writes to make sure that the change is effectively an NFC. In fact, none of the existing x86 tests were affected by this small refactoring.
This patch also fixes a bug in MCA: a wrong latency value was propagated for instructions that perform multiple writes to the same register.
This last issue was found by Roman while testing MULX on targets that define a different latency for the Low/High part of the result.
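To illustrate the kind of situation behind the MCA issue, here is a minimal, self-contained sketch of one conservative policy for an instruction that writes the same register more than once: keep the slowest write. This is an assumption made for illustration only, not necessarily what the patch implements; `RegWrite`, `mergeWrites` and `latencyFor` are hypothetical names and are not part of the llvm-mca API.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <vector>

// Hypothetical stand-in for one register write performed by an instruction.
struct RegWrite {
  unsigned RegID;   // register being written
  unsigned Latency; // cycles until the written value becomes available
};

// Collapse the writes of a single instruction into one latency per register.
// Assumed policy: when several writes target the same register, keep the
// slowest one, so that dependent instructions never start too early.
std::map<unsigned, unsigned> mergeWrites(const std::vector<RegWrite> &Writes) {
  std::map<unsigned, unsigned> LatencyByReg;
  for (const RegWrite &W : Writes) {
    auto Res = LatencyByReg.insert({W.RegID, W.Latency});
    if (!Res.second) // register already written by this instruction
      Res.first->second = std::max(Res.first->second, W.Latency);
  }
  return LatencyByReg;
}

// Look up the merged latency for a register written by the instruction.
unsigned latencyFor(const std::map<unsigned, unsigned> &LatencyByReg,
                    unsigned RegID) {
  auto It = LatencyByReg.find(RegID);
  assert(It != LatencyByReg.end() && "register not written by instruction");
  return It->second;
}
```

With two writes to the same register at latencies 3 and 4, the merged latency under this policy is 4, regardless of the order in which the writes are listed.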
This doesn't compile for me