combinePMULH currently only truncates vXi32/vXi64 multiplies to PMULHW/PMULHUW if the source operands are SEXT/ZEXT instructions for a 'free' truncation.
But we can generalize this to any source operand with sufficient leading sign/zero bits that would allow PACKS/PACKUS to be used as a 'cheap' truncation.
This helps us avoid the wider multiplies, in exchange for truncation on both source operands instead of the result.
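
To illustrate why truncating the operands first is safe, here is a minimal scalar sketch of a single signed lane. It is not the combinePMULH code; the helper names (mulhs16, wideMulShiftTrunc, truncThenMulhs) are hypothetical, and it assumes the matched pattern is a wide multiply, a shift right by 16, and a truncate to i16. It relies on the usual two's-complement wrapping when narrowing to int16_t.

```cpp
#include <cstdint>

// Scalar model of one PMULHW lane: the high 16 bits of a signed 16x16 product.
int16_t mulhs16(int16_t a, int16_t b) {
  int32_t product = static_cast<int32_t>(a) * static_cast<int32_t>(b);
  return static_cast<int16_t>(static_cast<uint32_t>(product) >> 16);
}

// Wide form: 32-bit multiply, shift right by 16, truncate to 16 bits.
// The multiply is done on unsigned values so that wrap-around is well defined.
int16_t wideMulShiftTrunc(int32_t x, int32_t y) {
  uint32_t product = static_cast<uint32_t>(x) * static_cast<uint32_t>(y);
  return static_cast<int16_t>(product >> 16);
}

// Narrow form: truncate both operands first, then take the high half of the
// 16-bit multiply. Only equivalent when x and y each fit in a signed 16 bits.
int16_t truncThenMulhs(int32_t x, int32_t y) {
  return mulhs16(static_cast<int16_t>(x), static_cast<int16_t>(y));
}
```

For operands in [-32768, 32767] both forms agree. With x = 65536 and y = 1 the wide form yields 1 while the narrow form yields 0, which is why the operands need enough leading sign/zero bits before the fold applies.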
Can you mirror rG61225c081858efe55dfc7051b338c797fab07cff and introduce DAG.ComputeMinSignedBits()?
Then this becomes an obvious return DAG.ComputeMinSignedBits() <= 16;
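
As a rough scalar analogue of the suggested check, the sketch below computes the minimum signed bit width of a concrete 32-bit value as bit width minus the number of sign bits, plus one. The helper names (numSignBits, minSignedBits, fitsInSigned16) are hypothetical, and the real DAG query would work conservatively on an SDValue rather than on a known constant, so treat this only as a model of the arithmetic.

```cpp
#include <cstdint>

// Count the leading bits that equal the sign bit, including the sign bit.
unsigned numSignBits(int32_t v) {
  uint32_t u = static_cast<uint32_t>(v);
  uint32_t sign = u >> 31;
  unsigned n = 1;
  for (int bit = 30; bit >= 0 && ((u >> bit) & 1u) == sign; --bit)
    ++n;
  return n;
}

// Minimum number of bits needed to represent v as a signed integer.
unsigned minSignedBits(int32_t v) { return 32 - numSignBits(v) + 1; }

// Scalar counterpart of the "<= 16" test suggested in the review comment.
bool fitsInSigned16(int32_t v) { return minSignedBits(v) <= 16; }
```

Under this model 32767 and -32768 pass the check, while 32768 and -32769 need 17 bits and fail it.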