D153576 brought _mulx_u32 to 64-bit targets, but its lowering does not
satisfy the "does not read or write arithmetic flags" requirement
described in the Intrinsics Guide: https://godbolt.org/z/xb1fjf1sM
This patch completes the lowering by combining
(i32 (trunc (shr (mul (zext (i32 A)), (zext (i32 B))), 32)))
to (umul_lohi (i32 A), (i32 B))
This combine is also a general optimization.
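
For illustration, the source-level shape that produces this pattern can be
sketched in C as below. The helper names mulhi_u32 and mulx_u32_sketch are
hypothetical and are not part of the patch; the second one only mirrors the
documented semantics of _mulx_u32 (low half returned, high half stored
through a pointer). Both operands are zero-extended, so the multiply is an
unsigned one.

```c
#include <stdint.h>

/* Hypothetical helper: high 32 bits of an unsigned 32x32 -> 64 multiply.
 * This is the (trunc (shr (mul (zext A), (zext B)), 32)) shape. */
static uint32_t mulhi_u32(uint32_t a, uint32_t b) {
    uint64_t wide = (uint64_t)a * (uint64_t)b; /* zext both operands, then mul */
    return (uint32_t)(wide >> 32);             /* shr by 32, then trunc to i32 */
}

/* Hypothetical stand-in for _mulx_u32: returns the low half of the product
 * and stores the high half through hi. */
static uint32_t mulx_u32_sketch(uint32_t a, uint32_t b, uint32_t *hi) {
    uint64_t wide = (uint64_t)a * (uint64_t)b;
    *hi = (uint32_t)(wide >> 32);
    return (uint32_t)wide;
}
```

Because the low and high halves come from the same widening multiply, a
single two-result node such as umul_lohi can represent both.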
Shouldn't this be SMUL_LOHI?