The existing code only performs the transformation for imulq on 64-bit platforms. It makes just as much sense to perform it for imull, on both 32-bit and 64-bit platforms.
Some additional clean-up and additional patterns appear to be needed as well, but those will go into a separate review.
Also extended the test to cover more cases.