There are some cases where the mul sequence is smaller, but for the most part a div is the smaller choice and is preferable.
This does not apply to vectors: x86 has no vector integer division instruction, so a vector mul/shift sequence ought to be smaller than a scalarized division.
(Of course, this really depends on the type, since we may not have vector muls/shifts for it either, but I'd rather just keep the existing behavior for the vector case.)
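
For readers unfamiliar with the "mul sequence" mentioned above, here is a minimal sketch of the two scalar forms for an unsigned 32-bit divide by 10. The function names are hypothetical and exist only for illustration. The magic constant is the standard one for this divisor, and the exact instructions a compiler emits may differ from this source-level picture.

```cpp
#include <cstdint>

// Hypothetical illustration: unsigned 32-bit division by the constant 10.

// Form 1: plain division. On x86 this maps to a single div instruction
// plus the setup of its operands.
constexpr uint32_t div10_div(uint32_t x) {
    return x / 10;
}

// Form 2: the mul/shift sequence. 0xCCCCCCCD is ceil(2^35 / 10), so
// multiplying in 64 bits and shifting right by 35 yields x / 10 for
// every 32-bit x. It avoids the div but needs a wide constant, a
// multiply, and a shift.
constexpr uint32_t div10_mul(uint32_t x) {
    return static_cast<uint32_t>(
        (static_cast<uint64_t>(x) * 0xCCCCCCCDull) >> 35);
}
```

Both functions return the same value for every input; the difference is only in the number and size of the instructions needed, which is the trade-off discussed above.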