If the v4i32 elements have 17 or more leading zeros, we can use PMADDWD for the integer multiply when PMULLD is unavailable or slow.
All 17 bits need to be zero because PMADDWD performs a v8i16 signed multiply-extend followed by a pairwise add. The upper 16 bits must be zero so that the second product in each pair is zero, and the 17th bit (bit 15) must be zero so that the low 16-bit half is not incorrectly sign-extended.
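To make the 17-bit condition concrete, here is a scalar model of a single 32-bit lane of PMADDWD. It is a sketch written against the documented instruction semantics, not code from the patch or from LLVM. The helper names (pmaddwd_lane, pmaddwd_matches_mul, leading_zeros) are hypothetical. The model shows that the lane result equals a plain 32-bit multiply when both inputs have at least 17 leading zeros. It also shows how setting bit 15 breaks this (sign extension) and how setting a bit in the upper half breaks it (non-zero second product).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar model of one 32-bit lane of x86 PMADDWD
// (_mm_madd_epi16): each 32-bit lane is split into two signed 16-bit
// halves, matching halves are multiplied with sign extension to 32 bits,
// and the two products are added, wrapping modulo 2^32 like the hardware.
uint32_t pmaddwd_lane(uint32_t a, uint32_t b) {
  int16_t a_lo = static_cast<int16_t>(a & 0xFFFFu);
  int16_t a_hi = static_cast<int16_t>(a >> 16);
  int16_t b_lo = static_cast<int16_t>(b & 0xFFFFu);
  int16_t b_hi = static_cast<int16_t>(b >> 16);
  // Accumulate in 64 bits to avoid signed overflow, then truncate.
  int64_t sum = int64_t{a_lo} * b_lo + int64_t{a_hi} * b_hi;
  return static_cast<uint32_t>(sum);
}

// Portable leading-zero count for a 32-bit value (returns 32 for 0).
int leading_zeros(uint32_t x) {
  int n = 0;
  for (uint32_t bit = 0x80000000u; bit != 0 && (x & bit) == 0; bit >>= 1)
    ++n;
  return n;
}

// True when the PMADDWD lane gives the same bits as a 32-bit multiply.
bool pmaddwd_matches_mul(uint32_t a, uint32_t b) {
  return pmaddwd_lane(a, b) == a * b;
}
```

For example, pmaddwd_matches_mul(0x7FFF, 0x7FFF) holds because both inputs have 17 leading zeros. With 0x8000 (16 leading zeros), the low half is sign-extended to -32768, so pmaddwd_matches_mul(0x8000, 2) is false. With 0x10000, the low halves are zero and the high-half product pairs with a zero, so pmaddwd_matches_mul(0x10000, 3) is also false.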
If people want, I can try to integrate this more tightly into the ShrinkMode enum returned by canReduceVMulWidth.
Why doesn't this test have any AVX command lines? I assume some of the unpcks in the modified test case would become zero extends on newer feature sets?