AVX-512 has no 512-bit form of vphadd. However, if the 512-bit result of an add
is only partially consumed, such that the user of the add reads less than half of its bits,
we can perform the horizontal addition on a narrower vector and concatenate the result
with undef.
This fixes PR33758.
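
The narrowing described above can be illustrated with a scalar model. This is only a sketch under stated assumptions, not the code in the patch: it assumes the add operands are an even-lane and an odd-lane shuffle of one v16i32 value and that the user extracts only the low 128 bits. The type aliases and function names (`V16`, `V4`, `wideAddThenExtract`, `hadd128`, `narrowHadd`) are hypothetical and exist only for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar stand-ins for vector types (not LLVM types).
using V16 = std::array<std::uint32_t, 16>; // models a 512-bit v16i32
using V4 = std::array<std::uint32_t, 4>;   // models a 128-bit v4i32

// Wide form: add(shuffle<0,2,4,...>, shuffle<1,3,5,...>) over the full
// 512 bits, after which the user keeps only the low 128 bits.
V4 wideAddThenExtract(const V16 &v) {
  V16 sum{}; // lanes 8..15 are "undef" in the pattern; zero here
  for (std::size_t i = 0; i < 8; ++i)
    sum[i] = v[2 * i] + v[2 * i + 1];
  return {sum[0], sum[1], sum[2], sum[3]};
}

// 128-bit horizontal add with PHADDD lane semantics:
// pairwise sums of a, followed by pairwise sums of b.
V4 hadd128(const V4 &a, const V4 &b) {
  return {a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3]};
}

// Narrow form: only lanes 0..7 of the input feed the consumed result,
// so a single 128-bit horizontal add is enough. The remaining lanes of
// the 512-bit value would be filled by concatenating with undef.
V4 narrowHadd(const V16 &v) {
  V4 lo{v[0], v[1], v[2], v[3]};
  V4 hi{v[4], v[5], v[6], v[7]};
  return hadd128(lo, hi);
}
```

Both functions return the same four lanes for any input, which is why the wide add can be replaced when its upper lanes are never read. Unsigned lanes are used so that wraparound is well defined in the model.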
Referring to MMX here is odd. We explicitly don't generate any optimized code for MMX, so mentioning it alongside SSE/AVX is misleading.