As far as I can tell, gcc passes 256-bit and 512-bit vectors of __int128 in memory, and passes a vector of one __int128 in an xmm register. The backend considers <X x i128> an illegal type and scalarizes any argument of that type. So we need to coerce these argument types in the frontend to match gcc and avoid the illegal type.
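For reference, a minimal sketch of the kind of source that exercises these cases. The typedef and function names are made up for illustration and are not taken from the patch or from PR42607. It only declares the argument types; how each one is actually passed has to be checked in the generated assembly.

```c
/* Hypothetical reproducer: vectors of __int128 as function arguments.
 * Type and function names are illustrative, not from the patch. */
typedef __int128 v1i128 __attribute__((vector_size(16))); /* 128-bit, 1 element  */
typedef __int128 v2i128 __attribute__((vector_size(32))); /* 256-bit, 2 elements */
typedef __int128 v4i128 __attribute__((vector_size(64))); /* 512-bit, 4 elements */

/* Sanity-check the sizes the discussion refers to. */
_Static_assert(sizeof(v1i128) == 16, "expected a 128-bit vector");
_Static_assert(sizeof(v2i128) == 32, "expected a 256-bit vector");
_Static_assert(sizeof(v4i128) == 64, "expected a 512-bit vector");

/* Single-element case: reported above as passed in an xmm register by gcc. */
__int128 first1(v1i128 v) { return v[0]; }

/* 256-bit case: reported above as passed in memory by gcc. */
__int128 sum2(v2i128 v) { return v[0] + v[1]; }

/* 512-bit case: reported above as passed in memory by gcc. */
__int128 sum4(v4i128 v) { return v[0] + v[1] + v[2] + v[3]; }
```

Compiling this with both compilers on x86-64 (for example with `-O2 -S`) and comparing the argument handling in the output should show whether the two agree.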
Are there other element types to consider? Do we need to keep the old behavior on platforms where clang is the de facto compiler?
This issue was identified in PR42607. Even with the types changed, though, we still seem to be doing some unnecessary stack realignment.
I'll add test cases later today or over the weekend.