According to the i386 System V ABI:
- when __m256 values are required to be passed on the stack, the stack pointer must be aligned on a 0 mod 32 byte boundary at the time of the call.
- when __m512 values are required to be passed on the stack, the stack pointer must be aligned on a 0 mod 64 byte boundary at the time of the call.
Clang currently passes a __m512 parameter as follows:
- when the target supports avx512, it is passed with 64-byte alignment;
- when the target supports avx, it is passed with 32-byte alignment;
- otherwise, it is passed with 16-byte alignment.
A __m256 parameter is passed as follows:
- when the target supports avx or avx512, it is passed with 32-byte alignment;
- otherwise, it is passed with 16-byte alignment.
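The difference between the current behavior and the ABI rule can be sketched as two small functions. This is only an illustration: oldStackAlign and abiStackAlign are hypothetical names, not Clang APIs, and the 16-byte result for vectors smaller than 32 bytes is an assumption that the source does not spell out.

```cpp
// Hypothetical helpers for illustration only; these are not Clang APIs.
// vectorBytes is the size of the vector type: 32 for __m256, 64 for __m512.

// Stack alignment (in bytes) chosen by the current, feature-dependent scheme.
constexpr unsigned oldStackAlign(unsigned vectorBytes, bool hasAVX,
                                 bool hasAVX512) {
  return (vectorBytes >= 64 && hasAVX512)               ? 64u
         : (vectorBytes >= 32 && (hasAVX || hasAVX512)) ? 32u
                                                        : 16u;
}

// Stack alignment (in bytes) required by the i386 System V ABI rule quoted
// above; it depends only on the type, not on the enabled target features.
// Assumption: anything smaller than 32 bytes gets 16-byte alignment.
constexpr unsigned abiStackAlign(unsigned vectorBytes) {
  return vectorBytes >= 64 ? 64u : vectorBytes >= 32 ? 32u : 16u;
}

// A __m512 argument on an AVX-only target is where the two schemes disagree.
static_assert(oldStackAlign(64, /*hasAVX=*/true, /*hasAVX512=*/false) == 32,
              "current scheme: 32-byte alignment");
static_assert(abiStackAlign(64) == 64, "ABI rule: 64-byte alignment");
```

The two functions agree whenever the target supports the feature set matching the vector width; they differ only when a wide vector is passed on a target without that feature, which is the case where the current behavior deviates from the ABI.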
This patch passes __m128/__m256/__m512 following the i386 System V ABI and
applies the change to Linux only, since other System V OSes (e.g. Darwin, PS4 and FreeBSD) don't
want to spend any effort dealing with the ramifications of ABI breaks at present.
Please don't use default arguments here; it isn't helping readability.