...alignment, including
struct, union and vector types. For struct/union, there is no problem because they are aligned
to 4 bytes when passed. For the __m128/__m256/__m512 vector types, va_arg gets the wrong result.
This patch makes va_arg read vector arguments according to the rules below:
- When the target supports neither AVX nor AVX512: get __m128/__m256/__m512 from the 16-byte aligned stack.
- When the target supports AVX: get __m256/__m512 from the 32-byte aligned stack.
- When the target supports AVX512: get __m512 from the 64-byte aligned stack.
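The rules above can be sketched as a small alignment lookup followed by the usual round-up of the argument pointer. This is only an illustration, not the code in the patch: the helper names vector_va_arg_align and align_up are hypothetical, and the sketch assumes that smaller vectors keep their natural alignment when a wider feature is enabled (e.g. __m128 stays at 16 bytes with AVX, __m256 stays at 32 bytes with AVX512), which the list above does not state explicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the patch): stack alignment in bytes that
   va_arg would assume for a vector of vec_bytes (16, 32 or 64). */
static size_t vector_va_arg_align(size_t vec_bytes, bool has_avx, bool has_avx512)
{
    assert(vec_bytes == 16 || vec_bytes == 32 || vec_bytes == 64);

    /* Widest alignment allowed by the enabled features. */
    size_t max_align = 16;
    if (has_avx512)
        max_align = 64;
    else if (has_avx)
        max_align = 32;

    /* Assumption: a vector is never aligned beyond its own size. */
    return vec_bytes < max_align ? vec_bytes : max_align;
}

/* Round addr up to the next multiple of align (align must be a power of two). */
static uintptr_t align_up(uintptr_t addr, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(uintptr_t)(align - 1);
}
```

For example, under these assumptions a __m512 argument on an AVX-only target would be read from align_up(ap, vector_va_arg_align(64, true, false)), i.e. from a 32-byte aligned address.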
Note: The current behavior of clang is inconsistent with the i386 ABI, which says:
- If parameters of type __m256 are required to be passed on the stack, the stack pointer must be aligned on a 0 mod 32 byte boundary at the time of the call.
- If parameters of type __m512 are required to be passed on the stack, the stack pointer must be aligned on a 0 mod 64 byte boundary at the time of the call.