If the high part of the load is not used, the offset to the next element is not set correctly.
For example, on Sparc V8 the following code reads val2 from offset 4 instead of offset 8:
    int val = __builtin_va_arg(va, long long);
    int val2 = __builtin_va_arg(va, int);
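A minimal reproducer sketch in C, offered as an illustration rather than as a test from the original change. The function name read_second and the argument values are hypothetical. They were chosen so that neither 32-bit half of the long long equals the expected int. The sketch uses the standard va_arg macro, which GCC and Clang define in terms of __builtin_va_arg. A correct compiler returns 42. Under the miscompilation described above, val2 would be read from offset 4, which lies inside the long long, and so would not be 42.

```c
#include <stdarg.h>

/* Hypothetical reproducer: the long long is read, but only its low 32 bits
   are kept. That leaves the high half of the 64-bit va_arg load unused. */
int read_second(int count, ...) {
    va_list va;
    va_start(va, count);
    int val = (int)va_arg(va, long long); /* high part discarded */
    int val2 = va_arg(va, int);           /* must come after all 8 bytes of the long long */
    va_end(va);
    (void)val;
    return val2;
}
```

Calling read_second(2, 0x0000000100000002LL, 42) should return 42 on any correct target.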