Arm64 Procedure Call Standard specifies than only vectors up to 16 bytes
are stored in v0 (which makes sense, as that's the size of the
register). 32-byte vector types are passed as regular structs via x8
pointer. Treat them as such.
This fixes TestReturnValue for arm64-clang. I also split the test case
into two so I can avoid the if(gcc) line, and annotate each test
instead. (It seems the vector type tests fail with gcc only when
targetting x86 arches).