AVX-512 bit shuffle fails in 32-bit mode because we create a vector of 64-bit constants.
In 32-bit mode, I now split the 8 x 64-bit constant vector into a 16 x 32-bit vector.
I added tests for 32-bit mode. I also removed the "bw" test from the 8x64 vectors since it does not make any sense there (bw applies to i8 and i16 types).
Is the constant build vector splitting worth putting into a helper function instead of bulking out this function? I haven't actually checked whether there is similar code anywhere else (possibly vector variable shifts?).
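
For illustration only, here is a rough standalone sketch of what the splitting amounts to. It is plain C++ rather than the actual lowering code, and the name `splitConstants64To32` is hypothetical. It assumes the low half of each 64-bit constant goes first, matching x86's little-endian element order.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper, not LLVM code: split each 64-bit constant into two
// 32-bit halves, low half first (assumed little-endian layout), so that an
// 8 x 64-bit constant vector becomes a 16 x 32-bit vector with the same bits.
std::vector<uint32_t> splitConstants64To32(const std::vector<uint64_t> &Vals) {
  std::vector<uint32_t> Halves;
  Halves.reserve(Vals.size() * 2);
  for (uint64_t V : Vals) {
    Halves.push_back(static_cast<uint32_t>(V));       // low 32 bits
    Halves.push_back(static_cast<uint32_t>(V >> 32)); // high 32 bits
  }
  return Halves;
}
```

A real helper would build the 32-bit constant vector and then cast it back to the 64-bit vector type, but the element ordering would follow the same idea.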