The AVX-512 bit shuffle fails in 32-bit mode because we create a vector of 64-bit constants.
In 32-bit mode, I now split the 8x64-bit constant vector into a 16x32-bit vector.
I added tests for 32-bit mode. I also removed the "bw" test from the 8x64 vectors since it does not make sense there ("bw" applies only to the i8 and i16 types).
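
As a rough illustration of the 8x64 to 16x32 split described above, here is a standalone C++ sketch. It is not the LLVM lowering code: `splitTo32` is a hypothetical helper, and the low-half-first element order is an assumption based on x86 being little-endian, so that the 16x32 vector has the same bit pattern as the original 8x64 vector.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of LLVM): rewrites eight 64-bit constants as
// sixteen 32-bit constants with the same in-memory bit pattern.
std::array<std::uint32_t, 16>
splitTo32(const std::array<std::uint64_t, 8> &Wide) {
  std::array<std::uint32_t, 16> Narrow{};
  for (std::size_t I = 0; I < Wide.size(); ++I) {
    // Assumption: little-endian layout, so the low half comes first.
    Narrow[2 * I] = static_cast<std::uint32_t>(Wide[I]);           // low 32 bits
    Narrow[2 * I + 1] = static_cast<std::uint32_t>(Wide[I] >> 32); // high 32 bits
  }
  return Narrow;
}
```

Each 64-bit element at index `I` ends up as the 32-bit elements at indices `2 * I` and `2 * I + 1`, so no 64-bit scalar constant is needed to build the vector.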