Implemented lowering for 512-bit vector shuffles.
Vector types: <8 x 64-bit>, <16 x 32-bit> and <32 x 16-bit> elements, both integer and floating-point.
AVX-512 provides shuffle instructions with a variable mask (the permutation indices are held in a register), for one source (VPERM) and for two sources (VPERMT2).
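To make the one-source vs. two-source distinction concrete, here is a scalar reference model of the two instruction families for 16 x 32-bit lanes (the VPERMD / VPERMT2D case). It is an illustrative sketch written for this note, not LLVM code, and the function names `vperm` and `vpermt2` are hypothetical.

```python
# Scalar reference model (illustrative, not LLVM code) of the AVX-512
# variable-mask permutes for 16 x 32-bit lanes (VPERMD / VPERMT2D shapes).
N = 16  # lanes in a 512-bit vector of 32-bit elements

def vperm(idx, src):
    """One source: lane i takes src[idx[i]]; only the low 4 index bits count."""
    return [src[j & (N - 1)] for j in idx]

def vpermt2(a, idx, b):
    """Two sources: the index bit with value N picks the table (0 -> a, 1 -> b);
    the low 4 bits pick the lane inside that table."""
    return [(b if j & N else a)[j & (N - 1)] for j in idx]

a = list(range(100, 116))
b = list(range(200, 216))
print(vperm(list(reversed(range(N))), a)[:4])  # [115, 114, 113, 112]
print(vpermt2(a, [0, 16, 1, 17] * 4, b)[:4])   # [100, 200, 101, 201]
```

A single index register therefore describes an arbitrary 512-bit shuffle of one or two inputs, which is what makes splitting unnecessary.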
Use them instead of splitting the 512-bit vectors into smaller halves and shuffling each half.
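The choice between the two forms can be sketched as follows, assuming an IR-style shuffle mask over two n-lane inputs (indices 0..n-1 from the first input, n..2n-1 from the second, -1 for undef). `select_permute` is a hypothetical helper for illustration only; it is not the lowering code in this patch.

```python
# Hypothetical helper (not the patch's code): pick a variable-mask permute
# for a two-input shuffle mask over n lanes; -1 marks an undef lane.
def select_permute(mask, n):
    defined = [m for m in mask if m >= 0]
    uses_first = any(m < n for m in defined)
    uses_second = any(m >= n for m in defined)
    if uses_first and uses_second:
        # Two sources: the mask encoding already matches VPERMT2, where the
        # index bit with value n selects the second table.
        # Undef lanes may hold any index; 0 is an arbitrary choice.
        return "VPERMT2", [m if m >= 0 else 0 for m in mask]
    # One source: rebase the indices into the single input actually used.
    base = n if uses_second else 0
    return "VPERM", [m - base if m >= 0 else 0 for m in mask]

kind, idx = select_permute([0, 16, 1, 17] + [-1] * 12, 16)
print(kind, idx[:4])  # VPERMT2 [0, 16, 1, 17]
kind, idx = select_permute([31 - i for i in range(16)], 16)
print(kind, idx[:4])  # VPERM [15, 14, 13, 12]
```

Because n is a power of two (8, 16 or 32 lanes), a two-input mask index can be used directly as a VPERMT2 index without rewriting.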
All of the new shuffle instructions come in both integer and floating-point variants (e.g. VPERMD/VPERMPS, VPERMT2D/VPERMT2PS).
More optimizations will follow in the next patch.