This patch begins adding support for lowering to the XOP VPERMIL2PD/VPERMIL2PS shuffle instructions by adding the X86ISD::VPERMIL2 opcode and cleaning up its usage.
Mask decoding and target shuffle support will be added in future patches. This patch has to do some initial cleanup first, as the internal LLVM intrinsics assumed that the shuffle mask operand had the same type as the float/double input operands (I guess to simplify the intrinsic definitions in X86InstrXOP.td to a single value type). These needed changing to integer types, which matches the clang builtin and the AMD intrinsic definitions. As it's just the LLVM intrinsic IR, I don't think I need to provide an upgrade path, but I can if anybody thinks it's necessary.
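
For reference, a rough sketch of what the type change looks like at the IR level for the 128-bit variants. The exact signatures are my reading of the description above rather than a copy of the patch, and the function name `@vpermil2pd_example` and the immediate value are purely illustrative:

```llvm
; Before (assumed): the selector operand was typed like the float/double inputs.
;   declare <2 x double> @llvm.x86.xop.vpermil2pd(<2 x double>, <2 x double>, <2 x double>, i8)
;   declare <4 x float>  @llvm.x86.xop.vpermil2ps(<4 x float>, <4 x float>, <4 x float>, i8)

; After (assumed): the selector operand is an integer vector with the same element width.
declare <2 x double> @llvm.x86.xop.vpermil2pd(<2 x double>, <2 x double>, <2 x i64>, i8)
declare <4 x float> @llvm.x86.xop.vpermil2ps(<4 x float>, <4 x float>, <4 x i32>, i8)

; Hypothetical caller: %sel is now passed as <2 x i64>, so no bitcast from
; an integer selector to <2 x double> is needed.
define <2 x double> @vpermil2pd_example(<2 x double> %a, <2 x double> %b, <2 x i64> %sel) {
  %res = call <2 x double> @llvm.x86.xop.vpermil2pd(<2 x double> %a, <2 x double> %b, <2 x i64> %sel, i8 0)
  ret <2 x double> %res
}
```

Existing IR that still passes a float/double selector would fail to verify against the new declarations, which is the case an upgrade path would have to cover.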