We currently have no way to emit the xxspltw for a word splat and a call to vec_splat for vectors with 8-byte elements get translated to vperm with a mask vector that comes from memory.
This patch provides intrinsics to get at xxspltw and xxspltd (extended mnemonic). In a subsequent patch, altivec.h will be modified to use these intrinsics for the respective vec_splat definitions.
This provides a significant improvement in one benchmark that uses vec_splat.
Are you going to define a builtin and emit this intrinsics in the clang, when that builtin is seen? Can't we use shufflevector instruction, instead of target specific intrinsic?