The (v)palignr instructions are currently implemented using target-specific builtin intrinsics, although the x86 shuffle lowering code now correctly identifies them from generic shuffles.
This patch replaces the builtins with generic __builtin_shufflevector calls. I'll be posting an equivalent LLVM patch shortly.
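
For reviewers who want the byte selection spelled out, here is a minimal scalar sketch of what a 128-bit palignr computes. It is not the code in this patch: the helper name palignr128_model is hypothetical, and the operand order is assumed to follow _mm_alignr_epi8(a, b, n), with a as the high half and b as the low half of the 32-byte concatenation. For n up to 16, each result byte i is simply element i + n of that concatenation, which is the index pattern a two-source __builtin_shufflevector mask can express.

```c
#include <stdint.h>
#include <string.h>

/* Scalar reference model of 128-bit PALIGNR (hypothetical helper, for
 * illustration only). Assumes the _mm_alignr_epi8(a, b, n) convention:
 * a supplies the high 16 bytes and b the low 16 bytes of a 32-byte
 * temporary, which is shifted right by n bytes; the low 16 bytes of
 * the shifted value are the result. */
static void palignr128_model(uint8_t out[16], const uint8_t a[16],
                             const uint8_t b[16], unsigned n)
{
    uint8_t concat[32];

    memcpy(concat, b, 16);      /* bytes 0..15: low half comes from b  */
    memcpy(concat + 16, a, 16); /* bytes 16..31: high half comes from a */

    for (unsigned i = 0; i < 16; ++i) {
        unsigned idx = i + n;   /* shuffle index into the concatenation */

        /* Bytes shifted in from beyond the 32-byte temporary are zero. */
        out[i] = (idx < 32) ? concat[idx] : 0;
    }
}
```

Shift counts above 16 bring in zero bytes, which a plain shuffle of the two inputs cannot produce by itself, so a shuffle-based definition has to treat that range separately (for example by shuffling against a zero vector).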