Now that we use the generic ISD opcodes, we can use the generic intrinsics directly as well. This fixes the poor fast-isel codegen, since we no longer expand to an IR code sequence that is easily broken.
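For reviewers less familiar with the semantics, here is a rough scalar model of one lane. It assumes the intrinsics in question are the unsigned saturating add/sub ones (llvm.uadd.sat / llvm.usub.sat), which the text above does not name explicitly. The function names are made up for illustration, and the "expanded" form is only a guess at the kind of add/compare/select sequence meant above, not the exact IR that was emitted.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical scalar model of one i8 lane of an unsigned saturating add:
// the result clamps at 255 instead of wrapping around.
std::uint8_t uadd_sat_u8(std::uint8_t a, std::uint8_t b) {
    unsigned sum = static_cast<unsigned>(a) + static_cast<unsigned>(b);
    return static_cast<std::uint8_t>(std::min(sum, 255u));
}

// The same operation written as a wrapping add, an overflow compare and a
// select. This is an assumed shape for the expanded form: all three steps
// must survive intact for the backend to recognise the saturating add.
std::uint8_t uadd_sat_expanded_u8(std::uint8_t a, std::uint8_t b) {
    std::uint8_t sum = static_cast<std::uint8_t>(a + b); // wraps modulo 256
    bool overflowed = sum < a;                           // unsigned overflow check
    return overflowed ? std::uint8_t{0xFF} : sum;        // select max on overflow
}

// Hypothetical scalar model of one i8 lane of an unsigned saturating
// subtract: the result clamps at 0 instead of wrapping around.
std::uint8_t usub_sat_u8(std::uint8_t a, std::uint8_t b) {
    return a > b ? static_cast<std::uint8_t>(a - b) : std::uint8_t{0};
}
```

Both add variants compute the same value; the difference is only that a single generic intrinsic call carries the intent as one operation, while the expanded form relies on several instructions staying together.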
I have a sibling clang patch in progress that I will push for review in due time.
I also intend to deal with the signed saturation equivalents.