This patch builds on http://reviews.llvm.org/D5598 to improve byte shift/rotation shuffles on pre-SSSE3 targets, which lack palignr.
I've also made use of the fact that the PSLLDQ/PSRLDQ instructions implicitly shift in zero bytes, which avoids the need to create the zero register that palignr would require as its second operand.
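To make the equivalence concrete, here is a rough scalar model of the byte-level semantics, not the lowering code from the patch. The names `Vec`, `palignr_sse2` and the helper functions are hypothetical and only mirror the instruction mnemonics; shift amounts are assumed to be in the range 0..16.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 128-bit vector model: 16 bytes, index 0 is the lowest byte.
using Vec = std::array<std::uint8_t, 16>;

// Model of PSRLDQ: shift the whole vector right by n bytes, shifting in zeros.
Vec psrldq(const Vec &v, std::size_t n) {
  Vec r{};
  for (std::size_t i = 0; i + n < 16; ++i)
    r[i] = v[i + n];
  return r;
}

// Model of PSLLDQ: shift the whole vector left by n bytes, shifting in zeros.
Vec pslldq(const Vec &v, std::size_t n) {
  Vec r{};
  for (std::size_t i = n; i < 16; ++i)
    r[i] = v[i - n];
  return r;
}

// Model of POR: bytewise OR.
Vec por(const Vec &a, const Vec &b) {
  Vec r{};
  for (std::size_t i = 0; i < 16; ++i)
    r[i] = a[i] | b[i];
  return r;
}

// Model of PALIGNR: concatenate hi:lo, shift right by n bytes, keep the low 16.
Vec palignr(const Vec &hi, const Vec &lo, std::size_t n) {
  Vec r{};
  for (std::size_t i = 0; i < 16; ++i) {
    std::size_t idx = i + n;
    if (idx < 16)
      r[i] = lo[idx];
    else if (idx < 32)
      r[i] = hi[idx - 16];
  }
  return r;
}

// Pre-SSSE3 emulation (assumes 0 <= n <= 16): PSRLDQ + PSLLDQ + POR.
Vec palignr_sse2(const Vec &hi, const Vec &lo, std::size_t n) {
  return por(psrldq(lo, n), pslldq(hi, 16 - n));
}
```

In this model, `palignr` with an all-zero `hi` operand gives the same bytes as a single `psrldq`, which is why no zero register is needed in that case.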
The rest of this comment is now misleading.
I think it should now say something like "It does not check for the profitability of lowering either as PALIGNR or PSRLDQ/PSLLDQ/POR, only whether the mask is valid to lower in that form."