This patch builds on http://reviews.llvm.org/D5598 to improve byte shift/rotation shuffles on pre-SSSE3 targets, which lack palignr.
I've also made use of the fact that the SLLDQ/SRLDQ instructions implicitly shift in zero bytes. This avoids having to materialize the zero register that palignr would need as its second operand.
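
For reference, here is a rough, portable C++ model of the instruction semantics involved. It is only a sketch for checking the equivalences, not the lowering code in this patch: the `Vec` type and the helper names (`pslldq`, `psrldq`, `por`, `palignr`, `rotateBytesSSE2`, `checkShiftModels`) are hypothetical and simply mirror the instruction mnemonics, with byte 0 taken as the least significant byte and shift amounts assumed to lie in 0..16.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Model of a 128-bit register; byte 0 is the least significant byte.
using Vec = std::array<std::uint8_t, 16>;

// PSLLDQ model: shift left by N bytes, zeros enter at the low end.
Vec pslldq(const Vec &V, std::size_t N) {
  Vec R{};
  for (std::size_t I = N; I < 16; ++I)
    R[I] = V[I - N];
  return R;
}

// PSRLDQ model: shift right by N bytes, zeros enter at the high end.
Vec psrldq(const Vec &V, std::size_t N) {
  Vec R{};
  for (std::size_t I = 0; I + N < 16; ++I)
    R[I] = V[I + N];
  return R;
}

// POR model: bytewise OR.
Vec por(const Vec &A, const Vec &B) {
  Vec R{};
  for (std::size_t I = 0; I < 16; ++I)
    R[I] = A[I] | B[I];
  return R;
}

// PALIGNR model: concatenate Hi:Lo, shift right by N bytes, keep the low 16.
Vec palignr(const Vec &Hi, const Vec &Lo, std::size_t N) {
  Vec R{};
  for (std::size_t I = 0; I < 16; ++I) {
    std::size_t Src = I + N;
    if (Src < 16)
      R[I] = Lo[Src];
    else if (Src < 32)
      R[I] = Hi[Src - 16];
  }
  return R;
}

// Byte rotation without palignr: two shifts and an OR (N in 0..16).
Vec rotateBytesSSE2(const Vec &V, std::size_t N) {
  return por(psrldq(V, N), pslldq(V, 16 - N));
}

// Checks the equivalences for every shift amount on one sample vector.
void checkShiftModels() {
  Vec V{}, Zero{};
  for (std::size_t I = 0; I < 16; ++I)
    V[I] = static_cast<std::uint8_t>(I + 1);
  for (std::size_t N = 0; N <= 16; ++N) {
    assert(rotateBytesSSE2(V, N) == palignr(V, V, N));
    assert(psrldq(V, N) == palignr(Zero, V, N));      // no zero register needed
    assert(pslldq(V, N) == palignr(V, Zero, 16 - N)); // likewise
  }
}
```

In this model a plain byte shift is a single instruction with no zero operand, while a rotation costs two shifts plus an OR on targets without palignr.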
The formatting here isn't right. Can you use clang-format? The curly at the least should be on the prior line.