This patch builds on http://reviews.llvm.org/D5598 to improve byte shift/rotation shuffles on pre-SSSE3 targets, which lack palignr.
I've also made use of the fact that the SLLDQ/SRLDQ instructions implicitly shift in zero bytes. This avoids creating the zero register that would be needed if we used palignr with a zero operand.
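
For reviewers who want the byte-level semantics spelled out, here is a small portable C++ model (not the lowering code from this patch). The `Vec` type and the helpers `psrldq`, `pslldq`, `palignr`, `por` and `rotate_sse2` are hypothetical names that only imitate what the corresponding instructions do to a 16-byte vector. The shift-plus-OR form of the rotation is my assumption about how a rotation can be expressed without palignr; it is a sketch, not a statement of the exact sequence emitted.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical model of a 128-bit register as 16 bytes; index 0 is the lowest byte.
using Vec = std::array<std::uint8_t, 16>;

// Models PSRLDQ: bytes move toward index 0, zeros are shifted in at the top.
Vec psrldq(const Vec& v, std::size_t n) {
    Vec r{};  // value-initialized, so untouched bytes stay zero
    for (std::size_t i = 0; i + n < 16; ++i) r[i] = v[i + n];
    return r;
}

// Models PSLLDQ: bytes move toward index 15, zeros are shifted in at the bottom.
Vec pslldq(const Vec& v, std::size_t n) {
    Vec r{};
    for (std::size_t i = n; i < 16; ++i) r[i] = v[i - n];
    return r;
}

// Models PALIGNR: concatenate hi:lo, shift right by n bytes, keep the low 16 bytes.
Vec palignr(const Vec& hi, const Vec& lo, std::size_t n) {
    Vec r{};
    for (std::size_t i = 0; i < 16; ++i) {
        std::size_t j = i + n;
        if (j < 16) r[i] = lo[j];
        else if (j < 32) r[i] = hi[j - 16];
    }
    return r;
}

// Models POR: bytewise OR of two vectors.
Vec por(const Vec& a, const Vec& b) {
    Vec r{};
    for (std::size_t i = 0; i < 16; ++i) r[i] = a[i] | b[i];
    return r;
}

// Assumed pre-SSSE3 rotation: two byte shifts combined with OR (valid for n <= 16).
Vec rotate_sse2(const Vec& v, std::size_t n) {
    return por(psrldq(v, n), pslldq(v, 16 - n));
}
```

In this model, `psrldq(v, n)` gives the same bytes as `palignr(zero, v, n)`, which is why the zero operand is unnecessary, and `rotate_sse2(v, n)` matches `palignr(v, v, n)`.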