This patch adds the ability to use a PALIGNR to rotate a pair of inputs so that a single 128-bit range contains all the referenced elements, followed by a single-input permute to put them in the right location.
The code works fine for 256-bit and 512-bit vectors as well (although it's currently limited to in-lane shuffles), but I'm seeing a number of regressions (mainly cases where we'd prefer blend+permute) that need addressing before enabling it on anything but v16i8.
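To illustrate the rotate-then-permute idea for the v16i8 case, here is a minimal standalone sketch. It is not the code in this patch: the names `matchRotateAndPermute`, `RotatePermute`, `apply`, and `referenceShuffle` are hypothetical, the PALIGNR and permute steps are emulated in scalar code, and the search ignores the cost heuristics and multi-lane handling a real lowering would need. Mask indices 0-15 are assumed to select from the first input, 16-31 from the second, and -1 to mean undef.

```cpp
#include <array>
#include <cstdint>
#include <optional>

using Vec = std::array<uint8_t, 16>;
using Mask = std::array<int, 16>; // 0-15 -> A, 16-31 -> B, -1 -> undef

// Emulates PALIGNR: concatenate Hi:Lo, shift right by Imm bytes, keep the low 16.
Vec palignr(const Vec &Hi, const Vec &Lo, int Imm) {
  Vec R{};
  for (int i = 0; i < 16; ++i) {
    int Idx = i + Imm;
    R[i] = Idx < 16 ? Lo[Idx] : (Idx < 32 ? Hi[Idx - 16] : 0);
  }
  return R;
}

// Emulates a single-input byte permute; undef lanes are written as zero.
Vec permute(const Vec &V, const Mask &M) {
  Vec R{};
  for (int i = 0; i < 16; ++i)
    R[i] = M[i] < 0 ? 0 : V[M[i]];
  return R;
}

// Hypothetical description of a matched rotate + permute sequence.
struct RotatePermute {
  bool SwapInputs; // false: A is the low half, true: B is the low half
  int Rotation;    // PALIGNR byte immediate
  Mask Permute;    // single-input mask applied to the rotated vector
};

// Looks for a byte rotation of the concatenated inputs whose 16-byte window
// contains every referenced element, trying both input orders.
std::optional<RotatePermute> matchRotateAndPermute(const Mask &M) {
  for (int Swap = 0; Swap < 2; ++Swap) {
    for (int Rot = 0; Rot < 16; ++Rot) {
      Mask P;
      P.fill(-1);
      bool Ok = true;
      for (int i = 0; i < 16 && Ok; ++i) {
        if (M[i] < 0)
          continue;
        // Position of the element in the 32-byte concatenation (low half first).
        int Pos = Swap ? (M[i] + 16) % 32 : M[i];
        if (Pos < Rot || Pos >= Rot + 16)
          Ok = false;
        else
          P[i] = Pos - Rot;
      }
      if (Ok)
        return RotatePermute{Swap != 0, Rot, P};
    }
  }
  return std::nullopt;
}

// Applies the matched sequence: one PALIGNR, then one single-input permute.
Vec apply(const RotatePermute &RP, const Vec &A, const Vec &B) {
  const Vec &Lo = RP.SwapInputs ? B : A;
  const Vec &Hi = RP.SwapInputs ? A : B;
  return permute(palignr(Hi, Lo, RP.Rotation), RP.Permute);
}

// Direct two-input shuffle, used only to check the result.
Vec referenceShuffle(const Vec &A, const Vec &B, const Mask &M) {
  Vec R{};
  for (int i = 0; i < 16; ++i)
    R[i] = M[i] < 0 ? 0 : (M[i] < 16 ? A[M[i]] : B[M[i] - 16]);
  return R;
}
```

For example, a mask that interleaves A[12..15] with B[0..3] and then takes B[4..11] only references bytes 12 to 27 of the concatenated inputs, so a rotation by 12 bytes brings them all into one register and the permute only has to reorder them. A mask that references A[0], A[15], B[0], and B[15] together cannot be covered by any 16-byte window in either input order, so the sketch reports no match.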