Not entirely sure if this is generally good, given X86's poor variable
shuffle support in earlier SSE versions.
These intrinsics are very undertested. I can improve that, but first
wanted feedback on whether this generally makes sense, or whether we
need a way to limit it to cases that have good variable shift support.