The psll{,i}{w,d,q} instructions are almost a vector shl; however, they have
the defined behavior of evaluating to 0 when the shift count is greater than or
equal to the bitwidth of the elements, where a plain shl is undefined.
We can't currently represent this directly in LLVM IR without generating extra
code, but we can handle the case where the shift count is a constant.
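
As a rough illustration of the semantics being modelled (a minimal sketch, not
the code in this patch; emulatePslld is a hypothetical helper and not an LLVM
API), the 32-bit-element case with a known shift count could be written as:

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper for illustration only; it is not part of LLVM.
// Models the result of a PSLLD-style shift on four 32-bit lanes.
// Every lane is shifted left by the same count. A count of 32 or more
// produces all zeros, whereas a plain shl by that amount is undefined.
std::array<uint32_t, 4> emulatePslld(std::array<uint32_t, 4> Lanes,
                                     uint64_t Count) {
  constexpr unsigned BitWidth = 32;
  if (Count >= BitWidth)
    return {0, 0, 0, 0}; // Out-of-range count: the result is defined as zero.
  for (uint32_t &Lane : Lanes)
    Lane <<= Count; // In-range count: behaves like an ordinary vector shl.
  return Lanes;
}
```

With a constant count, the choice between the two branches is known up front,
so the call can become either a zero vector or an ordinary shl.
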
This excludes AVX-512, as I don't have hardware to verify it. It excludes the
_dq variants because they are represented in the IR as <{2,4} x i64> when they
are actually a byte shift of the entire i{128,256}.
This also excludes _dq_bs as they aren't at all supported by the backend.
There are also no corresponding instructions in the ISA. I have no idea why
they exist...