This batch of intrinsics fills in all the shift instructions that take
a variable shift distance in a register, instead of an immediate. Some
of these instructions take a single shift distance in a scalar
register and apply it to all lanes; others take a vector of per-lane
distances.

These instructions all belong to one family, varying in whether
they saturate out-of-range results and whether they round when bits
are shifted off the bottom. I've implemented them at the IR level using
a much smaller family of IR intrinsics, which take flag parameters to
indicate saturating and/or rounding (along with the usual one to
specify signed/unsigned integers).

An oddity is that all of them are left shift instructions, but if
you pass a negative shift count, they'll shift right. So the vector
shift distances are always vectors of signed integers, regardless
of whether you're treating the other input vector as signed
or unsigned. Also, even the simplest vshlq instruction in this
family (neither saturating nor rounding) has to be implemented as an
IR intrinsic, because the ordinary LLVM IR shl operation treats its
shift count as unsigned and returns a poison value when the count is
out of range.
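
To make the signed-shift-count behaviour concrete, here is a rough
scalar model of a single 8-bit lane, written in C. It is only a sketch
of the semantics described above, not code from this patch and not the
architectural pseudocode: the names model_shift_lane8 and floor_shr and
the clamping of the count to +/-16 are assumptions made for the
illustration. The three boolean parameters mirror the saturating,
rounding and signed/unsigned flags mentioned above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: arithmetic (floor) right shift that avoids the
 * implementation-defined behaviour of >> on negative values in C. */
static int64_t floor_shr(int64_t x, int n)
{
    return x >= 0 ? (x >> n) : ~((~x) >> n);
}

/* Hypothetical scalar model of one 8-bit lane.
 * value: lane contents, already widened (0..255 or -128..127).
 * shift: signed count; negative means "shift right". */
int64_t model_shift_lane8(int64_t value, int8_t shift,
                          bool saturate, bool round, bool is_unsigned)
{
    /* Assumption: for an 8-bit lane, counts beyond +/-16 give the same
     * result as +/-16, so clamp to keep the wide arithmetic in range. */
    int n = shift;
    if (n > 16)
        n = 16;
    if (n < -16)
        n = -16;

    int64_t wide;
    if (n >= 0) {
        /* Left shift, done as a multiply so negative values are fine. */
        wide = value * ((int64_t)1 << n);
    } else {
        int r = -n;
        /* Rounding adds half of the discarded range before shifting. */
        int64_t bias = round ? ((int64_t)1 << (r - 1)) : 0;
        wide = floor_shr(value + bias, r);
    }

    if (saturate) {
        int64_t lo = is_unsigned ? 0 : -128;
        int64_t hi = is_unsigned ? 255 : 127;
        if (wide < lo)
            return lo;
        if (wide > hi)
            return hi;
        return wide;
    }

    /* Non-saturating: keep the low 8 bits and reinterpret them. */
    uint8_t low = (uint8_t)wide; /* conversion to unsigned wraps mod 256 */
    if (is_unsigned || low < 128)
        return low;
    return (int64_t)low - 256;
}
```

In this model a count of 3 shifts left by three bits, a count of -1
shifts right by one bit, and an oversized count simply shifts
everything out, which is the well-defined behaviour that a plain IR shl
cannot express.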