If the shift operand is a constant vector whose elements are all equal,
we can simplify aarch64_neon_sshl to VSHL.
This pattern arises from code that uses vshlq_s32(a, vdupq_n_s32(n))
instead of vshlq_n_s32(a, n) because n is not guaranteed to be constant
at that point, i.e. before inlining.
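
As a rough illustration of the two source forms, here is a minimal
sketch. The helper names shl_by_splat and shl_by_imm3 are hypothetical,
and the scalar fallback only stands in for the intrinsics on non-NEON
hosts. Once n is known to be 3 (for example after inlining), both
helpers compute the same lanes, which is the case this combine targets.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: shift amount arrives as a runtime value, so the
 * vector-shift intrinsic with a splatted operand is used. */
static void shl_by_splat(const int32_t in[4], int32_t n, int32_t out[4]) {
    assert(n >= 0 && n < 32); /* only non-negative in-range shifts here */
#if defined(__ARM_NEON)
    int32x4_t a = vld1q_s32(in);
    vst1q_s32(out, vshlq_s32(a, vdupq_n_s32(n)));
#else
    /* Scalar stand-in for hosts without NEON. */
    for (int i = 0; i < 4; ++i)
        out[i] = (int32_t)((uint32_t)in[i] << n);
#endif
}

/* Hypothetical helper: same shift written with the immediate form,
 * which requires a compile-time constant. */
static void shl_by_imm3(const int32_t in[4], int32_t out[4]) {
#if defined(__ARM_NEON)
    vst1q_s32(out, vshlq_n_s32(vld1q_s32(in), 3));
#else
    for (int i = 0; i < 4; ++i)
        out[i] = (int32_t)((uint32_t)in[i] << 3);
#endif
}
```
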
We can do a similar combine for aarch64_neon_ushl, but we have to be
a bit more careful, because we can only match ushll/ushll2 for vector
shifts with a zero-extended first operand.
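
A sketch of the unsigned case follows, again with hypothetical helper
names (widen_shl_splat, widen_shl_imm3) and a scalar stand-in for
non-NEON hosts. The first operand is zero-extended from 16 to 32 bits
before the shift, which is the shape the text says ushll/ushll2 needs.
Whether a given compiler version actually selects ushll here is not
something this example asserts.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: zero-extend four u16 lanes to u32, then shift
 * left by a splatted runtime amount. Note that vshlq_u32 takes a
 * signed shift vector. */
static void widen_shl_splat(const uint16_t in[4], int32_t n,
                            uint32_t out[4]) {
    assert(n >= 0 && n < 16);
#if defined(__ARM_NEON)
    uint32x4_t wide = vmovl_u16(vld1_u16(in));
    vst1q_u32(out, vshlq_u32(wide, vdupq_n_s32(n)));
#else
    /* Scalar stand-in for hosts without NEON. */
    for (int i = 0; i < 4; ++i)
        out[i] = (uint32_t)in[i] << n;
#endif
}

/* Hypothetical helper: the widening shift written directly with an
 * immediate, for comparison. */
static void widen_shl_imm3(const uint16_t in[4], uint32_t out[4]) {
#if defined(__ARM_NEON)
    vst1q_u32(out, vshll_n_u16(vld1_u16(in), 3));
#else
    for (int i = 0; i < 4; ++i)
        out[i] = (uint32_t)in[i] << 3;
#endif
}
```
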
Also add two tests marked with FIXME, where codegen can be improved
further.