If we have a constant splat shift amount (a vector with all lanes equal),
we can simplify aarch64_neon_sshl to VSHL.

This pattern can be generated by code that uses vshlq_s32(a, vdupq_n_s32(n))
instead of vshlq_n_s32(a, n), because it appears in contexts where n is
not guaranteed to be constant before inlining.

We can do a similar combine for aarch64_neon_ushl, but we have to be
a bit more careful, because we can only match ushll/ushll2 for vector
shifts whose first operand is zero-extended.

Also adds 2 tests marked with FIXME, where codegen can be improved
further.