Add missing <8xhalf> shufflevectors pattern, when using concat_vector dag node.
As well, allows <8xhalf> and <4xhalf> vldup1 operations.
These instructions are required for v8.2a fp16 lowering of vmul_n_f16, vmulq_n_f16 and vmulq_lane_f16 intrinsics.