This uses BITCAST (and maybe CONCAT_VECTORS) and VECTOR_SHUFFLE instead
of scalarizing the vector with EXTRACT_VECTOR_ELTs and rebuilding it
with BUILD_VECTOR. This generates more efficient code in the majority of
affected test cases.
The original motivation for this was to avoid some redundant AND
instructions introduced by D87502. They arose because EXTRACT_VECTOR_ELT
is sometimes optimized to a constant, which makes it harder to optimize
away the scalarize-and-then-rebuild sequence.