The pattern Or(And(A, MaskValue), And(B, ~MaskValue)), where ~MaskValue = Xor(MaskValue, -1), is lowered to a bitselect instruction when NEON is available. However, when this pattern appears in a loop and MaskValue is defined outside the immediate basic block, instruction selection cannot match the pattern, and we end up with a sequence of ORs and ANDs instead. This patch sinks such a MaskValue into the basic block so that the backend can select the bitselect instruction.
This addresses the performance bugs mentioned in this comment: https://github.com/llvm/llvm-project/issues/49305#issuecomment-1440828393
VBSL intrinsics can be found here: https://developer.arm.com/architectures/instruction-sets/intrinsics/#q=vbsl
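
For illustration, a minimal C sketch of the kind of source loop that can give rise to the pattern described above. The function name bitselect_loop and its signature are hypothetical and are not taken from the patch or the linked issue. The mask is loop-invariant, so once the loop is vectorized the inverted mask may be computed outside the loop body, which is the situation this patch targets. Whether a given compiler version actually emits a bitselect for this exact loop is not verified here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example, not part of the patch: for each element, take the
   bits of a[i] where mask has a 1 and the bits of b[i] where mask has a 0.
   This is the Or(And(A, Mask), And(B, ~Mask)) shape from the description. */
void bitselect_loop(uint32_t *dst, const uint32_t *a, const uint32_t *b,
                    uint32_t mask, size_t n) {
    /* mask and ~mask do not change across iterations, so an optimizer is
       free to compute ~mask once, outside the loop body. */
    for (size_t i = 0; i < n; ++i)
        dst[i] = (a[i] & mask) | (b[i] & ~mask);
}
```

Comparing the AArch64 assembly for such a loop before and after the patch is one way to check whether the OR/AND sequence is replaced by a bitselect.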