The pattern Or(And(A, MaskValue), And(B, ~MaskValue)), where ~MaskValue = Xor(MaskValue, -1), gets lowered to a bitselect instruction when NEON is available. However, when this pattern is in a loop and MaskValue lives outside of the immediate basic block, instruction selection isn't able to choose bitselect and we end up with a sequence of ORs and ANDs. This patch sinks such a MaskValue into the basic block to allow the backend to select bitselect instructions.
This should fix the performance bugs mentioned in this comment: https://github.com/llvm/llvm-project/issues/49305#issuecomment-1440828393
VBSL intrinsics can be found here: https://developer.arm.com/architectures/instruction-sets/intrinsics/#q=vbsl
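For reference, the pattern described above can be written at source level roughly as follows. This is only an illustrative sketch in scalar C++: the names bitselect and select_loop are hypothetical and are not taken from the patch or from the linked issue. Whether the optimizer moves the mask computation out of the loop body, and whether the backend then selects a bitselect instruction, depends on the target and the pipeline.

```cpp
#include <cstddef>
#include <cstdint>

// Scalar form of Or(And(a, mask), And(b, Xor(mask, -1))).
// Each result bit comes from `a` where the mask bit is 1, otherwise from `b`.
// (Hypothetical helper, for illustration only.)
inline std::uint32_t bitselect(std::uint32_t a, std::uint32_t b,
                               std::uint32_t mask) {
    return (a & mask) | (b & ~mask);
}

// Hypothetical loop in which the mask is loop-invariant. In a case like
// this, the inverted mask may end up defined outside the loop's basic
// block, which is the situation the description refers to.
void select_loop(const std::uint32_t* a, const std::uint32_t* b,
                 std::uint32_t* out, std::size_t n, std::uint32_t mask) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = bitselect(a[i], b[i], mask);
    }
}
```

With an all-ones mask the helper returns a, and with a zero mask it returns b; any other mask mixes the two bit by bit.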
That sounds like it might be a bug that happens if it tries to sink too many operands? From what I remember, the order in which they are put into Ops might matter. And if it is sinking to the Or, it might need to add both the And and the Not.