(X | Y) - Y --> (X | Y) ^ Y
(Y | X) - Y --> (X | Y) ^ Y
I verified the correctness using Alive:
https://rise4fun.com/Alive/czes
This transform enables these further transforms that already exist in instcombine:
(X | Y) ^ Y --> X & ~Y
(Y | X) ^ Y --> X & ~Y
As a result, the full expected transform is:
(X | Y) - Y --> X & ~Y
(Y | X) - Y --> X & ~Y
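
The reason the subtraction can become an xor is that every set bit of Y is also set in (X | Y), so subtracting Y never borrows and simply clears those bits. As an informal sanity check alongside the Alive proof, the chain can be brute-forced at a small bit width. The sketch below is illustrative only and is not part of the patch; it assumes 8-bit wrapping arithmetic, and the function names are hypothetical.

```python
MASK = 0xFF  # assumption: model 8-bit integers; the Alive proof covers general widths


def sub_form(x, y):
    # (X | Y) - Y, wrapped to the assumed bit width
    return ((x | y) - y) & MASK


def xor_form(x, y):
    # (X | Y) ^ Y
    return ((x | y) ^ y) & MASK


def andnot_form(x, y):
    # X & ~Y
    return x & ~y & MASK


def check_all():
    # Exhaustively compare all three forms, including the commuted 'or'.
    for x in range(MASK + 1):
        for y in range(MASK + 1):
            assert sub_form(x, y) == xor_form(x, y) == andnot_form(x, y)
            assert ((y | x) - y) & MASK == andnot_form(x, y)
    return True


if __name__ == "__main__":
    check_all()
    print("all 8-bit cases agree")
```

For example, with X = 0b1100 and Y = 0b1010, (X | Y) is 0b1110, and both subtracting and xoring Y give 0b0100, which equals X & ~Y.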
I've added tests for cases where Y is constant and where Y is non-constant (with operands in either order).
In the constant case the optimisation is a clear win: ~Y can be pre-computed, so we go from 2 instructions to 1.
I checked that the combine still appears to be profitable when Y is non-constant by compiling for x86_64 with -mcpu=btver2, where I observed that we go from generating
  movl %ecx, %eax
  orl %edx, %eax
  subl %edx, %eax
to
  andnl %ecx, %edx, %eax
This needs a newline