This essentially expands on the optimizations added in D120199,
but applies them to cases where the bit being changed /
tested is not an IMM but is a provable power of 2.
The only case currently added is for patterns like:
__atomic_fetch_xor(p, 1 << c, __ATOMIC_RELAXED) & (1 << c)
Instead of using a cmpxchg loop, this can be done with btcl; setcc; shl.
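For context, a minimal sketch of source code that produces this pattern is shown below. The helper name `toggle_and_test_bit` and the choice of a 32-bit operand are assumptions made for illustration only; they are not taken from the patch. The mask `1 << c` is not an immediate, but it is provably a power of 2, which is the property the optimization relies on.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of the patch): atomically toggles bit `c`
   of *p and returns the previous value of that bit, still in position `c`
   (so the result is either 0 or 1u << c). */
static uint32_t toggle_and_test_bit(uint32_t *p, unsigned c) {
    /* Shifting by the operand width or more is undefined behavior. */
    assert(c < 32);
    /* Variable mask: not an immediate, but always a power of 2. */
    uint32_t mask = UINT32_C(1) << c;
    /* fetch_xor returns the old value; masking keeps only the toggled bit. */
    return __atomic_fetch_xor(p, mask, __ATOMIC_RELAXED) & mask;
}
```

Only the single toggled bit of the old value is observed, so the full previous value is never needed. That is what should allow the x86 backend to avoid the cmpxchg loop for this shape.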
A variety of missed cases remain that could/should be
addressed in the future. This commit documents many of those
cases with TODOs.
Should the Size match that of operand 1 rather than the result?