Splitting this off from D32665 to make sure I've thought it about correctly.
If we have:
~(~X & Y), it only makes sense to transform it to (X | ~Y) when we do not need the intermediate (~X & Y) value. In that case, we would need an extra instruction to generate ~Y + 'or' (as shown in the test changes).
It's ok if we have multiple uses of ~X or Y, however. In those cases, we won't reduce the instruction count or critical path, but we might improve throughput because we can generate ~X and ~Y in parallel. Whether that actually makes perf sense or not for a target is something we can't answer in IR.