Fold and/or of setcc's to double CMOV:
(CMOV F, T, ((cc1 | cc2) != 0)) -> (CMOV (CMOV F, T, cc1), T, cc2) (CMOV F, T, ((cc1 & cc2) != 0)) -> (CMOV (CMOV T, F, !cc1), F, !cc2)
In practice, when we need to lower to control flow using a custom inserter,
we represent the double CMOV using a special node, CMOV2:
(CMOV F, T, ((cc1 | cc2) != 0)) -> (CMOV2 F, T, cc1, cc2) (CMOV F, T, ((cc1 & cc2) != 0)) -> (CMOV2 T, F, !cc1, !cc2)
This enables the custom inserter to only need to insert one PHI, instead
of two if it looked at two independent CMOVs.
This combine lets us generate:
cmovcc1 (jcc1 if we don't have CMOV) cmovcc2 (same)
instead of:
setcc1 setcc2 and/or cmovne (jne if we don't have CMOV)
When we can't use the CMOV instruction, it might increase branch mispredicts.
When we can, or when there is no mispredict, this improves throughput and reduces register pressure.
This is pretty much the same thing as D7622, except here the pattern is exposed during ops legalization. By the time we get to the combiner, we already have X86ISD nodes, so even an independent combine wouldn't have been useful.
[EDIT: dropped the CMOV2 for now, I'll put that in a separate patch]
The CMOV2 pseudo is yucky, but I can't see an alternative. If you do it the obvious way (just form two CMOVs all the time) the custom inserter (which definitely should not look at other instructions) adds a PHI between the two jumps, which ends up creating a few copies all around. For instance, for the (sitofp (zext (fcmp une))) testcase:
ucomiss %xmm1, %xmm0 movss <1.0f>, %xmm0 movaps %xmm0, %xmm1 jne .LBB5_2 xorps %xmm1, %xmm1 .LBB5_2: jp .LBB5_4 movaps %xmm1, %xmm0 .LBB5_4: retq
We could avoid the problem by optimizing something like this:
A | \ | B | / C | \ | D | / E A: X = ...; Y = ... B: empty C: Z = PHI [X, A], [Y, B] D: empty E: PHI [X, C], [Z, D]
To this:
A | \ | C | /| |/ | | | | D | / E A: X = ...; Y = ... D: empty E: PHI [X, A], [X, C], [Z, D]
That sounds awfully specific though, and, having looked into it, I'm not sure which more general pass should do it.
Of course, we could just ignore the problem, and have a few extra MOVs around.
I would move this before the switch statement to be closer to the def.