Page MenuHomePhabricator

[X86] Combine (cmov (and/or (setcc) (setcc))) into (cmov (cmov))
ClosedPublic

Authored by ab on Feb 13 2015, 4:23 PM.

Details

Reviewers
MatzeB
Summary

Fold and/or of setcc's to double CMOV:

(CMOV F, T, ((cc1 | cc2) != 0)) -> (CMOV (CMOV F, T, cc1), T, cc2)
(CMOV F, T, ((cc1 & cc2) != 0)) -> (CMOV (CMOV T, F, !cc1), F, !cc2)

In practice, when we need to lower to control flow using a custom inserter,
we represent the double CMOV using a special node, CMOV2:

(CMOV F, T, ((cc1 | cc2) != 0)) -> (CMOV2 F, T, cc1, cc2)
(CMOV F, T, ((cc1 & cc2) != 0)) -> (CMOV2 T, F, !cc1, !cc2)

This enables the custom inserter to only need to insert one PHI, instead
of two if it looked at two independent CMOVs.

This combine lets us generate:

cmovcc1 (jcc1 if we don't have CMOV)
cmovcc2 (same)

instead of:

setcc1
setcc2
and/or
cmovne (jne if we don't have CMOV)

When we can't use the CMOV instruction, it might increase branch mispredicts.
When we can, or when there is no mispredict, this improves throughput and reduces register pressure.


This is pretty much the same thing as D7622, except here the pattern is exposed during ops legalization. By the time we get to the combiner, we already have X86ISD nodes, so even an independent combine wouldn't have been useful.

[EDIT: dropped the CMOV2 for now, I'll put that in a separate patch]
The CMOV2 pseudo is yucky, but I can't see an alternative. If you do it the obvious way (just form two CMOVs all the time) the custom inserter (which definitely should not look at other instructions) adds a PHI between the two jumps, which ends up creating a few copies all around. For instance, for the (sitofp (zext (fcmp une))) testcase:

        ucomiss %xmm1, %xmm0
        movss  <1.0f>, %xmm0
        movaps  %xmm0, %xmm1
        jne     .LBB5_2
        xorps   %xmm1, %xmm1
.LBB5_2:
        jp      .LBB5_4
        movaps  %xmm1, %xmm0
.LBB5_4:
        retq

We could avoid the problem by optimizing something like this:

  A
  | \
  |  B
  | /
  C
  | \
  |  D
  | /
  E

A: X = ...; Y = ...
B: empty
C: Z = PHI [X, A], [Y, B]
D: empty
E: PHI [X, C], [Z, D]

To this:

  A
  | \
  |  C
  | /|
  |/ |
  |  |
  |  D
  | /
  E

A: X = ...; Y = ...
D: empty
E: PHI [X, A], [X, C], [Z, D]

That sounds awfully specific though, and, having looked into it, I'm not sure which more general pass should do it.

Of course, we could just ignore the problem, and have a few extra MOVs around.

Diff Detail

Event Timeline

ab updated this revision to Diff 19947.Feb 13 2015, 4:23 PM
ab retitled this revision from to [X86] Combine (cmov (and/or (setcc) (setcc))) into (cmov (cmov)).
ab updated this object.
ab edited the test plan for this revision. (Show Details)
ab added subscribers: Unknown Object (MLST), MatzeB.
ab updated this revision to Diff 19950.Feb 13 2015, 4:49 PM
  • Add missing CMOV2 documentation.
  • Isolate CMOV pseudo multiclass refactoring
ab updated this revision to Diff 19953.Feb 13 2015, 5:42 PM

Add back the test file..

ab updated this object.Feb 17 2015, 11:17 AM
ab added a subscriber: qcolombet.
ab updated this revision to Diff 21024.Mar 2 2015, 11:45 AM
ab updated this object.
  • Split out CMOV2 changes
MatzeB accepted this revision.Mar 2 2015, 11:58 AM
MatzeB added a reviewer: MatzeB.

I've got some small nitpicks, but LGTM.

lib/Target/X86/X86ISelLowering.cpp
21016

I would move this before the switch statement to be closer to the def.

21240

I would not initialize this var here and rather make sure that checkBoolTestAndOrSetCCCombine() always sets isAndSetCC if it returns true.

This revision is now accepted and ready to land.Mar 2 2015, 11:58 AM
ab closed this revision.Mar 2 2015, 5:16 PM

r231045.

I guess phabricator isn't content with a link to the revision to auto-close.

-Ahmed