
X86TargetLowering::LowerSELECT(): don't promote CMOVs if the subtarget doesn't have them

Authored by lebedev.ri on Mar 5 2019, 3:08 PM.



I'm not actually sure this patch does the right thing, but then
i'm not sure i understand why we would want to do that promotion
if we don't have CMOV and will thus expand the select to a branch.
Shouldn't some later code be responsible for these decisions?

The real reason for this patch:
I've looked at extending CMOV promotion to support i8 (PR40965),
and if this check is not in place, llvm/test/CodeGen/X86/pseudo_cmov_lower.ll goes really bad.
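The intent of the change can be modeled as a small self-contained sketch (Subtarget/Lowering here are hypothetical names for illustration, not the actual LLVM API): only promote the select when a CMOV will actually be emitted, because without CMOV the select is expanded to a branch anyway and the promotion buys nothing.

```cpp
#include <cassert>

// Hypothetical model of the decision (not the actual LLVM API):
// promoting an i16 select to a 32-bit CMOV only makes sense when the
// subtarget actually has CMOV; otherwise the select is expanded to a
// branch anyway, and the promotion buys nothing.
struct Subtarget { bool hasCMov; };

enum class Lowering { PromoteToCMov32, ExpandToBranch };

Lowering lowerSelect16(const Subtarget &ST) {
  // The guard this patch adds, in spirit:
  return ST.hasCMov ? Lowering::PromoteToCMov32 : Lowering::ExpandToBranch;
}
```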

Event Timeline

lebedev.ri created this revision. Mar 5 2019, 3:08 PM
craig.topper added inline comments. Mar 5 2019, 3:31 PM

Do you understand why we went from one conditional branch to two?

lebedev.ri added inline comments. Mar 6 2019, 12:00 AM

I'm not sure yet, but i just want to point out that in all other cases we already had two conditional jumps.
In this case we had one conditional jump and one unconditional jump, and that regressed to two conditional jumps.

lebedev.ri added inline comments. Mar 6 2019, 12:55 AM

Hmm, interesting.

# *** IR Dump After Expand ISel Pseudo-instructions ***:
# Machine code for function scalar_i16_unsigned_reg_reg: IsSSA, TracksLiveness
Frame Objects:
  fi#-2: size=2, align=4, fixed, at location [SP+8]
  fi#-1: size=2, align=4, fixed, at location [SP+4]

bb.0 (%ir-block.0):
  successors: %bb.1(0x40000000), %bb.2(0x40000000); %bb.1(50.00%), %bb.2(50.00%)

  %0:gr32 = MOV32rm %fixed-stack.1, 1, $noreg, 0, $noreg :: (load 2 from %fixed-stack.1, align 4)
  %1:gr16 = COPY %0.sub_16bit:gr32
  %2:gr16 = MOV16rm %fixed-stack.0, 1, $noreg, 0, $noreg :: (load 2 from %fixed-stack.0, align 4)
  %3:gr16 = SUB16rr %1:gr16(tied-def 0), %2:gr16, implicit-def $eflags
  %4:gr8 = SETBEr implicit $eflags
  %5:gr32_nosp = MOVZX32rr8 killed %4:gr8
  %6:gr32 = LEA32r %5:gr32_nosp, 1, %5:gr32_nosp, -1, $noreg
  JA_1 %bb.2, implicit $eflags

bb.1 (%ir-block.0):
; predecessors: %bb.0
  successors: %bb.2(0x80000000); %bb.2(100.00%)
  liveins: $eflags

bb.2 (%ir-block.0):
; predecessors: %bb.0, %bb.1
  successors: %bb.3(0x40000000), %bb.4(0x40000000); %bb.3(50.00%), %bb.4(50.00%)
  liveins: $eflags
  %7:gr16 = PHI %1:gr16, %bb.1, %2:gr16, %bb.0
  %9:gr32 = IMPLICIT_DEF
  %8:gr32 = INSERT_SUBREG %9:gr32(tied-def 0), killed %7:gr16, %subreg.sub_16bit
  JA_1 %bb.4, implicit $eflags

bb.3 (%ir-block.0):
; predecessors: %bb.2
  successors: %bb.4(0x80000000); %bb.4(100.00%)

bb.4 (%ir-block.0):
; predecessors: %bb.2, %bb.3

  %10:gr16 = PHI %2:gr16, %bb.3, %1:gr16, %bb.2
  %12:gr32 = IMPLICIT_DEF
  %11:gr32 = INSERT_SUBREG %12:gr32(tied-def 0), killed %10:gr16, %subreg.sub_16bit
  %13:gr32 = SUB32rr %11:gr32(tied-def 0), killed %8:gr32, implicit-def dead $eflags
  %14:gr16 = COPY %13.sub_16bit:gr32
  %15:gr32 = MOVZX32rr16 killed %14:gr16
  %16:gr32 = SHR32r1 %15:gr32(tied-def 0), implicit-def dead $eflags
  %17:gr32 = IMUL32rr %16:gr32(tied-def 0), killed %6:gr32, implicit-def dead $eflags
  %18:gr32 = ADD32rr %17:gr32(tied-def 0), %0:gr32, implicit-def dead $eflags
  %19:gr16 = COPY %18.sub_16bit:gr32
  $ax = COPY %19:gr16
  RET 0, $ax

# End machine code for function scalar_i16_unsigned_reg_reg.
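As an aside, the SETBEr + MOVZX32rr8 + LEA32r triple in bb.0 is a branchless ±1 materialization: SETBE produces c = (a <= b) ? 1 : 0 from the SUB's flags, and the LEA computes c + 1*c + (-1), i.e. +1 or -1. A self-contained C++ equivalent of that sequence:

```cpp
#include <cassert>
#include <cstdint>

// Mirrors %4 = SETBEr; %5 = MOVZX32rr8 %4; %6 = LEA32r %5, 1, %5, -1:
// a branchless select of +1 / -1 based on an unsigned comparison.
int pickSign(uint16_t a, uint16_t b) {
  unsigned c = (a <= b) ? 1u : 0u;     // SETBE after SUB16rr a - b, zero-extended
  return static_cast<int>(c + c) - 1;  // LEA: base=c, scale=1, index=c, disp=-1
}
```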

So we indeed have two conditional jumps based on the same condition.
But i'm not sure i understand.

  • bb.4 is reachable via a conditional jump from bb.2 OR via fallthrough from bb.3.
  • bb.3 is only reachable via fallthrough from bb.2.
  • bb.3 is empty.
  • Thus, bb.4 will always be visited whenever bb.2 is.

This sounds wrong to me (i'm likely not getting something),
but why can't we just fold bb.3 and bb.4 into bb.2?
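The folding being asked about is essentially the standard CFG cleanup of bypassing an empty block that has a single successor — a toy self-contained model of that simplification (the CFG representation here is hypothetical, not LLVM's MachineBasicBlock API):

```cpp
#include <cassert>
#include <vector>

// Toy CFG: block i's successors are succs[i]; empty[i] marks blocks with
// no instructions (like bb.3 above, which exists only as a fallthrough).
struct CFG {
  std::vector<std::vector<int>> succs;
  std::vector<bool> empty;
};

// Redirect every edge that targets an empty single-successor block to
// that block's successor instead -- the "fold bb.3 away" step.
void bypassEmptyBlocks(CFG &cfg) {
  for (auto &sl : cfg.succs)
    for (int &s : sl)
      while (cfg.empty[s] && cfg.succs[s].size() == 1 && cfg.succs[s][0] != s)
        s = cfg.succs[s][0];
}
```

Once both successors of bb.2 point at bb.4, the conditional JA_1 in bb.2 is redundant and can be dropped, which is exactly what the bullet points above suggest.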

lebedev.ri added inline comments. Mar 6 2019, 12:58 AM

Though that of course only answers the question of why we have two conditional jumps,
not why we regressed from an unconditional jump to a conditional jump.

lebedev.ri marked 5 inline comments as done. Mar 6 2019, 2:17 AM
lebedev.ri added inline comments.

I have dug a bit, and while i'm unable to answer *why* this *regresses*,
i do believe this ends up simply exposing an existing missed optimization,
which i have filed with all the data i have.

The CMOVs need to be adjacent going into that pass. Here is the code that detects multiple CMOVs in EmitLoweredSelect:

if (isCMOVPseudo(MI)) {
  // See if we have a string of CMOVS with the same condition. Skip over
  // intervening debug insts.
  while (NextMIIt != ThisMBB->end() && isCMOVPseudo(*NextMIIt) &&
         (NextMIIt->getOperand(3).getImm() == CC ||
          NextMIIt->getOperand(3).getImm() == OppCC)) {
    LastCMOV = &*NextMIIt;
    NextMIIt = skipDebugInstructionsForward(NextMIIt, ThisMBB->end());
  }
}
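That adjacency requirement can be modeled as a self-contained scan (Inst/Kind are toy types for illustration, not LLVM's): extend the run while the next instruction is a CMOV pseudo with condition CC or its opposite, skipping debug instructions, and stop at anything else — which is why any intermediate non-debug instruction breaks the group.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of the EmitLoweredSelect scan above (types are hypothetical).
enum Kind { CMovPseudo, DebugInst, Other };
struct Inst { Kind kind; int cc; };

// Return one past the last CMOV of the run starting at `i`, accepting
// condition `cc` or its opposite `oppCC` and skipping debug instructions.
size_t endOfCMovRun(const std::vector<Inst> &seq, size_t i, int cc, int oppCC) {
  size_t end = i + 1;  // at minimum, the run contains the first CMOV itself
  size_t j = i + 1;
  auto skipDebug = [&] { while (j < seq.size() && seq[j].kind == DebugInst) ++j; };
  skipDebug();
  while (j < seq.size() && seq[j].kind == CMovPseudo &&
         (seq[j].cc == cc || seq[j].cc == oppCC)) {
    end = ++j;  // include this CMOV in the run
    skipDebug();
  }
  return end;
}
```

A non-CMOV, non-debug instruction between two selects (like the INSERT_SUBREG between the two CMOV pseudos in @scalar_i16_unsigned_reg_reg) terminates the run, so the second CMOV gets its own branch diamond.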
lebedev.ri abandoned this revision. Mar 7 2019, 10:05 AM
lebedev.ri marked an inline comment as done.

Okay, this isn't the right way indeed.
We likely want to keep that restriction for the new i8 case in D59035 to
not regress existing cases, but in general we should always do this promotion.

Also, EmitLoweredSelect() needs to be fixed to accept some intermediate
instructions between two CMOVs, like in @scalar_i16_unsigned_reg_reg in this diff.