This is an archive of the discontinued LLVM Phabricator instance.


Differential D59001

X86TargetLowering::LowerSELECT(): don't promote CMOV's if the subtarget doesn't have them
Abandoned, Public

Authored by lebedev.ri on Mar 5 2019, 3:08 PM.

Details

Reviewers

RKSimon
craig.topper
spatel

Summary

I'm not actually sure this patch does the right thing, but then
I'm also not sure I understand why we would want to do that promotion
if we don't have CMOV, and thus will expand the select into a branch anyway.
Shouldn't some later code be responsible for these decisions?

The real reason for this patch:
I've looked at extending CMOV promotion to support i8 (PR40965),
and if this check is not in place, llvm/test/CodeGen/X86/pseudo_cmov_lower.ll regresses badly.

Diff Detail

Repository: rL LLVM

Event Timeline

lebedev.ri created this revision. Mar 5 2019, 3:08 PM

craig.topper added inline comments. Mar 5 2019, 3:31 PM

test/CodeGen/X86/midpoint-int.ll
803	Do you understand why we went from one conditional branch to two?

lebedev.ri added inline comments. Mar 6 2019, 12:00 AM

test/CodeGen/X86/midpoint-int.ll
803	I'm not sure yet, but I just want to point out that in all other cases we already had two conditional jumps. In this case we had one conditional jump and one unconditional jump, and that regressed to two conditional jumps.

lebedev.ri added inline comments. Mar 6 2019, 12:55 AM

test/CodeGen/X86/midpoint-int.ll

803

Hmm, interesting.

# *** IR Dump After Expand ISel Pseudo-instructions ***:
# Machine code for function scalar_i16_unsigned_reg_reg: IsSSA, TracksLiveness
Frame Objects:
  fi#-2: size=2, align=4, fixed, at location [SP+8]
  fi#-1: size=2, align=4, fixed, at location [SP+4]

bb.0 (%ir-block.0):
  successors: %bb.1(0x40000000), %bb.2(0x40000000); %bb.1(50.00%), %bb.2(50.00%)

  %0:gr32 = MOV32rm %fixed-stack.1, 1, $noreg, 0, $noreg :: (load 2 from %fixed-stack.1, align 4)
  %1:gr16 = COPY %0.sub_16bit:gr32
  %2:gr16 = MOV16rm %fixed-stack.0, 1, $noreg, 0, $noreg :: (load 2 from %fixed-stack.0, align 4)
  %3:gr16 = SUB16rr %1:gr16(tied-def 0), %2:gr16, implicit-def $eflags
  %4:gr8 = SETBEr implicit $eflags
  %5:gr32_nosp = MOVZX32rr8 killed %4:gr8
  %6:gr32 = LEA32r %5:gr32_nosp, 1, %5:gr32_nosp, -1, $noreg
  JA_1 %bb.2, implicit $eflags

bb.1 (%ir-block.0):
; predecessors: %bb.0
  successors: %bb.2(0x80000000); %bb.2(100.00%)
  liveins: $eflags

bb.2 (%ir-block.0):
; predecessors: %bb.0, %bb.1
  successors: %bb.3(0x40000000), %bb.4(0x40000000); %bb.3(50.00%), %bb.4(50.00%)
  liveins: $eflags
  %7:gr16 = PHI %1:gr16, %bb.1, %2:gr16, %bb.0
  %9:gr32 = IMPLICIT_DEF
  %8:gr32 = INSERT_SUBREG %9:gr32(tied-def 0), killed %7:gr16, %subreg.sub_16bit
  JA_1 %bb.4, implicit $eflags

bb.3 (%ir-block.0):
; predecessors: %bb.2
  successors: %bb.4(0x80000000); %bb.4(100.00%)


bb.4 (%ir-block.0):
; predecessors: %bb.2, %bb.3

  %10:gr16 = PHI %2:gr16, %bb.3, %1:gr16, %bb.2
  %12:gr32 = IMPLICIT_DEF
  %11:gr32 = INSERT_SUBREG %12:gr32(tied-def 0), killed %10:gr16, %subreg.sub_16bit
  %13:gr32 = SUB32rr %11:gr32(tied-def 0), killed %8:gr32, implicit-def dead $eflags
  %14:gr16 = COPY %13.sub_16bit:gr32
  %15:gr32 = MOVZX32rr16 killed %14:gr16
  %16:gr32 = SHR32r1 %15:gr32(tied-def 0), implicit-def dead $eflags
  %17:gr32 = IMUL32rr %16:gr32(tied-def 0), killed %6:gr32, implicit-def dead $eflags
  %18:gr32 = ADD32rr %17:gr32(tied-def 0), %0:gr32, implicit-def dead $eflags
  %19:gr16 = COPY %18.sub_16bit:gr32
  $ax = COPY %19:gr16
  RET 0, $ax

# End machine code for function scalar_i16_unsigned_reg_reg.

So we do indeed have two conditional jumps based on the same condition.
But I'm not sure I understand.

bb.4 is reachable either via the conditional jump from bb.2 or via a fallthrough from bb.3.
bb.3 is only reachable via fallthrough from bb.2.
bb.3 is empty.
Thus, bb.4 will always be visited if we visited bb.2.

This sounds wrong to me (I'm likely missing something),
but why can't we just fold bb.3 and bb.4 into bb.2?
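
One way to read the dump above is to emulate it. The sketch below is a hypothetical C++ rendering of the dumped machine code, not anything taken from LLVM; the function name and the Block enum are made up for illustration. It assumes the usual SSA rule that a PHI picks its operand by the edge on which control arrived, so the empty blocks bb.1 and bb.3 still matter: they are what lets %7 and %10 tell the two incoming edges apart.

```cpp
#include <cstdint>

// Hypothetical emulation of the dumped MIR for scalar_i16_unsigned_reg_reg.
// Block names mirror the dump; nothing here is LLVM API.
enum class Block { BB0, BB1, BB2, BB3 };

std::uint16_t midpointLikeDump(std::uint16_t a1, std::uint16_t a2) {
  const bool above = a1 > a2;                // condition tested by both JA_1
  const std::int32_t sign = above ? -1 : 1;  // SETBE + LEA: 2 * (a1 <= a2) - 1

  // bb.0 jumps straight to bb.2 when `above`, else falls through empty bb.1.
  const Block predOfBB2 = above ? Block::BB0 : Block::BB1;
  const std::uint16_t v7 = (predOfBB2 == Block::BB1) ? a1 : a2;   // %7 = PHI

  // bb.2 jumps straight to bb.4 when `above`, else falls through empty bb.3.
  const Block predOfBB4 = above ? Block::BB2 : Block::BB3;
  const std::uint16_t v10 = (predOfBB4 == Block::BB3) ? a2 : a1;  // %10 = PHI

  // SUB32rr, then only the low 16 bits are kept (COPY + MOVZX32rr16).
  const std::uint16_t diff = static_cast<std::uint16_t>(v10 - v7);
  const std::uint32_t half = static_cast<std::uint32_t>(diff) >> 1;    // SHR32r1
  const std::uint32_t prod = half * static_cast<std::uint32_t>(sign);  // IMUL32rr
  return static_cast<std::uint16_t>(prod + a1);  // ADD32rr, low 16 bits
}
```

Under this reading, both PHIs depend on the same condition, so the two diamonds could in principle share one branch; the empty blocks only exist to give each PHI a second incoming edge.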

lebedev.ri added inline comments. Mar 6 2019, 12:58 AM

test/CodeGen/X86/midpoint-int.ll
803	Though that of course tries to answer the question of why we have two conditional jumps, not why we regress from an unconditional jump to a conditional jump.

lebedev.ri marked 5 inline comments as done. Mar 6 2019, 2:17 AM

lebedev.ri added inline comments.

test/CodeGen/X86/midpoint-int.ll
803	I have dug a bit, and while I'm unable to explain why this regresses, I do believe this simply exposes an existing missed optimization, which I have filed as https://bugs.llvm.org/show_bug.cgi?id=40974 with all the data I have.

The CMOVs need to be adjacent going into that pass. Here is the code in EmitLoweredSelect that detects multiple CMOVs:

if (isCMOVPseudo(MI)) {
  // See if we have a string of CMOVS with the same condition. Skip over
  // intervening debug insts.
  while (NextMIIt != ThisMBB->end() && isCMOVPseudo(*NextMIIt) &&
         (NextMIIt->getOperand(3).getImm() == CC ||
          NextMIIt->getOperand(3).getImm() == OppCC)) {
    LastCMOV = &*NextMIIt;
    ++NextMIIt;
    NextMIIt = skipDebugInstructionsForward(NextMIIt, ThisMBB->end());
  }
}
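
The scan quoted above can be tried outside LLVM with a toy model. This is a minimal sketch under stated assumptions: Inst, Kind, Cond, opposite and lastCMovInRun are all hypothetical stand-ins (a real MachineInstr carries far more state), and debug instructions are skipped before every check rather than exactly where the quoted loop calls skipDebugInstructionsForward.

```cpp
#include <cstddef>
#include <vector>

// Toy stand-ins for machine instructions; every name here is hypothetical.
enum class Kind { CMovPseudo, Debug, Other };
enum class Cond { A, BE, E, NE };  // above / below-or-equal, equal / not-equal

struct Inst {
  Kind kind;
  Cond cc;  // only meaningful when kind == Kind::CMovPseudo
};

Cond opposite(Cond c) {
  switch (c) {
    case Cond::A:  return Cond::BE;
    case Cond::BE: return Cond::A;
    case Cond::E:  return Cond::NE;
    case Cond::NE: return Cond::E;
  }
  return c;  // not reached for the four values above
}

// Returns the index of the last CMOV pseudo in the run starting at `first`.
// The run only grows over CMOV pseudos whose condition equals, or is the
// opposite of, the first one; debug instructions in between are skipped.
std::size_t lastCMovInRun(const std::vector<Inst> &block, std::size_t first) {
  const Cond cc = block[first].cc;
  std::size_t last = first;
  std::size_t next = first + 1;
  for (;;) {
    while (next < block.size() && block[next].kind == Kind::Debug)
      ++next;
    if (next >= block.size() || block[next].kind != Kind::CMovPseudo)
      break;
    if (block[next].cc != cc && block[next].cc != opposite(cc))
      break;
    last = next;
    ++next;
  }
  return last;
}
```

Any instruction that is neither a debug instruction nor a matching CMOV pseudo ends the run in this model, which is why the CMOVs have to be adjacent for the grouping to happen.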

lebedev.ri added a child revision: D59035: [X86] Promote i8 CMOV's (PR40965). Mar 6 2019, 11:36 AM

lebedev.ri removed a child revision: D59035: [X86] Promote i8 CMOV's (PR40965). Mar 7 2019, 10:01 AM

Okay, this indeed isn't the right way.
We likely want to keep that restriction for the new i8 case in D59035 so as
not to regress existing cases, but in general we should always do this promotion.

Also, EmitLoweredSelect() needs to be fixed to accept some intermediate
instructions between two CMOV's, like in @scalar_i16_unsigned_reg_reg in this diff.
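
A rough sketch of what accepting intermediate instructions could look like, on a toy model rather than real MachineInstrs. Everything here is an assumption: ToyInst and lastCMovAllowingGaps are hypothetical names, conditions are encoded as +k and -k for a condition and its opposite, and what counts as "harmless" is left to a caller-supplied predicate because the thread does not define it.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical toy instruction. `cc` is only meaningful when isCMov is true,
// and a condition k is assumed to have -k as its opposite.
struct ToyInst {
  std::string opcode;
  int cc;
  bool isCMov;
};

// Like the adjacent-run scan, but instructions accepted by `isHarmless` may
// sit between two CMOVs without ending the run.
std::size_t lastCMovAllowingGaps(
    const std::vector<ToyInst> &block, std::size_t first,
    const std::function<bool(const ToyInst &)> &isHarmless) {
  const int cc = block[first].cc;
  std::size_t last = first;
  for (std::size_t i = first + 1; i < block.size(); ++i) {
    const ToyInst &inst = block[i];
    if (inst.isCMov) {
      if (inst.cc != cc && inst.cc != -cc)
        break;   // unrelated condition ends the run
      last = i;  // same or opposite condition extends the run
    } else if (!isHarmless(inst)) {
      break;     // anything not known to be harmless ends the run
    }
  }
  return last;
}
```

This only finds the run. A real change would also have to decide where the skipped instructions end up once the CMOVs are turned into PHIs, which this sketch does not attempt.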

lebedev.ri mentioned this in D59147: Broken, not for review: X86TargetLowering::EmitLoweredSelect(): ignore harmless instrs between two PHI's. Mar 8 2019, 11:53 AM

lebedev.ri mentioned this in D59035: [X86] Promote i8 CMOV's (PR40965).

Revision Contents

Path	Size
lib/Target/X86/X86ISelLowering.cpp	9 lines
test/CodeGen/X86/fshl.ll	2 lines
test/CodeGen/X86/fshr.ll	2 lines
test/CodeGen/X86/midpoint-int.ll	146 lines
test/CodeGen/X86/select.ll	2 lines

Diff 189407

lib/Target/X86/X86ISelLowering.cpp

[20,529 lines hidden; enclosing context: if ((CondCode == X86::COND_AE || CondCode == X86::COND_B) &&]
         return DAG.getNOT(DL, Res, Res.getValueType());
       return Res;
     }
   }
 
   // X86 doesn't have an i8 cmov. If both operands are the result of a truncate
   // widen the cmov and push the truncate through. This avoids introducing a new
   // branch during isel and doesn't add any extensions.
-  if (Op.getValueType() == MVT::i8 &&
+  // It would make sense to do this only when there is CMOV.
+  // Else, it should be best to leave the decision to the later code.
+  if (Subtarget.hasCMov() && Op.getValueType() == MVT::i8 &&
       Op1.getOpcode() == ISD::TRUNCATE && Op2.getOpcode() == ISD::TRUNCATE) {
     SDValue T1 = Op1.getOperand(0), T2 = Op2.getOperand(0);
     if (T1.getValueType() == T2.getValueType() &&
         // Blacklist CopyFromReg to avoid partial register stalls.
         T1.getOpcode() != ISD::CopyFromReg && T2.getOpcode()!=ISD::CopyFromReg){
       SDValue Cmov = DAG.getNode(X86ISD::CMOV, DL, T1.getValueType(), T2, T1,
                                  CC, Cond);
       return DAG.getNode(ISD::TRUNCATE, DL, Op.getValueType(), Cmov);
     }
   }
 
   // Promote i16 cmovs if it won't prevent folding a load.
-  if (Op.getValueType() == MVT::i16 && !MayFoldLoad(Op1) && !MayFoldLoad(Op2)) {
+  // But it would make sense to do this only when there is CMOV.
+  // Else, it should be best to leave the decision to the later code.
+  if (Subtarget.hasCMov() && Op.getValueType() == MVT::i16 &&
+      !MayFoldLoad(Op1) && !MayFoldLoad(Op2)) {
     Op1 = DAG.getNode(ISD::ANY_EXTEND, DL, MVT::i32, Op1);
     Op2 = DAG.getNode(ISD::ANY_EXTEND, DL, MVT::i32, Op2);
     SDValue Ops[] = { Op2, Op1, CC, Cond };
     SDValue Cmov = DAG.getNode(X86ISD::CMOV, DL, MVT::i32, Ops);
     return DAG.getNode(ISD::TRUNCATE, DL, Op.getValueType(), Cmov);
   }
 
   // X86ISD::CMOV means set the result (which is operand 1) to the RHS if
[23,040 lines hidden]
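
To make the effect of the two added hasCMov() guards easier to see, here is a small stand-alone model of the decision the patched code makes. It is only a sketch: SelectInfo, Plan and planSelect are hypothetical names, and the boolean fields stand in for the DAG queries used in the real code (Op.getValueType(), the ISD::TRUNCATE and ISD::CopyFromReg opcode checks, MayFoldLoad()).

```cpp
// Hypothetical summary of what the patched LowerSELECT code looks at.
struct SelectInfo {
  unsigned bits;                  // width of the select's value type
  bool hasCMov;                   // stands in for Subtarget.hasCMov()
  bool bothOpsAreTruncates;       // both operands are ISD::TRUNCATE
  bool truncSrcTypesMatch;        // T1 and T2 have the same value type
  bool anyTruncSrcIsCopyFromReg;  // either truncate source is CopyFromReg
  bool anyOpMayFoldLoad;          // MayFoldLoad(Op1) || MayFoldLoad(Op2)
};

enum class Plan { WidenI8ThroughTruncate, PromoteI16ToI32, Unchanged };

Plan planSelect(const SelectInfo &s) {
  // i8: widen the CMOV and push the truncate through, now only with CMOV.
  if (s.hasCMov && s.bits == 8 && s.bothOpsAreTruncates) {
    if (s.truncSrcTypesMatch && !s.anyTruncSrcIsCopyFromReg)
      return Plan::WidenI8ThroughTruncate;
  }
  // i16: promote to an i32 CMOV unless that would block folding a load,
  // and now only when the subtarget actually has CMOV.
  if (s.hasCMov && s.bits == 16 && !s.anyOpMayFoldLoad)
    return Plan::PromoteI16ToI32;
  return Plan::Unchanged;
}
```

With hasCMov false, every input falls through to Plan::Unchanged, which is the behaviour change this diff proposes and which the test updates below reflect for targets without CMOV.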

test/CodeGen/X86/fshl.ll

[63 lines hidden]
 ; X86-FAST-NEXT: andb $15, %cl
 ; X86-FAST-NEXT: shldw %cl, %dx, %ax
 ; X86-FAST-NEXT: retl
 ;
 ; X86-SLOW-LABEL: var_shift_i16:
 ; X86-SLOW: # %bb.0:
 ; X86-SLOW-NEXT: pushl %edi
 ; X86-SLOW-NEXT: pushl %esi
+; X86-SLOW-NEXT: movl {{[0-9]+}}(%esp), %eax
 ; X86-SLOW-NEXT: movzwl {{[0-9]+}}(%esp), %esi
 ; X86-SLOW-NEXT: movb {{[0-9]+}}(%esp), %dl
 ; X86-SLOW-NEXT: andb $15, %dl
-; X86-SLOW-NEXT: movl {{[0-9]+}}(%esp), %eax
 ; X86-SLOW-NEXT: movl %eax, %edi
 ; X86-SLOW-NEXT: movl %edx, %ecx
 ; X86-SLOW-NEXT: shll %cl, %edi
 ; X86-SLOW-NEXT: movb $16, %cl
 ; X86-SLOW-NEXT: subb %dl, %cl
 ; X86-SLOW-NEXT: shrl %cl, %esi
 ; X86-SLOW-NEXT: testb %dl, %dl
 ; X86-SLOW-NEXT: je .LBB1_2
[424 lines hidden]

test/CodeGen/X86/fshr.ll

[63 lines hidden]
 ; X86-FAST-NEXT: andb $15, %cl
 ; X86-FAST-NEXT: shrdw %cl, %dx, %ax
 ; X86-FAST-NEXT: retl
 ;
 ; X86-SLOW-LABEL: var_shift_i16:
 ; X86-SLOW: # %bb.0:
 ; X86-SLOW-NEXT: pushl %edi
 ; X86-SLOW-NEXT: pushl %esi
+; X86-SLOW-NEXT: movzwl {{[0-9]+}}(%esp), %eax
 ; X86-SLOW-NEXT: movl {{[0-9]+}}(%esp), %esi
 ; X86-SLOW-NEXT: movb {{[0-9]+}}(%esp), %dl
 ; X86-SLOW-NEXT: andb $15, %dl
-; X86-SLOW-NEXT: movzwl {{[0-9]+}}(%esp), %eax
 ; X86-SLOW-NEXT: movl %eax, %edi
 ; X86-SLOW-NEXT: movl %edx, %ecx
 ; X86-SLOW-NEXT: shrl %cl, %edi
 ; X86-SLOW-NEXT: movb $16, %cl
 ; X86-SLOW-NEXT: subb %dl, %cl
 ; X86-SLOW-NEXT: shll %cl, %esi
 ; X86-SLOW-NEXT: testb %dl, %dl
 ; X86-SLOW-NEXT: je .LBB1_2
[420 lines hidden]

test/CodeGen/X86/midpoint-int.ll

[724 lines hidden]
 ; X64-NEXT: shrl %eax
 ; X64-NEXT: imull %ecx, %eax
 ; X64-NEXT: addl %edi, %eax
 ; X64-NEXT: # kill: def $ax killed $ax killed $eax
 ; X64-NEXT: retq
 ;
 ; X32-LABEL: scalar_i16_signed_reg_reg:
 ; X32: # %bb.0:
+; X32-NEXT: pushl %ebx
 ; X32-NEXT: pushl %edi
 ; X32-NEXT: pushl %esi
-; X32-NEXT: movl {{[0-9]+}}(%esp), %eax
 ; X32-NEXT: movl {{[0-9]+}}(%esp), %ecx
-; X32-NEXT: xorl %edx, %edx
-; X32-NEXT: cmpw %ax, %cx
-; X32-NEXT: setle %dl
-; X32-NEXT: movl %eax, %esi
+; X32-NEXT: movl %ecx, %eax
+; X32-NEXT: movzwl {{[0-9]+}}(%esp), %edx
+; X32-NEXT: xorl %ebx, %ebx
+; X32-NEXT: cmpw %dx, %cx
+; X32-NEXT: setle %bl
+; X32-NEXT: movl %edx, %edi
 ; X32-NEXT: jg .LBB10_2
 ; X32-NEXT: # %bb.1:
-; X32-NEXT: movl %ecx, %esi
+; X32-NEXT: movl %eax, %edi
 ; X32-NEXT: .LBB10_2:
-; X32-NEXT: leal -1(%edx,%edx), %edx
-; X32-NEXT: movl %ecx, %edi
+; X32-NEXT: leal -1(%ebx,%ebx), %esi
 ; X32-NEXT: jge .LBB10_4
 ; X32-NEXT: # %bb.3:
-; X32-NEXT: movl %eax, %edi
+; X32-NEXT: movl %edx, %eax
 ; X32-NEXT: .LBB10_4:
-; X32-NEXT: subl %esi, %edi
-; X32-NEXT: movzwl %di, %eax
+; X32-NEXT: subl %edi, %eax
+; X32-NEXT: movzwl %ax, %eax
 ; X32-NEXT: shrl %eax
-; X32-NEXT: imull %edx, %eax
+; X32-NEXT: imull %esi, %eax
 ; X32-NEXT: addl %ecx, %eax
 ; X32-NEXT: # kill: def $ax killed $ax killed $eax
 ; X32-NEXT: popl %esi
 ; X32-NEXT: popl %edi
+; X32-NEXT: popl %ebx
 ; X32-NEXT: retl
 %t3 = icmp sgt i16 %a1, %a2 ; signed
 %t4 = select i1 %t3, i16 -1, i16 1
 %t5 = select i1 %t3, i16 %a2, i16 %a1
 %t6 = select i1 %t3, i16 %a1, i16 %a2
 %t7 = sub i16 %t6, %t5
 %t8 = lshr i16 %t7, 1
 %t9 = mul nsw i16 %t8, %t4 ; signed
[16 lines hidden]
 ; X64-NEXT: shrl %eax
 ; X64-NEXT: imull %ecx, %eax
 ; X64-NEXT: addl %edi, %eax
 ; X64-NEXT: # kill: def $ax killed $ax killed $eax
 ; X64-NEXT: retq
 ;
 ; X32-LABEL: scalar_i16_unsigned_reg_reg:
 ; X32: # %bb.0:
+; X32-NEXT: pushl %ebx
+; X32-NEXT: pushl %edi
 ; X32-NEXT: pushl %esi
-; X32-NEXT: movl {{[0-9]+}}(%esp), %eax
 ; X32-NEXT: movl {{[0-9]+}}(%esp), %ecx
-; X32-NEXT: xorl %edx, %edx
-; X32-NEXT: cmpw %ax, %cx
-; X32-NEXT: setbe %dl
-; X32-NEXT: leal -1(%edx,%edx), %edx
-; X32-NEXT: ja .LBB11_1
-; X32-NEXT: # %bb.2:
-; X32-NEXT: movl %ecx, %esi
-; X32-NEXT: jmp .LBB11_3
-; X32-NEXT: .LBB11_1:
-; X32-NEXT: movl %eax, %esi
 ; X32-NEXT: movl %ecx, %eax
-; X32-NEXT: .LBB11_3:
-; X32-NEXT: subl %esi, %eax
+; X32-NEXT: movzwl {{[0-9]+}}(%esp), %edx
+; X32-NEXT: xorl %ebx, %ebx
+; X32-NEXT: cmpw %dx, %cx
+; X32-NEXT: setbe %bl
+; X32-NEXT: movl %edx, %edi
+; X32-NEXT: ja .LBB11_2
+; X32-NEXT: # %bb.1:
+; X32-NEXT: movl %eax, %edi
+; X32-NEXT: .LBB11_2:
+; X32-NEXT: leal -1(%ebx,%ebx), %esi
+; X32-NEXT: ja .LBB11_4
+; X32-NEXT: # %bb.3:
+; X32-NEXT: movl %edx, %eax
+; X32-NEXT: .LBB11_4:
+; X32-NEXT: subl %edi, %eax
 ; X32-NEXT: movzwl %ax, %eax
 ; X32-NEXT: shrl %eax
-; X32-NEXT: imull %edx, %eax
+; X32-NEXT: imull %esi, %eax
 ; X32-NEXT: addl %ecx, %eax
 ; X32-NEXT: # kill: def $ax killed $ax killed $eax
 ; X32-NEXT: popl %esi
+; X32-NEXT: popl %edi
+; X32-NEXT: popl %ebx
 ; X32-NEXT: retl
 %t3 = icmp ugt i16 %a1, %a2
 %t4 = select i1 %t3, i16 -1, i16 1
 %t5 = select i1 %t3, i16 %a2, i16 %a1
 %t6 = select i1 %t3, i16 %a1, i16 %a2
 %t7 = sub i16 %t6, %t5
 %t8 = lshr i16 %t7, 1
 %t9 = mul i16 %t8, %t4
[19 lines hidden]
 ; X64-NEXT: shrl %eax
 ; X64-NEXT: imull %edx, %eax
 ; X64-NEXT: addl %ecx, %eax
 ; X64-NEXT: # kill: def $ax killed $ax killed $eax
 ; X64-NEXT: retq
 ;
 ; X32-LABEL: scalar_i16_signed_mem_reg:
 ; X32: # %bb.0:
+; X32-NEXT: pushl %ebx
 ; X32-NEXT: pushl %edi
 ; X32-NEXT: pushl %esi
+; X32-NEXT: movzwl {{[0-9]+}}(%esp), %edx
 ; X32-NEXT: movl {{[0-9]+}}(%esp), %eax
-; X32-NEXT: movl {{[0-9]+}}(%esp), %ecx
-; X32-NEXT: movzwl (%ecx), %ecx
-; X32-NEXT: xorl %edx, %edx
-; X32-NEXT: cmpw %ax, %cx
-; X32-NEXT: setle %dl
-; X32-NEXT: movl %eax, %esi
+; X32-NEXT: movzwl (%eax), %ecx
+; X32-NEXT: movl %ecx, %eax
+; X32-NEXT: xorl %ebx, %ebx
+; X32-NEXT: cmpw %dx, %cx
+; X32-NEXT: setle %bl
+; X32-NEXT: movl %edx, %edi
 ; X32-NEXT: jg .LBB12_2
 ; X32-NEXT: # %bb.1:
-; X32-NEXT: movl %ecx, %esi
+; X32-NEXT: movl %eax, %edi
 ; X32-NEXT: .LBB12_2:
-; X32-NEXT: leal -1(%edx,%edx), %edx
-; X32-NEXT: movl %ecx, %edi
+; X32-NEXT: leal -1(%ebx,%ebx), %esi
 ; X32-NEXT: jge .LBB12_4
 ; X32-NEXT: # %bb.3:
-; X32-NEXT: movl %eax, %edi
+; X32-NEXT: movl %edx, %eax
 ; X32-NEXT: .LBB12_4:
-; X32-NEXT: subl %esi, %edi
-; X32-NEXT: movzwl %di, %eax
+; X32-NEXT: subl %edi, %eax
+; X32-NEXT: movzwl %ax, %eax
 ; X32-NEXT: shrl %eax
-; X32-NEXT: imull %edx, %eax
+; X32-NEXT: imull %esi, %eax
 ; X32-NEXT: addl %ecx, %eax
 ; X32-NEXT: # kill: def $ax killed $ax killed $eax
 ; X32-NEXT: popl %esi
 ; X32-NEXT: popl %edi
+; X32-NEXT: popl %ebx
 ; X32-NEXT: retl
 %a1 = load i16, i16* %a1_addr
 %t3 = icmp sgt i16 %a1, %a2 ; signed
 %t4 = select i1 %t3, i16 -1, i16 1
 %t5 = select i1 %t3, i16 %a2, i16 %a1
 %t6 = select i1 %t3, i16 %a1, i16 %a2
 %t7 = sub i16 %t6, %t5
 %t8 = lshr i16 %t7, 1
[18 lines hidden]
 ; X64-NEXT: shrl %eax
 ; X64-NEXT: imull %ecx, %eax
 ; X64-NEXT: addl %edi, %eax
 ; X64-NEXT: # kill: def $ax killed $ax killed $eax
 ; X64-NEXT: retq
 ;
 ; X32-LABEL: scalar_i16_signed_reg_mem:
 ; X32: # %bb.0:
+; X32-NEXT: pushl %ebx
 ; X32-NEXT: pushl %edi
 ; X32-NEXT: pushl %esi
 ; X32-NEXT: movl {{[0-9]+}}(%esp), %ecx
-; X32-NEXT: movl {{[0-9]+}}(%esp), %eax
-; X32-NEXT: movzwl (%eax), %eax
-; X32-NEXT: xorl %edx, %edx
-; X32-NEXT: cmpw %ax, %cx
-; X32-NEXT: setle %dl
-; X32-NEXT: movl %eax, %esi
+; X32-NEXT: movl %ecx, %eax
+; X32-NEXT: movl {{[0-9]+}}(%esp), %edx
+; X32-NEXT: movzwl (%edx), %edx
+; X32-NEXT: xorl %ebx, %ebx
+; X32-NEXT: cmpw %dx, %cx
+; X32-NEXT: setle %bl
+; X32-NEXT: movl %edx, %edi
 ; X32-NEXT: jg .LBB13_2
 ; X32-NEXT: # %bb.1:
-; X32-NEXT: movl %ecx, %esi
+; X32-NEXT: movl %eax, %edi
 ; X32-NEXT: .LBB13_2:
-; X32-NEXT: leal -1(%edx,%edx), %edx
-; X32-NEXT: movl %ecx, %edi
+; X32-NEXT: leal -1(%ebx,%ebx), %esi
 ; X32-NEXT: jge .LBB13_4
 ; X32-NEXT: # %bb.3:
-; X32-NEXT: movl %eax, %edi
+; X32-NEXT: movl %edx, %eax
 ; X32-NEXT: .LBB13_4:
-; X32-NEXT: subl %esi, %edi
-; X32-NEXT: movzwl %di, %eax
+; X32-NEXT: subl %edi, %eax
+; X32-NEXT: movzwl %ax, %eax
 ; X32-NEXT: shrl %eax
-; X32-NEXT: imull %edx, %eax
+; X32-NEXT: imull %esi, %eax
 ; X32-NEXT: addl %ecx, %eax
 ; X32-NEXT: # kill: def $ax killed $ax killed $eax
 ; X32-NEXT: popl %esi
 ; X32-NEXT: popl %edi
+; X32-NEXT: popl %ebx
 ; X32-NEXT: retl
 %a2 = load i16, i16* %a2_addr
 %t3 = icmp sgt i16 %a1, %a2 ; signed
 %t4 = select i1 %t3, i16 -1, i16 1
 %t5 = select i1 %t3, i16 %a2, i16 %a1
 %t6 = select i1 %t3, i16 %a1, i16 %a2
 %t7 = sub i16 %t6, %t5
 %t8 = lshr i16 %t7, 1
[19 lines hidden]
 ; X64-NEXT: shrl %eax
 ; X64-NEXT: imull %edx, %eax
 ; X64-NEXT: addl %ecx, %eax
 ; X64-NEXT: # kill: def $ax killed $ax killed $eax
 ; X64-NEXT: retq
 ;
 ; X32-LABEL: scalar_i16_signed_mem_mem:
 ; X32: # %bb.0:
+; X32-NEXT: pushl %ebx
 ; X32-NEXT: pushl %edi
 ; X32-NEXT: pushl %esi
+; X32-NEXT: movl {{[0-9]+}}(%esp), %edx
 ; X32-NEXT: movl {{[0-9]+}}(%esp), %eax
-; X32-NEXT: movl {{[0-9]+}}(%esp), %ecx
-; X32-NEXT: movzwl (%ecx), %ecx
-; X32-NEXT: movzwl (%eax), %eax
-; X32-NEXT: xorl %edx, %edx
-; X32-NEXT: cmpw %ax, %cx
-; X32-NEXT: setle %dl
-; X32-NEXT: movl %eax, %esi
+; X32-NEXT: movzwl (%eax), %ecx
+; X32-NEXT: movl %ecx, %eax
+; X32-NEXT: movzwl (%edx), %edx
+; X32-NEXT: xorl %ebx, %ebx
+; X32-NEXT: cmpw %dx, %cx
+; X32-NEXT: setle %bl
+; X32-NEXT: movl %edx, %edi
 ; X32-NEXT: jg .LBB14_2
 ; X32-NEXT: # %bb.1:
-; X32-NEXT: movl %ecx, %esi
+; X32-NEXT: movl %eax, %edi
 ; X32-NEXT: .LBB14_2:
-; X32-NEXT: leal -1(%edx,%edx), %edx
-; X32-NEXT: movl %ecx, %edi
+; X32-NEXT: leal -1(%ebx,%ebx), %esi
 ; X32-NEXT: jge .LBB14_4
 ; X32-NEXT: # %bb.3:
-; X32-NEXT: movl %eax, %edi
+; X32-NEXT: movl %edx, %eax
 ; X32-NEXT: .LBB14_4:
-; X32-NEXT: subl %esi, %edi
-; X32-NEXT: movzwl %di, %eax
+; X32-NEXT: subl %edi, %eax
+; X32-NEXT: movzwl %ax, %eax
 ; X32-NEXT: shrl %eax
-; X32-NEXT: imull %edx, %eax
+; X32-NEXT: imull %esi, %eax
 ; X32-NEXT: addl %ecx, %eax
 ; X32-NEXT: # kill: def $ax killed $ax killed $eax
 ; X32-NEXT: popl %esi
 ; X32-NEXT: popl %edi
+; X32-NEXT: popl %ebx
 ; X32-NEXT: retl
 %a1 = load i16, i16* %a1_addr
 %a2 = load i16, i16* %a2_addr
 %t3 = icmp sgt i16 %a1, %a2 ; signed
 %t4 = select i1 %t3, i16 -1, i16 1
 %t5 = select i1 %t3, i16 %a2, i16 %a1
 %t6 = select i1 %t3, i16 %a1, i16 %a2
 %t7 = sub i16 %t6, %t5
[308 lines hidden]

test/CodeGen/X86/select.ll

[1,239 lines hidden]
 ; MCU: # %bb.0:
 ; MCU-NEXT: cmpl $32767, %eax # imm = 0x7FFF
 ; MCU-NEXT: movl $32767, %ecx # imm = 0x7FFF
 ; MCU-NEXT: jg .LBB22_2
 ; MCU-NEXT: # %bb.1:
 ; MCU-NEXT: movl %eax, %ecx
 ; MCU-NEXT: .LBB22_2:
 ; MCU-NEXT: cmpl $-32768, %ecx # imm = 0x8000
-; MCU-NEXT: movl $32768, %eax # imm = 0x8000
+; MCU-NEXT: movw $-32768, %ax # imm = 0x8000
 ; MCU-NEXT: jl .LBB22_4
 ; MCU-NEXT: # %bb.3:
 ; MCU-NEXT: movl %ecx, %eax
 ; MCU-NEXT: .LBB22_4:
 ; MCU-NEXT: movw %ax, (%edx)
 ; MCU-NEXT: retl
 %cmp = icmp sgt i32 %src, 32767
 %sel1 = select i1 %cmp, i32 32767, i32 %src
[388 lines hidden]
