This is an archive of the discontinued LLVM Phabricator instance.

Differential D42615

[X86] Generate BT instructions a bit more aggressively
Abandoned, Public

Authored by deadalnix on Jan 27 2018, 11:34 AM.

Details

Reviewers

spatel
hfinkel
niravd
craig.topper
nhaehnle

Summary

Right now, BT instructions are only generated when the constant cannot be encoded as the immediate of a TEST instruction, or when the source has the form (and (srl X, N), 1). This is fragile, because any transform that gets rid of the srl causes BT to no longer be generated. For instance, anything of the form (and X, 1 << N) will not use the BT instruction.

Because numerous patterns match (and X, 1), this patch only generates BT when the bit position is greater than 0.
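
To illustrate why the two shapes mentioned in the summary are interchangeable, here is a minimal standalone sketch in plain C++ (not LLVM code). The function names are hypothetical and chosen only for this illustration; the point is that both forms compute the same predicate, and that the bit index BT needs can be recovered from a single-bit mask.

```cpp
#include <cassert>
#include <cstdint>

// Shape with the shift first: (and (srl X, N), 1).
bool bitSetViaShift(uint64_t x, unsigned n) { return ((x >> n) & 1) != 0; }

// Shape with a pre-shifted mask: (and X, 1 << N) != 0.
bool bitSetViaMask(uint64_t x, unsigned n) {
  return (x & (uint64_t{1} << n)) != 0;
}

// Recover the bit index N from a single-bit mask. A BT lowering needs
// this index as its second operand. (Hypothetical helper.)
unsigned bitIndexFromMask(uint64_t mask) {
  assert(mask != 0 && (mask & (mask - 1)) == 0 && "mask must be a power of two");
  unsigned n = 0;
  while ((mask >> n) != 1)
    ++n;
  return n;
}
```

For the mask 512 used in testb-je-fusion.ll, `bitIndexFromMask(512)` yields 9, which matches the `btl $9` in the updated checks.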

Diff Detail

Repository

rL LLVM

Build Status

Buildable 14303
Build 14303: arc lint + arc unit

Event Timeline

deadalnix created this revision. Jan 27 2018, 11:34 AM

Harbormaster completed remote builds in B14303: Diff 131688. Jan 27 2018, 11:34 AM

deadalnix mentioned this in D41235: [DAGCOmbine] Ensure that (brcond (setcc ...)) is handled in a canonical manner. Jan 27 2018, 11:35 AM

BT has lower throughput than and/test and can't be macrofused. This probably isn't a win. See D37418 for some more discussion on this.

craig.topper added inline comments. Jan 27 2018, 11:45 AM

test/CodeGen/X86/testb-je-fusion.ll
11	I will say this is kinda stupid that we forced a copy just so we could do a high register trick.

In D42615#989870, @craig.topper wrote:

BT has lower throughput than and/test and can't be macrofused. This probably isn't a win. See D37418 for some more discussion on this.

What might be of use is investigating whether this can be instead performed in the MC like @avt77 is trying to do for SHLD/SHRD on D40602.

@craig.topper I had no idea bt has lower throughput than test, I assumed it was the same. If that's the case, then this approach doesn't make much sense.

We should try to get rid of the high register trick that creates extra copies. I'm not sure what is creating this, any suggestions?

test/CodeGen/X86/testb-je-fusion.ll
11	Do you have any idea what can be causing this? There are numerous test cases where this happens.

The h-register trick is coming from here in X86ISelDAGToDAG.cpp

// For example, "testl %eax, $2048" to "testb %ah, $8".
if (isShiftedUInt<8, 8>(Mask) &&
    (!(Mask & 0x8000) || hasNoSignedComparisonUses(Node))) {
  // Shift the immediate right by 8 bits.
  SDValue ShiftedImm = CurDAG->getTargetConstant(Mask >> 8, dl, MVT::i8);
  SDValue Reg = N0.getOperand(0);

  // Extract the h-register.
  SDValue Subreg = CurDAG->getTargetExtractSubreg(X86::sub_8bit_hi, dl,
                                                  MVT::i8, Reg);

  // Emit a testb.  The EXTRACT_SUBREG becomes a COPY that can only
  // target GR8_NOREX registers, so make sure the register class is
  // forced.
  SDNode *NewNode = CurDAG->getMachineNode(X86::TEST8ri_NOREX, dl,
                                           MVT::i32, Subreg, ShiftedImm);
  // Replace SUB|CMP with TEST, since SUB has two outputs while TEST has
  // one, do not call ReplaceAllUsesWith.
  ReplaceUses(SDValue(Node, (Opcode == X86ISD::SUB ? 1 : 0)),
              SDValue(NewNode, 0));
  CurDAG->RemoveDeadNode(Node);
  return;
}
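
As a rough illustration of the check above, the following standalone sketch mimics the mask test and the immediate rewrite without any LLVM dependencies. `fitsHighByte` is a hand-written stand-in for `isShiftedUInt<8, 8>`, and `highByteTestImm` with its `hasSignedUses` parameter is a hypothetical helper, not part of X86ISelDAGToDAG.cpp. The sign-bit guard is modeled on the `Mask & 0x8000` condition: if bit 15 is in the mask, the byte-sized test could set the sign flag where the 32-bit test would not.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for llvm::isShiftedUInt<8, 8>: true when every set bit of the
// mask lies in bits 8..15.
bool fitsHighByte(uint64_t mask) {
  return mask <= 0xFFFF && (mask & 0xFF) == 0;
}

// Returns the 8-bit immediate for the "testb $imm, %ah"-style form, or -1
// when the rewrite does not apply. (Hypothetical helper for illustration.)
int highByteTestImm(uint64_t mask, bool hasSignedUses) {
  if (!fitsHighByte(mask))
    return -1;
  // Bit 15 of the mask becomes the sign bit of the byte-sized test, so the
  // rewrite is skipped when a signed comparison consumes the flags.
  if ((mask & 0x8000) && hasSignedUses)
    return -1;
  return static_cast<int>(mask >> 8);
}
```

With this sketch, a mask of 2048 maps to the immediate 8 (the example in the source comment) and 512 maps to 2, which matches the `testb $2, %ch` seen in the old test output. The high-byte registers exist only for the A, B, C and D registers, which is consistent with the extra `movl %edi, %ecx` copy discussed in this thread.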

Looks like this ends up being a problem when the value is in EDI due to the calling convention. So a few questions come to mind:
1/ Shouldn't this optimization be done only after register allocation, if the selected register allows for it? It would fail once in a while because the register allocator does not choose the proper register, but that is probably preferable to the extra copies.
2/ Is it possible to hint to the register allocator that something is desirable? For instance, that we would like this value to be in a GR8_NOREX register, but if that's not the case, not to create a copy for it?

On a second look, when disabling this trick, I only get improvements in various test cases. I'm not sure what the impact on real source code is, but I'm not convinced this trick is worth doing at all at this stage.

nhaehnle resigned from this revision. Jan 29 2018, 6:53 AM

Ok sounds like this isn't the right approach, closing this one.

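Revision Contents

Path	Size
lib/Target/X86/X86ISelLowering.cpp	9 lines
test/CodeGen/X86/and-sink.ll	8 lines
test/CodeGen/X86/and-su.ll	4 lines
test/CodeGen/X86/avx512-cmp.ll	2 lines
test/CodeGen/X86/btq.ll	4 lines
test/CodeGen/X86/select.ll	2 lines
test/CodeGen/X86/test-shrink.ll	36 lines
test/CodeGen/X86/testb-je-fusion.ll	5 lines
test/CodeGen/X86/use-add-flags.ll	8 lines
test/CodeGen/X86/vastart-defs-eflags.ll	11 lines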

Diff 131688

lib/Target/X86/X86ISelLowering.cpp

Show First 20 Lines • Show All 17,599 Lines • ▼ Show 20 Lines	static SDValue LowerAndToBT(SDValue And, ISD::CondCode CC,
} else if (Op1.getOpcode() == ISD::Constant) {		} else if (Op1.getOpcode() == ISD::Constant) {
ConstantSDNode *AndRHS = cast<ConstantSDNode>(Op1);		ConstantSDNode *AndRHS = cast<ConstantSDNode>(Op1);
uint64_t AndRHSVal = AndRHS->getZExtValue();		uint64_t AndRHSVal = AndRHS->getZExtValue();
SDValue AndLHS = Op0;		SDValue AndLHS = Op0;

if (AndRHSVal == 1 && AndLHS.getOpcode() == ISD::SRL) {		if (AndRHSVal == 1 && AndLHS.getOpcode() == ISD::SRL) {
LHS = AndLHS.getOperand(0);		LHS = AndLHS.getOperand(0);
RHS = AndLHS.getOperand(1);		RHS = AndLHS.getOperand(1);
}		} else if (AndRHSVal > 1 && isPowerOf2_64(AndRHSVal) &&
		(!isUInt<32>(AndRHSVal) || AndLHS.getValueType() != MVT::i8)) {
// Use BT if the immediate can't be encoded in a TEST instruction.		// We transform iff if the immediate can't be encoded in a TEST
if (!isUInt<32>(AndRHSVal) && isPowerOf2_64(AndRHSVal)) {		// instruction or if the BT instruction do not require the addition
		// of an any_extend.
LHS = AndLHS;		LHS = AndLHS;
RHS = DAG.getConstant(Log2_64_Ceil(AndRHSVal), dl, LHS.getValueType());		RHS = DAG.getConstant(Log2_64_Ceil(AndRHSVal), dl, LHS.getValueType());
}		}
}		}

if (LHS.getNode())		if (LHS.getNode())
return getBitTestCondition(LHS, RHS, CC, dl, DAG);		return getBitTestCondition(LHS, RHS, CC, dl, DAG);

▲ Show 20 Lines • Show All 21,411 Lines • Show Last 20 Lines
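
To make the change in the constant-mask branch easier to compare, here is a small standalone sketch of the old and new conditions as plain predicates. It is only an approximation for illustration: `isPowerOfTwo` and `fitsUInt32` are hand-written stand-ins for `isPowerOf2_64` and `isUInt<32>`, the `lhsIsI8` flag stands in for the `AndLHS.getValueType() != MVT::i8` check, and the separate `(and (srl X, N), 1)` branch is left out.

```cpp
#include <cassert>
#include <cstdint>

bool isPowerOfTwo(uint64_t v) { return v != 0 && (v & (v - 1)) == 0; }
bool fitsUInt32(uint64_t v) { return v <= 0xFFFFFFFFull; }

// Left-hand column: BT only when the single-bit mask cannot be a TEST
// immediate.
bool oldUsesBT(uint64_t mask) {
  return !fitsUInt32(mask) && isPowerOfTwo(mask);
}

// Right-hand column: any single-bit mask above bit 0, unless the mask fits
// 32 bits and the operand is an i8.
bool newUsesBT(uint64_t mask, bool lhsIsI8) {
  return mask > 1 && isPowerOfTwo(mask) && (!fitsUInt32(mask) || !lhsIsI8);
}
```

Under these assumptions, a mask of 512 on an i32 operand moves from TEST to BT, while a mask of 1 keeps using TEST in both versions.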

test/CodeGen/X86/and-sink.ll

	Show All 9 Lines
	define i32 @and_sink1(i32 %a, i1 %c) {			define i32 @and_sink1(i32 %a, i1 %c) {
	; CHECK-LABEL: and_sink1:			; CHECK-LABEL: and_sink1:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: testb $1, {{[0-9]+}}(%esp)			; CHECK-NEXT: testb $1, {{[0-9]+}}(%esp)
	; CHECK-NEXT: je .LBB0_3			; CHECK-NEXT: je .LBB0_3
	; CHECK-NEXT: # %bb.1: # %bb0			; CHECK-NEXT: # %bb.1: # %bb0
	; CHECK-NEXT: movl {{[0-9]+}}(%esp), %eax			; CHECK-NEXT: movl {{[0-9]+}}(%esp), %eax
	; CHECK-NEXT: movl $0, A			; CHECK-NEXT: movl $0, A
	; CHECK-NEXT: testb $4, %al			; CHECK-NEXT: btl $2, %eax
	; CHECK-NEXT: jne .LBB0_3			; CHECK-NEXT: jb .LBB0_3
	; CHECK-NEXT: # %bb.2: # %bb1			; CHECK-NEXT: # %bb.2: # %bb1
	; CHECK-NEXT: movl $1, %eax			; CHECK-NEXT: movl $1, %eax
	; CHECK-NEXT: retl			; CHECK-NEXT: retl
	; CHECK-NEXT: .LBB0_3: # %bb2			; CHECK-NEXT: .LBB0_3: # %bb2
	; CHECK-NEXT: xorl %eax, %eax			; CHECK-NEXT: xorl %eax, %eax
	; CHECK-NEXT: retl			; CHECK-NEXT: retl

	; CHECK-CGP-LABEL: @and_sink1(			; CHECK-CGP-LABEL: @and_sink1(
	Show All 29 Lines
	; CHECK-NEXT: .LBB1_2: # %bb0			; CHECK-NEXT: .LBB1_2: # %bb0
	; CHECK-NEXT: # =>This Inner Loop Header: Depth=1			; CHECK-NEXT: # =>This Inner Loop Header: Depth=1
	; CHECK-NEXT: movl $0, B			; CHECK-NEXT: movl $0, B
	; CHECK-NEXT: testb $1, %al			; CHECK-NEXT: testb $1, %al
	; CHECK-NEXT: je .LBB1_5			; CHECK-NEXT: je .LBB1_5
	; CHECK-NEXT: # %bb.3: # %bb1			; CHECK-NEXT: # %bb.3: # %bb1
	; CHECK-NEXT: # in Loop: Header=BB1_2 Depth=1			; CHECK-NEXT: # in Loop: Header=BB1_2 Depth=1
	; CHECK-NEXT: movl $0, C			; CHECK-NEXT: movl $0, C
	; CHECK-NEXT: testb $4, %cl			; CHECK-NEXT: btl $2, %ecx
	; CHECK-NEXT: jne .LBB1_2			; CHECK-NEXT: jb .LBB1_2
	; CHECK-NEXT: # %bb.4: # %bb2			; CHECK-NEXT: # %bb.4: # %bb2
	; CHECK-NEXT: movl $1, %eax			; CHECK-NEXT: movl $1, %eax
	; CHECK-NEXT: retl			; CHECK-NEXT: retl
	; CHECK-NEXT: .LBB1_5: # %bb3			; CHECK-NEXT: .LBB1_5: # %bb3
	; CHECK-NEXT: xorl %eax, %eax			; CHECK-NEXT: xorl %eax, %eax
	; CHECK-NEXT: retl			; CHECK-NEXT: retl

	; CHECK-CGP-LABEL: @and_sink2(			; CHECK-CGP-LABEL: @and_sink2(
	▲ Show 20 Lines • Show All 162 Lines • Show Last 20 Lines
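
The branch opcodes change together with the test instruction because TEST reports its result in ZF while BT reports the selected bit in CF. The sketch below is a simplified flag model in plain C++, written only to show why `jne` pairs with `jb` and `je` pairs with `jae`. The `Flags` struct and the function names are hypothetical, and the ZF value returned by `btInsn` is an arbitrary placeholder, since BT is not used here for its effect on ZF.

```cpp
#include <cassert>
#include <cstdint>

struct Flags {
  bool zf;
  bool cf;
};

// TEST: ZF is set when (x & mask) == 0, and CF is cleared.
Flags testInsn(uint32_t x, uint32_t mask) { return {(x & mask) == 0, false}; }

// BT with a 32-bit register operand: CF receives bit (n mod 32).
// ZF is a placeholder in this model.
Flags btInsn(uint32_t x, unsigned n) {
  return {false, ((x >> (n % 32)) & 1) != 0};
}

bool jne(Flags f) { return !f.zf; } // taken when ZF == 0
bool je(Flags f) { return f.zf; }   // taken when ZF == 1
bool jb(Flags f) { return f.cf; }   // taken when CF == 1
bool jae(Flags f) { return !f.cf; } // taken when CF == 0
```

In this model, `jne` after `testb $4` and `jb` after `btl $2` are taken for exactly the same inputs, which mirrors the updated CHECK lines above.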

test/CodeGen/X86/and-su.ll

	Show All 36 Lines
	; CHECK-NEXT: cmpl $8, %eax			; CHECK-NEXT: cmpl $8, %eax
	; CHECK-NEXT: jb .LBB1_2			; CHECK-NEXT: jb .LBB1_2
	; CHECK-NEXT: # %bb.1: # %bb10			; CHECK-NEXT: # %bb.1: # %bb10
	; CHECK-NEXT: testb $1, %cl			; CHECK-NEXT: testb $1, %cl
	; CHECK-NEXT: je .LBB1_3			; CHECK-NEXT: je .LBB1_3
	; CHECK-NEXT: .LBB1_2: # %bb11			; CHECK-NEXT: .LBB1_2: # %bb11
	; CHECK-NEXT: fchs			; CHECK-NEXT: fchs
	; CHECK-NEXT: .LBB1_3: # %bb13			; CHECK-NEXT: .LBB1_3: # %bb13
	; CHECK-NEXT: testb $2, %cl			; CHECK-NEXT: btl $1, %ecx
	; CHECK-NEXT: je .LBB1_5			; CHECK-NEXT: jae .LBB1_5
	; CHECK-NEXT: # %bb.4: # %bb14			; CHECK-NEXT: # %bb.4: # %bb14
	; CHECK-NEXT: fxch %st(1)			; CHECK-NEXT: fxch %st(1)
	; CHECK-NEXT: fchs			; CHECK-NEXT: fchs
	; CHECK-NEXT: fxch %st(1)			; CHECK-NEXT: fxch %st(1)
	; CHECK-NEXT: .LBB1_5: # %bb16			; CHECK-NEXT: .LBB1_5: # %bb16
	; CHECK-NEXT: faddp %st(1)			; CHECK-NEXT: faddp %st(1)
	; CHECK-NEXT: movl %ebp, %esp			; CHECK-NEXT: movl %ebp, %esp
	; CHECK-NEXT: popl %ebp			; CHECK-NEXT: popl %ebp
	Show All 30 Lines

test/CodeGen/X86/avx512-cmp.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=x86_64-apple-darwin -mattr=+avx512f | FileCheck %s --check-prefix=ALL --check-prefix=KNL			; RUN: llc < %s -mtriple=x86_64-apple-darwin -mattr=+avx512f | FileCheck %s --check-prefix=ALL --check-prefix=KNL
	; RUN: llc < %s -mtriple=x86_64-apple-darwin -mattr=+avx512f,+avx512bw,+avx512vl,+avx512dq | FileCheck %s --check-prefix=ALL --check-prefix=SKX			; RUN: llc < %s -mtriple=x86_64-apple-darwin -mattr=+avx512f,+avx512bw,+avx512vl,+avx512dq | FileCheck %s --check-prefix=ALL --check-prefix=SKX

	define double @test1(double %a, double %b) nounwind {			define double @test1(double %a, double %b) nounwind {
	; ALL-LABEL: test1:			; ALL-LABEL: test1:
	; ALL: ## %bb.0:			; ALL: ## %bb.0:
	; ALL-NEXT: vucomisd %xmm1, %xmm0			; ALL-NEXT: vucomisd %xmm1, %xmm0
	; ALL-NEXT: jne LBB0_1			; ALL-NEXT: jne LBB0_1
	; ALL-NEXT: jnp LBB0_2			; ALL-NEXT: jnp LBB0_2
	; ALL-NEXT: LBB0_1: ## %l1			; ALL-NEXT: LBB0_1: ## %l1
	; ALL-NEXT: vsubsd %xmm1, %xmm0, %xmm0			; ALL-NEXT: vsubsd %xmm1, %xmm0, %xmm0
	; ALL-NEXT: retq			; ALL-NEXT: retq
	; ALL-NEXT: LBB0_2: ## %l2			; ALL-NEXT: LBB0_2: ## %l2
	; ALL-NEXT: vaddsd %xmm1, %xmm0, %xmm0			; ALL-NEXT: vaddsd %xmm1, %xmm0, %xmm0
	; ALL-NEXT: retq			; ALL-NEXT: retq
	; ALL-NEXT: ## -- End function
	%tobool = fcmp une double %a, %b			%tobool = fcmp une double %a, %b
	br i1 %tobool, label %l1, label %l2			br i1 %tobool, label %l1, label %l2

	l1:			l1:
	%c = fsub double %a, %b			%c = fsub double %a, %b
	ret double %c			ret double %c
	l2:			l2:
	%c1 = fadd double %a, %b			%c1 = fadd double %a, %b
	ret double %c1			ret double %c1
	}			}

	define float @test2(float %a, float %b) nounwind {			define float @test2(float %a, float %b) nounwind {
	; ALL-LABEL: test2:			; ALL-LABEL: test2:
	; ALL: ## %bb.0:			; ALL: ## %bb.0:
	; ALL-NEXT: vucomiss %xmm0, %xmm1			; ALL-NEXT: vucomiss %xmm0, %xmm1
	; ALL-NEXT: jbe LBB1_2			; ALL-NEXT: jbe LBB1_2
	; ALL-NEXT: ## %bb.1: ## %l1			; ALL-NEXT: ## %bb.1: ## %l1
	; ALL-NEXT: vsubss %xmm1, %xmm0, %xmm0			; ALL-NEXT: vsubss %xmm1, %xmm0, %xmm0
	; ALL-NEXT: retq			; ALL-NEXT: retq
	; ALL-NEXT: LBB1_2: ## %l2			; ALL-NEXT: LBB1_2: ## %l2
	; ALL-NEXT: vaddss %xmm1, %xmm0, %xmm0			; ALL-NEXT: vaddss %xmm1, %xmm0, %xmm0
	; ALL-NEXT: retq			; ALL-NEXT: retq
	; ALL-NEXT: ## -- End function
	%tobool = fcmp olt float %a, %b			%tobool = fcmp olt float %a, %b
	br i1 %tobool, label %l1, label %l2			br i1 %tobool, label %l1, label %l2

	l1:			l1:
	%c = fsub float %a, %b			%c = fsub float %a, %b
	ret float %c			ret float %c
	l2:			l2:
	%c1 = fadd float %a, %b			%c1 = fadd float %a, %b
	▲ Show 20 Lines • Show All 146 Lines • Show Last 20 Lines

test/CodeGen/X86/btq.ll

	Show All 21 Lines

	if.end:			if.end:
	ret void			ret void
	}			}

	define void @test2(i64 %foo) nounwind {			define void @test2(i64 %foo) nounwind {
	; CHECK-LABEL: test2:			; CHECK-LABEL: test2:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: testl $-2147483648, %edi # imm = 0x80000000			; CHECK-NEXT: btl $31, %edi
	; CHECK-NEXT: jne .LBB1_2			; CHECK-NEXT: jb .LBB1_2
	; CHECK-NEXT: # %bb.1: # %if.end			; CHECK-NEXT: # %bb.1: # %if.end
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	; CHECK-NEXT: .LBB1_2: # %if.then			; CHECK-NEXT: .LBB1_2: # %if.then
	; CHECK-NEXT: jmp bar # TAILCALL			; CHECK-NEXT: jmp bar # TAILCALL
	%and = and i64 %foo, 2147483648			%and = and i64 %foo, 2147483648
	%tobool = icmp eq i64 %and, 0			%tobool = icmp eq i64 %and, 0
	br i1 %tobool, label %if.end, label %if.then			br i1 %tobool, label %if.end, label %if.then

	if.then:			if.then:
	tail call void @bar() nounwind			tail call void @bar() nounwind
	br label %if.end			br label %if.end

	if.end:			if.end:
	ret void			ret void
	}			}

test/CodeGen/X86/select.ll

	Show First 20 Lines • Show All 649 Lines • ▼ Show 20 Lines
	; MCU-NEXT: movl %eax, %ebp			; MCU-NEXT: movl %eax, %ebp
	; MCU-NEXT: movl $4, %ecx			; MCU-NEXT: movl $4, %ecx
	; MCU-NEXT: mull %ecx			; MCU-NEXT: mull %ecx
	; MCU-NEXT: movl %eax, %esi			; MCU-NEXT: movl %eax, %esi
	; MCU-NEXT: leal (%edx,%ebx,4), %edi			; MCU-NEXT: leal (%edx,%ebx,4), %edi
	; MCU-NEXT: movl %edi, %edx			; MCU-NEXT: movl %edi, %edx
	; MCU-NEXT: pushl $0			; MCU-NEXT: pushl $0
	; MCU-NEXT: pushl $4			; MCU-NEXT: pushl $4
	; MCU-NEXT: calll __udivdi3			; MCU-NEXT: calll __udivdi3@PLT
	; MCU-NEXT: addl $8, %esp			; MCU-NEXT: addl $8, %esp
	; MCU-NEXT: xorl %ebx, %edx			; MCU-NEXT: xorl %ebx, %edx
	; MCU-NEXT: xorl %ebp, %eax			; MCU-NEXT: xorl %ebp, %eax
	; MCU-NEXT: orl %edx, %eax			; MCU-NEXT: orl %edx, %eax
	; MCU-NEXT: movl $-1, %eax			; MCU-NEXT: movl $-1, %eax
	; MCU-NEXT: movl $-1, %edx			; MCU-NEXT: movl $-1, %edx
	; MCU-NEXT: jne .LBB14_2			; MCU-NEXT: jne .LBB14_2
	; MCU-NEXT: # %bb.1: # %entry			; MCU-NEXT: # %bb.1: # %entry
	▲ Show 20 Lines • Show All 490 Lines • Show Last 20 Lines

test/CodeGen/X86/test-shrink.ll

Show First 20 Lines • Show All 42 Lines • ▼ Show 20 Lines	yes:
ret void		ret void
no:		no:
ret void		ret void
}		}

define void @g64xl(i64 inreg %x) nounwind {		define void @g64xl(i64 inreg %x) nounwind {
; CHECK-LINUX64-LABEL: g64xl:		; CHECK-LINUX64-LABEL: g64xl:
; CHECK-LINUX64: # %bb.0:		; CHECK-LINUX64: # %bb.0:
; CHECK-LINUX64-NEXT: testb $8, %dil		; CHECK-LINUX64-NEXT: btl $3, %edi
; CHECK-LINUX64-NEXT: jne .LBB1_2		; CHECK-LINUX64-NEXT: jb .LBB1_2
; CHECK-LINUX64-NEXT: # %bb.1: # %yes		; CHECK-LINUX64-NEXT: # %bb.1: # %yes
; CHECK-LINUX64-NEXT: pushq %rax		; CHECK-LINUX64-NEXT: pushq %rax
; CHECK-LINUX64-NEXT: callq bar		; CHECK-LINUX64-NEXT: callq bar
; CHECK-LINUX64-NEXT: popq %rax		; CHECK-LINUX64-NEXT: popq %rax
; CHECK-LINUX64-NEXT: .LBB1_2: # %no		; CHECK-LINUX64-NEXT: .LBB1_2: # %no
; CHECK-LINUX64-NEXT: retq		; CHECK-LINUX64-NEXT: retq
;		;
; CHECK-WIN32-64-LABEL: g64xl:		; CHECK-WIN32-64-LABEL: g64xl:
; CHECK-WIN32-64: # %bb.0:		; CHECK-WIN32-64: # %bb.0:
; CHECK-WIN32-64-NEXT: subq $40, %rsp		; CHECK-WIN32-64-NEXT: subq $40, %rsp
; CHECK-WIN32-64-NEXT: testb $8, %cl		; CHECK-WIN32-64-NEXT: btl $3, %ecx
; CHECK-WIN32-64-NEXT: jne .LBB1_2		; CHECK-WIN32-64-NEXT: jb .LBB1_2
; CHECK-WIN32-64-NEXT: # %bb.1: # %yes		; CHECK-WIN32-64-NEXT: # %bb.1: # %yes
; CHECK-WIN32-64-NEXT: callq bar		; CHECK-WIN32-64-NEXT: callq bar
; CHECK-WIN32-64-NEXT: .LBB1_2: # %no		; CHECK-WIN32-64-NEXT: .LBB1_2: # %no
; CHECK-WIN32-64-NEXT: addq $40, %rsp		; CHECK-WIN32-64-NEXT: addq $40, %rsp
; CHECK-WIN32-64-NEXT: retq		; CHECK-WIN32-64-NEXT: retq
;		;
; CHECK-X86-LABEL: g64xl:		; CHECK-X86-LABEL: g64xl:
; CHECK-X86: # %bb.0:		; CHECK-X86: # %bb.0:
; CHECK-X86-NEXT: testb $8, %al		; CHECK-X86-NEXT: btl $3, %eax
; CHECK-X86-NEXT: jne .LBB1_2		; CHECK-X86-NEXT: jb .LBB1_2
; CHECK-X86-NEXT: # %bb.1: # %yes		; CHECK-X86-NEXT: # %bb.1: # %yes
; CHECK-X86-NEXT: calll bar		; CHECK-X86-NEXT: calll bar
; CHECK-X86-NEXT: .LBB1_2: # %no		; CHECK-X86-NEXT: .LBB1_2: # %no
; CHECK-X86-NEXT: retl		; CHECK-X86-NEXT: retl
%t = and i64 %x, 8		%t = and i64 %x, 8
%s = icmp eq i64 %t, 0		%s = icmp eq i64 %t, 0
br i1 %s, label %yes, label %no		br i1 %s, label %yes, label %no

▲ Show 20 Lines • Show All 44 Lines • ▼ Show 20 Lines	yes:
ret void		ret void
no:		no:
ret void		ret void
}		}

define void @g32xl(i32 inreg %x) nounwind {		define void @g32xl(i32 inreg %x) nounwind {
; CHECK-LINUX64-LABEL: g32xl:		; CHECK-LINUX64-LABEL: g32xl:
; CHECK-LINUX64: # %bb.0:		; CHECK-LINUX64: # %bb.0:
; CHECK-LINUX64-NEXT: testb $8, %dil		; CHECK-LINUX64-NEXT: btl $3, %edi
; CHECK-LINUX64-NEXT: jne .LBB3_2		; CHECK-LINUX64-NEXT: jb .LBB3_2
; CHECK-LINUX64-NEXT: # %bb.1: # %yes		; CHECK-LINUX64-NEXT: # %bb.1: # %yes
; CHECK-LINUX64-NEXT: pushq %rax		; CHECK-LINUX64-NEXT: pushq %rax
; CHECK-LINUX64-NEXT: callq bar		; CHECK-LINUX64-NEXT: callq bar
; CHECK-LINUX64-NEXT: popq %rax		; CHECK-LINUX64-NEXT: popq %rax
; CHECK-LINUX64-NEXT: .LBB3_2: # %no		; CHECK-LINUX64-NEXT: .LBB3_2: # %no
; CHECK-LINUX64-NEXT: retq		; CHECK-LINUX64-NEXT: retq
;		;
; CHECK-WIN32-64-LABEL: g32xl:		; CHECK-WIN32-64-LABEL: g32xl:
; CHECK-WIN32-64: # %bb.0:		; CHECK-WIN32-64: # %bb.0:
; CHECK-WIN32-64-NEXT: subq $40, %rsp		; CHECK-WIN32-64-NEXT: subq $40, %rsp
; CHECK-WIN32-64-NEXT: testb $8, %cl		; CHECK-WIN32-64-NEXT: btl $3, %ecx
; CHECK-WIN32-64-NEXT: jne .LBB3_2		; CHECK-WIN32-64-NEXT: jb .LBB3_2
; CHECK-WIN32-64-NEXT: # %bb.1: # %yes		; CHECK-WIN32-64-NEXT: # %bb.1: # %yes
; CHECK-WIN32-64-NEXT: callq bar		; CHECK-WIN32-64-NEXT: callq bar
; CHECK-WIN32-64-NEXT: .LBB3_2: # %no		; CHECK-WIN32-64-NEXT: .LBB3_2: # %no
; CHECK-WIN32-64-NEXT: addq $40, %rsp		; CHECK-WIN32-64-NEXT: addq $40, %rsp
; CHECK-WIN32-64-NEXT: retq		; CHECK-WIN32-64-NEXT: retq
;		;
; CHECK-X86-LABEL: g32xl:		; CHECK-X86-LABEL: g32xl:
; CHECK-X86: # %bb.0:		; CHECK-X86: # %bb.0:
; CHECK-X86-NEXT: testb $8, %al		; CHECK-X86-NEXT: btl $3, %eax
; CHECK-X86-NEXT: jne .LBB3_2		; CHECK-X86-NEXT: jb .LBB3_2
; CHECK-X86-NEXT: # %bb.1: # %yes		; CHECK-X86-NEXT: # %bb.1: # %yes
; CHECK-X86-NEXT: calll bar		; CHECK-X86-NEXT: calll bar
; CHECK-X86-NEXT: .LBB3_2: # %no		; CHECK-X86-NEXT: .LBB3_2: # %no
; CHECK-X86-NEXT: retl		; CHECK-X86-NEXT: retl
%t = and i32 %x, 8		%t = and i32 %x, 8
%s = icmp eq i32 %t, 0		%s = icmp eq i32 %t, 0
br i1 %s, label %yes, label %no		br i1 %s, label %yes, label %no

▲ Show 20 Lines • Show All 44 Lines • ▼ Show 20 Lines	yes:
ret void		ret void
no:		no:
ret void		ret void
}		}

define void @g16xl(i16 inreg %x) nounwind {		define void @g16xl(i16 inreg %x) nounwind {
; CHECK-LINUX64-LABEL: g16xl:		; CHECK-LINUX64-LABEL: g16xl:
; CHECK-LINUX64: # %bb.0:		; CHECK-LINUX64: # %bb.0:
; CHECK-LINUX64-NEXT: testb $8, %dil		; CHECK-LINUX64-NEXT: btl $3, %edi
; CHECK-LINUX64-NEXT: jne .LBB5_2		; CHECK-LINUX64-NEXT: jb .LBB5_2
; CHECK-LINUX64-NEXT: # %bb.1: # %yes		; CHECK-LINUX64-NEXT: # %bb.1: # %yes
; CHECK-LINUX64-NEXT: pushq %rax		; CHECK-LINUX64-NEXT: pushq %rax
; CHECK-LINUX64-NEXT: callq bar		; CHECK-LINUX64-NEXT: callq bar
; CHECK-LINUX64-NEXT: popq %rax		; CHECK-LINUX64-NEXT: popq %rax
; CHECK-LINUX64-NEXT: .LBB5_2: # %no		; CHECK-LINUX64-NEXT: .LBB5_2: # %no
; CHECK-LINUX64-NEXT: retq		; CHECK-LINUX64-NEXT: retq
;		;
; CHECK-WIN32-64-LABEL: g16xl:		; CHECK-WIN32-64-LABEL: g16xl:
; CHECK-WIN32-64: # %bb.0:		; CHECK-WIN32-64: # %bb.0:
; CHECK-WIN32-64-NEXT: subq $40, %rsp		; CHECK-WIN32-64-NEXT: subq $40, %rsp
; CHECK-WIN32-64-NEXT: testb $8, %cl		; CHECK-WIN32-64-NEXT: btl $3, %ecx
; CHECK-WIN32-64-NEXT: jne .LBB5_2		; CHECK-WIN32-64-NEXT: jb .LBB5_2
; CHECK-WIN32-64-NEXT: # %bb.1: # %yes		; CHECK-WIN32-64-NEXT: # %bb.1: # %yes
; CHECK-WIN32-64-NEXT: callq bar		; CHECK-WIN32-64-NEXT: callq bar
; CHECK-WIN32-64-NEXT: .LBB5_2: # %no		; CHECK-WIN32-64-NEXT: .LBB5_2: # %no
; CHECK-WIN32-64-NEXT: addq $40, %rsp		; CHECK-WIN32-64-NEXT: addq $40, %rsp
; CHECK-WIN32-64-NEXT: retq		; CHECK-WIN32-64-NEXT: retq
;		;
; CHECK-X86-LABEL: g16xl:		; CHECK-X86-LABEL: g16xl:
; CHECK-X86: # %bb.0:		; CHECK-X86: # %bb.0:
; CHECK-X86-NEXT: testb $8, %al		; CHECK-X86-NEXT: btl $3, %eax
; CHECK-X86-NEXT: jne .LBB5_2		; CHECK-X86-NEXT: jb .LBB5_2
; CHECK-X86-NEXT: # %bb.1: # %yes		; CHECK-X86-NEXT: # %bb.1: # %yes
; CHECK-X86-NEXT: calll bar		; CHECK-X86-NEXT: calll bar
; CHECK-X86-NEXT: .LBB5_2: # %no		; CHECK-X86-NEXT: .LBB5_2: # %no
; CHECK-X86-NEXT: retl		; CHECK-X86-NEXT: retl
%t = and i16 %x, 8		%t = and i16 %x, 8
%s = icmp eq i16 %t, 0		%s = icmp eq i16 %t, 0
br i1 %s, label %yes, label %no		br i1 %s, label %yes, label %no

▲ Show 20 Lines • Show All 234 Lines • Show Last 20 Lines

test/CodeGen/X86/testb-je-fusion.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=x86_64-- -mcpu=corei7-avx | FileCheck %s			; RUN: llc < %s -mtriple=x86_64-- -mcpu=corei7-avx | FileCheck %s

	; testb should be scheduled right before je to enable macro-fusion.			; testb should be scheduled right before je to enable macro-fusion.

	define i32 @check_flag(i32 %flags, ...) nounwind {			define i32 @check_flag(i32 %flags, ...) nounwind {
	; CHECK-LABEL: check_flag:			; CHECK-LABEL: check_flag:
	; CHECK: # %bb.0: # %entry			; CHECK: # %bb.0: # %entry
	; CHECK-NEXT: movl %edi, %ecx
	; CHECK-NEXT: xorl %eax, %eax			; CHECK-NEXT: xorl %eax, %eax
	; CHECK-NEXT: testb $2, %ch			; CHECK-NEXT: btl $9, %edi
	craig.topper: I will say this is kinda stupid that we forced a copy just so we could do a high register trick.
	deadalnix: Do you have any idea what can be causing this? There are numerous test cases where this happens.
	; CHECK-NEXT: je .LBB0_2			; CHECK-NEXT: jae .LBB0_2
	; CHECK-NEXT: # %bb.1: # %if.then			; CHECK-NEXT: # %bb.1: # %if.then
	; CHECK-NEXT: movl $1, %eax			; CHECK-NEXT: movl $1, %eax
	; CHECK-NEXT: .LBB0_2: # %if.end			; CHECK-NEXT: .LBB0_2: # %if.end
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	entry:			entry:
	%and = and i32 %flags, 512			%and = and i32 %flags, 512
	%tobool = icmp eq i32 %and, 0			%tobool = icmp eq i32 %and, 0
	br i1 %tobool, label %if.end, label %if.then			br i1 %tobool, label %if.end, label %if.then

	if.then:			if.then:
	br label %if.end			br label %if.end

	if.end:			if.end:
	%hasflag = phi i32 [ 1, %if.then ], [ 0, %entry ]			%hasflag = phi i32 [ 1, %if.then ], [ 0, %entry ]
	ret i32 %hasflag			ret i32 %hasflag
	}			}

test/CodeGen/X86/use-add-flags.ll

	Show All 30 Lines
	declare void @foo(i32)			declare void @foo(i32)

	; Don't use the flags result of the and here, since the and has no			; Don't use the flags result of the and here, since the and has no
	; other use. A simple test is better.			; other use. A simple test is better.

	define void @test2(i32 %x) nounwind {			define void @test2(i32 %x) nounwind {
	; LNX-LABEL: test2:			; LNX-LABEL: test2:
	; LNX: # %bb.0:			; LNX: # %bb.0:
	; LNX-NEXT: testb $16, %dil			; LNX-NEXT: btl $4, %edi
	; LNX-NEXT: jne .LBB1_2			; LNX-NEXT: jb .LBB1_2
	; LNX-NEXT: # %bb.1: # %true			; LNX-NEXT: # %bb.1: # %true
	; LNX-NEXT: pushq %rax			; LNX-NEXT: pushq %rax
	; LNX-NEXT: callq foo			; LNX-NEXT: callq foo
	; LNX-NEXT: popq %rax			; LNX-NEXT: popq %rax
	; LNX-NEXT: .LBB1_2: # %false			; LNX-NEXT: .LBB1_2: # %false
	; LNX-NEXT: retq			; LNX-NEXT: retq
	;			;
	; WIN-LABEL: test2:			; WIN-LABEL: test2:
	; WIN: # %bb.0:			; WIN: # %bb.0:
	; WIN-NEXT: subq $40, %rsp			; WIN-NEXT: subq $40, %rsp
	; WIN-NEXT: testb $16, %cl			; WIN-NEXT: btl $4, %ecx
	; WIN-NEXT: jne .LBB1_2			; WIN-NEXT: jb .LBB1_2
	; WIN-NEXT: # %bb.1: # %true			; WIN-NEXT: # %bb.1: # %true
	; WIN-NEXT: callq foo			; WIN-NEXT: callq foo
	; WIN-NEXT: .LBB1_2: # %false			; WIN-NEXT: .LBB1_2: # %false
	; WIN-NEXT: addq $40, %rsp			; WIN-NEXT: addq $40, %rsp
	; WIN-NEXT: retq			; WIN-NEXT: retq
	%y = and i32 %x, 16			%y = and i32 %x, 16
	%t = icmp eq i32 %y, 0			%t = icmp eq i32 %y, 0
	br i1 %t, label %true, label %false			br i1 %t, label %true, label %false
	▲ Show 20 Lines • Show All 41 Lines • Show Last 20 Lines

test/CodeGen/X86/vastart-defs-eflags.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc %s -o - | FileCheck %s			; RUN: llc %s -o - | FileCheck %s

	target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"			target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
	target triple = "x86_64-apple-macosx10.10.0"			target triple = "x86_64-apple-macosx10.10.0"

	; Check that vastart handling doesn't get between testb and je for the branch.			; Check that vastart handling doesn't get between testb and je for the branch.
	define i32 @check_flag(i32 %flags, ...) nounwind {			define i32 @check_flag(i32 %flags, ...) nounwind {
	; CHECK-LABEL: check_flag:			; CHECK-LABEL: check_flag:
	; CHECK: ## %bb.0: ## %entry			; CHECK: ## %bb.0: ## %entry
	; CHECK-NEXT: pushq %rbx			; CHECK-NEXT: subq $56, %rsp
	; CHECK-NEXT: subq $48, %rsp
	; CHECK-NEXT: movl %edi, %ebx
	; CHECK-NEXT: testb %al, %al			; CHECK-NEXT: testb %al, %al
	; CHECK-NEXT: je LBB0_2			; CHECK-NEXT: je LBB0_2
	; CHECK-NEXT: ## %bb.1: ## %entry			; CHECK-NEXT: ## %bb.1: ## %entry
	; CHECK-NEXT: movaps %xmm0, -{{[0-9]+}}(%rsp)			; CHECK-NEXT: movaps %xmm0, -{{[0-9]+}}(%rsp)
	; CHECK-NEXT: movaps %xmm1, -{{[0-9]+}}(%rsp)			; CHECK-NEXT: movaps %xmm1, -{{[0-9]+}}(%rsp)
	; CHECK-NEXT: movaps %xmm2, -{{[0-9]+}}(%rsp)			; CHECK-NEXT: movaps %xmm2, -{{[0-9]+}}(%rsp)
	; CHECK-NEXT: movaps %xmm3, -{{[0-9]+}}(%rsp)			; CHECK-NEXT: movaps %xmm3, -{{[0-9]+}}(%rsp)
	; CHECK-NEXT: movaps %xmm4, -{{[0-9]+}}(%rsp)			; CHECK-NEXT: movaps %xmm4, -{{[0-9]+}}(%rsp)
	; CHECK-NEXT: movaps %xmm5, (%rsp)			; CHECK-NEXT: movaps %xmm5, (%rsp)
	; CHECK-NEXT: movaps %xmm6, {{[0-9]+}}(%rsp)			; CHECK-NEXT: movaps %xmm6, {{[0-9]+}}(%rsp)
	; CHECK-NEXT: movaps %xmm7, {{[0-9]+}}(%rsp)			; CHECK-NEXT: movaps %xmm7, {{[0-9]+}}(%rsp)
	; CHECK-NEXT: LBB0_2: ## %entry			; CHECK-NEXT: LBB0_2: ## %entry
	; CHECK-NEXT: movq %r9, -{{[0-9]+}}(%rsp)			; CHECK-NEXT: movq %r9, -{{[0-9]+}}(%rsp)
	; CHECK-NEXT: movq %r8, -{{[0-9]+}}(%rsp)			; CHECK-NEXT: movq %r8, -{{[0-9]+}}(%rsp)
	; CHECK-NEXT: movq %rcx, -{{[0-9]+}}(%rsp)			; CHECK-NEXT: movq %rcx, -{{[0-9]+}}(%rsp)
	; CHECK-NEXT: movq %rdx, -{{[0-9]+}}(%rsp)			; CHECK-NEXT: movq %rdx, -{{[0-9]+}}(%rsp)
	; CHECK-NEXT: movq %rsi, -{{[0-9]+}}(%rsp)			; CHECK-NEXT: movq %rsi, -{{[0-9]+}}(%rsp)
	; CHECK-NEXT: xorl %eax, %eax			; CHECK-NEXT: xorl %eax, %eax
	; CHECK-NEXT: testb $2, %bh			; CHECK-NEXT: btl $9, %edi
	; CHECK-NEXT: je LBB0_4			; CHECK-NEXT: jae LBB0_4
	; CHECK-NEXT: ## %bb.3: ## %if.then			; CHECK-NEXT: ## %bb.3: ## %if.then
	; CHECK-NEXT: leaq -{{[0-9]+}}(%rsp), %rax			; CHECK-NEXT: leaq -{{[0-9]+}}(%rsp), %rax
	; CHECK-NEXT: movq %rax, 16			; CHECK-NEXT: movq %rax, 16
	; CHECK-NEXT: leaq {{[0-9]+}}(%rsp), %rax			; CHECK-NEXT: leaq {{[0-9]+}}(%rsp), %rax
	; CHECK-NEXT: movq %rax, 8			; CHECK-NEXT: movq %rax, 8
	; CHECK-NEXT: movl $48, 4			; CHECK-NEXT: movl $48, 4
	; CHECK-NEXT: movl $8, 0			; CHECK-NEXT: movl $8, 0
	; CHECK-NEXT: movl $1, %eax			; CHECK-NEXT: movl $1, %eax
	; CHECK-NEXT: LBB0_4: ## %if.end			; CHECK-NEXT: LBB0_4: ## %if.end
	; CHECK-NEXT: addq $48, %rsp			; CHECK-NEXT: addq $56, %rsp
	; CHECK-NEXT: popq %rbx
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	entry:			entry:
	%and = and i32 %flags, 512			%and = and i32 %flags, 512
	%tobool = icmp eq i32 %and, 0			%tobool = icmp eq i32 %and, 0
	br i1 %tobool, label %if.end, label %if.then			br i1 %tobool, label %if.end, label %if.then

	if.then: ; preds = %entry			if.then: ; preds = %entry
	call void @llvm.va_start(i8* null)			call void @llvm.va_start(i8* null)
	br label %if.end			br label %if.end

	if.end: ; preds = %entry, %if.then			if.end: ; preds = %entry, %if.then
	%hasflag = phi i32 [ 1, %if.then ], [ 0, %entry ]			%hasflag = phi i32 [ 1, %if.then ], [ 0, %entry ]
	ret i32 %hasflag			ret i32 %hasflag
	}			}

	declare void @llvm.va_start(i8*) nounwind			declare void @llvm.va_start(i8*) nounwind