This is an archive of the discontinued LLVM Phabricator instance.

llvm/trunk/lib/Target/X86/X86CmovConversion.cpp
297	Thanks, Chandler, for the quick fix. However, we can narrow the restriction to only the case where (1) we are compiling for 64-bit and (2) the CMOV destination is 32-bit. All other cases have no issue with the CMOV behavior. Do you agree?

chandlerc added inline comments.Sep 6 2017, 1:59 AM

llvm/trunk/lib/Target/X86/X86CmovConversion.cpp
297	Hmm. Probably, because that's where zext is present. However, aren't those the same conditions in which SUBREG_TO_REG will be introduced? If this is an assertion of zext-ing behavior, it can only show up due to there being zext-ing behavior of cmov itself, and that seems to be the same set of restrictions you outline. Not sure how much more time we should spend on enhancing the dodge of a miscompile vs. the perhaps more interesting work to handle this case elegantly and effectively.

aaboud added inline comments.Sep 6 2017, 2:10 AM

llvm/trunk/lib/Target/X86/X86CmovConversion.cpp
297	> However, aren't those the same conditions in which SUBREG_TO_REG will be introduced? If this is an assertion of zext-ing behavior, it can only show up due to there being zext-ing behavior of cmov itself, and that seems to be the same set of restrictions you outline.

I think you are right. I was thinking about (CMOV16rr + zextTo32), but in that case the zext will not be removed, and we will not see the SUBREG_TO_REG. So this restriction is good enough.

> Not sure how much more time we should spend on enhancing the dodge of a miscompile vs. the perhaps more interesting work to handle this case elegantly and effectively.

Sure, I did not mean that we need to spend any more effort immediately. The top priority is to make sure the pass has no functionality issue, which you already did. Thinking forward, I want to be sure what more we can do. I also prepared a patch that adds the MOVrr instruction, as Dave suggested, but I need to run performance measurements before I suggest that direction. For now, I think we are fine with this solution, at least until we see a real performance issue.
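The zero-extension point in the thread above can be illustrated with a toy model. This is only a sketch under stated assumptions: `Reg64`, `write32`, `cmov32`, `branch32`, and `mov32` are hypothetical names invented here, not LLVM or X86 backend code. It models the x86-64 rule that a 32-bit register write clears bits 63:32, and that a 32-bit CMOV writes its destination even when the condition is false, whereas a branch around a move does not.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of a 64-bit x86 register (hypothetical; not LLVM code).
struct Reg64 {
  std::uint64_t bits;
};

// In 64-bit mode, any 32-bit register write clears bits 63:32.
inline void write32(Reg64 &r, std::uint32_t v) { r.bits = v; }

// 32-bit CMOV: the destination is written whether or not the condition
// holds, so its upper half always ends up zero.
inline void cmov32(Reg64 &dst, bool cond, std::uint32_t src) {
  std::uint32_t low = static_cast<std::uint32_t>(dst.bits);
  write32(dst, cond ? src : low);
}

// Branch form: the destination is written only on the taken path, so stale
// upper bits survive on the fall-through path.
inline void branch32(Reg64 &dst, bool cond, std::uint32_t src) {
  if (cond)
    write32(dst, src);
}

// Explicit 32-bit register move, the direction mentioned in the review:
// it re-establishes the zero-extension after the branch form.
inline void mov32(Reg64 &dst, const Reg64 &src) {
  write32(dst, static_cast<std::uint32_t>(src.bits));
}

// Small self-check of the model.
inline void checkModel() {
  Reg64 viaCmov{0xDEADBEEF00000001ULL};
  cmov32(viaCmov, /*cond=*/false, 7);
  assert(viaCmov.bits == 1); // upper half cleared even though cond is false

  Reg64 viaBranch{0xDEADBEEF00000001ULL};
  branch32(viaBranch, /*cond=*/false, 7);
  assert(viaBranch.bits == 0xDEADBEEF00000001ULL); // stale upper half kept

  mov32(viaBranch, viaBranch);
  assert(viaBranch.bits == 1); // explicit move restores zero-extension
}
```

In this model, a consumer that treats the 64-bit register as an already zero-extended value is correct after `cmov32` but not after `branch32`, which is why the patch skips such groups unless an explicit move is added.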

Revision Contents

Path	Size
llvm/trunk/lib/Target/X86/X86CmovConversion.cpp	10 lines
llvm/trunk/test/CodeGen/X86/cmov-into-branch.ll	26 lines

Diff 113960

llvm/trunk/lib/Target/X86/X86CmovConversion.cpp

[283 lines not shown; enclosing context: for (auto &I : *MBB) {]
   if (I.mayLoad()) {
     if (MemOpCC == X86::COND_INVALID)
       // The first memory operand CMOV.
       MemOpCC = CC;
     else if (CC != MemOpCC)
       // Can't handle mixed conditions with memory operands.
       SkipGroup = true;
   }
+  // Check if we were relying on zero-extending behavior of the CMOV.
+  if (!SkipGroup &&
+      llvm::any_of(
+          MRI->use_nodbg_instructions(I.defs().begin()->getReg()),
+          [&](MachineInstr &UseI) {
+            return UseI.getOpcode() == X86::SUBREG_TO_REG;
+          }))
+    // FIXME: We should model the cost of using an explicit MOV to handle
+    // the zero-extension rather than just refusing to handle this.
+    SkipGroup = true;
   continue;
 }
 // If Group is empty, keep looking for first CMOV in the range.
 if (Group.empty())
   continue;

 // We found a non X86::CMOVrr instruction.
 FoundNonCMOVInst = true;
[493 lines not shown]
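The shape of the added check can be sketched outside LLVM as follows. This is a minimal illustration, not the pass itself: the `Opcode` enum and the two helper functions are hypothetical stand-ins, and `std::any_of` over a vector takes the place of `llvm::any_of` over `MRI->use_nodbg_instructions(...)`.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-ins for machine opcodes; not the real X86:: enum.
enum class Opcode { Copy, Mov32rr, SubregToReg, Store };

// True when any user of the CMOV result is a SUBREG_TO_REG, i.e. when some
// user asserts that the upper bits of the value are already zero.
inline bool reliesOnZeroExtension(const std::vector<Opcode> &users) {
  return std::any_of(users.begin(), users.end(),
                     [](Opcode op) { return op == Opcode::SubregToReg; });
}

// Mirrors the patched condition: a group that is already rejected stays
// rejected; otherwise it is rejected when a user relies on the zext.
inline bool shouldSkipGroup(bool skipGroup, const std::vector<Opcode> &users) {
  return skipGroup || reliesOnZeroExtension(users);
}
```

The check is deliberately conservative: it rejects the whole CMOV group rather than modeling the cost of an explicit move, which matches the FIXME in the patch.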

llvm/trunk/test/CodeGen/X86/cmov-into-branch.ll

[55 lines not shown]
 ; CHECK-NEXT: retq
   %load = load i32, i32* %b, align 4
   %cmp = icmp ult i32 %load, %a
   %cmp1 = icmp ugt i32 %load, %a
   %cond = select i1 %cmp1, i32 %a, i32 %y
   %cond5 = select i1 %cmp, i32 %cond, i32 %x
   ret i32 %cond5
 }

+; Zero-extended select.
+define void @test6(i32 %a, i32 %x, i32* %y.ptr, i64* %z.ptr) {
+; CHECK-LABEL: test6:
+; CHECK: # BB#0: # %entry
+; CHECK-NEXT: # kill: %ESI<def> %ESI<kill> %RSI<def>
+; CHECK-NEXT: testl %edi, %edi
+; CHECK-NEXT: cmovnsl (%rdx), %esi
+; CHECK-NEXT: movq %rsi, (%rcx)
+; CHECK-NEXT: retq
+entry:
+  %y = load i32, i32* %y.ptr
+  %cmp = icmp slt i32 %a, 0
+  %z = select i1 %cmp, i32 %x, i32 %y
+  %z.ext = zext i32 %z to i64
+  store i64 %z.ext, i64* %z.ptr
+  ret void
+}
+
 ; If a select is not obviously predictable, don't turn it into a branch.
 define i32 @weighted_select1(i32 %a, i32 %b) {
 ; CHECK-LABEL: weighted_select1:
 ; CHECK: # BB#0:
 ; CHECK-NEXT: testl %edi, %edi
 ; CHECK-NEXT: cmovnel %edi, %esi
 ; CHECK-NEXT: movl %esi, %eax
 ; CHECK-NEXT: retq
   %cmp = icmp ne i32 %a, 0
   %sel = select i1 %cmp, i32 %a, i32 %b, !prof !0
   ret i32 %sel
 }

 ; If a select is obviously predictable, turn it into a branch.
 define i32 @weighted_select2(i32 %a, i32 %b) {
 ; CHECK-LABEL: weighted_select2:
 ; CHECK: # BB#0:
 ; CHECK-NEXT: testl %edi, %edi
-; CHECK-NEXT: jne .LBB5_2
+; CHECK-NEXT: jne .LBB6_2
 ; CHECK-NEXT: # BB#1: # %select.false
 ; CHECK-NEXT: movl %esi, %edi
-; CHECK-NEXT: .LBB5_2: # %select.end
+; CHECK-NEXT: .LBB6_2: # %select.end
 ; CHECK-NEXT: movl %edi, %eax
 ; CHECK-NEXT: retq
   %cmp = icmp ne i32 %a, 0
   %sel = select i1 %cmp, i32 %a, i32 %b, !prof !1
   ret i32 %sel
 }

 ; Note the reversed profile weights: it doesn't matter if it's
 ; obviously true or obviously false.
 ; Either one should become a branch rather than conditional move.
 ; TODO: But likely true vs. likely false should affect basic block placement?
 define i32 @weighted_select3(i32 %a, i32 %b) {
 ; CHECK-LABEL: weighted_select3:
 ; CHECK: # BB#0:
 ; CHECK-NEXT: testl %edi, %edi
-; CHECK-NEXT: je .LBB6_1
+; CHECK-NEXT: je .LBB7_1
 ; CHECK-NEXT: # BB#2: # %select.end
 ; CHECK-NEXT: movl %edi, %eax
 ; CHECK-NEXT: retq
-; CHECK-NEXT: .LBB6_1: # %select.false
+; CHECK-NEXT: .LBB7_1: # %select.false
 ; CHECK-NEXT: movl %esi, %edi
 ; CHECK-NEXT: movl %edi, %eax
 ; CHECK-NEXT: retq
   %cmp = icmp ne i32 %a, 0
   %sel = select i1 %cmp, i32 %a, i32 %b, !prof !2
   ret i32 %sel
 }

[18 lines not shown]


[x86] Fix PR34377 by disabling cmov conversion when we relied on it performing a zext of a register.
Closed, Public
