This is an archive of the discontinued LLVM Phabricator instance.

Fix confusion over x86_64 CMOV semantics in order to avoid unnecessary zero extensions
Closed, Public

Authored by DavidKreitzer on Jul 28 2016, 2:37 PM.


Details

Reviewers

mkuper
sunfish
aaboud

Commits

rG8b959e5cfa89: Avoid unnecessary 32-bit to 64-bit zero extensions following 32-bit CMOV…
rL277148: Avoid unnecessary 32-bit to 64-bit zero extensions following

Summary

We noticed this issue while working on something unrelated.

The check for X86ISD::CMOV was added long ago in this revision:

r81814 | djg | 2009-09-14 17:14:11 -0700 (Mon, 14 Sep 2009) | 3 lines

On x86-64, the 32-bit cmov doesn't actually clear the high 32-bit of
its result if the condition is false.

But that statement is incorrect. The 32-bit CMOVs do clear the high 32 bits of the result regardless of whether the condition is true or false. That is easily verifiable, and other compilers, including MSVC and the Intel compiler, take advantage of this semantic to avoid unnecessary 32-bit to 64-bit zero extensions.

The latest architecture manuals from both Intel and AMD support this change, though I wonder whether an earlier documentation bug caused the confusion. At any rate, the latest AMD manual says, "In 64-bit mode, CMOVcc with a 32-bit operand size will clear the upper 32 bits of the destination register even if the condition is false." And the latest Intel manual describes the behavior in pseudocode as:

Operation

  temp ← SRC
  IF condition TRUE
    THEN
      DEST ← temp;
  FI;
  ELSE
    IF (OperandSize = 32 and IA-32e mode active)
      THEN
        DEST[63:32] ← 0;
    FI;
  FI;
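To make the two branches concrete, here is a small C++ model of the quoted pseudocode. It is a sketch, not taken from either manual, and it assumes the general x86-64 rule that any ordinary 32-bit register write zero-extends into the full 64-bit register, which is what the "condition TRUE" branch does. The function name `cmov32` is hypothetical. Either way, the upper 32 bits of the result come out zero, which is exactly why a following 32-bit to 64-bit zero extension is redundant.

```cpp
#include <cstdint>

// Illustrative model of a 32-bit CMOVcc with a register destination in
// 64-bit mode. `dest` is the full 64-bit destination register before the
// instruction executes; `src` is the 32-bit source operand.
std::uint64_t cmov32(std::uint64_t dest, std::uint32_t src, bool condition) {
  if (condition) {
    // DEST <- temp: an ordinary 32-bit register write, which (by the general
    // x86-64 rule) zero-extends the value into the 64-bit register.
    return static_cast<std::uint64_t>(src);
  }
  // Condition false: the low 32 bits keep their old value, but
  // DEST[63:32] <- 0, so the high half is still cleared.
  return dest & 0xFFFFFFFFull;
}
```

For example, with `dest = 0xDEADBEEF12345678`, a false condition yields `0x12345678`, not the untouched 64-bit value the original 2009 comment assumed.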

Not surprisingly, there was no significant performance impact from this change (on CPU2000 and other benchmarks).

Diff Detail

Repository: rL LLVM

Event Timeline

DavidKreitzer updated this revision to Diff 65995. Jul 28 2016, 2:37 PM

DavidKreitzer retitled this revision to Fix confusion over x86_64 CMOV semantics in order to avoid unnecessary zero extensions.

DavidKreitzer updated this object.

DavidKreitzer added reviewers: sunfish, mkuper, aaboud.

DavidKreitzer added a subscriber: llvm-commits.

LGTM

This revision is now accepted and ready to land. Jul 28 2016, 2:45 PM

Closed by commit rL277148: Avoid unnecessary 32-bit to 64-bit zero extensions following (authored by dlkreitz). Jul 29 2016, 8:17 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                             Size
llvm/trunk/lib/Target/X86/X86InstrCompiler.td    6 lines
llvm/trunk/test/CodeGen/X86/cmov.ll              9 lines

Diff 66124

llvm/trunk/lib/Target/X86/X86InstrCompiler.td

 def : Pat<(i64 (anyext GR16:$src)),
           (SUBREG_TO_REG (i64 0), (MOVZX32rr16 GR16 :$src), sub_32bit)>;
 def : Pat<(i64 (anyext GR32:$src)),
           (SUBREG_TO_REG (i64 0), GR32:$src, sub_32bit)>;
 
 
 // Any instruction that defines a 32-bit result leaves the high half of the
 // register. Truncate can be lowered to EXTRACT_SUBREG. CopyFromReg may
-// be copying from a truncate. And x86's cmov doesn't do anything if the
-// condition is false. But any other 32-bit operation will zero-extend
+// be copying from a truncate. Any other 32-bit operation will zero-extend
 // up to 64 bits.
 def def32 : PatLeaf<(i32 GR32:$src), [{
   return N->getOpcode() != ISD::TRUNCATE &&
          N->getOpcode() != TargetOpcode::EXTRACT_SUBREG &&
          N->getOpcode() != ISD::CopyFromReg &&
-         N->getOpcode() != ISD::AssertSext &&
-         N->getOpcode() != X86ISD::CMOV;
+         N->getOpcode() != ISD::AssertSext;
 }]>;
 
 // In the case of a 32-bit def that is known to implicitly zero-extend,
 // we can use a SUBREG_TO_REG.
 def : Pat<(i64 (zext def32:$src)),
           (SUBREG_TO_REG (i64 0), GR32:$src, sub_32bit)>;
 
 //===----------------------------------------------------------------------===//
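The effect of the patched `def32` predicate can be sketched in plain C++. This is a simplified model under stated assumptions: the `Opc` enum is a hypothetical stand-in for the SelectionDAG opcodes (`ISD::TRUNCATE`, `TargetOpcode::EXTRACT_SUBREG`, `ISD::CopyFromReg`, `ISD::AssertSext`, `X86ISD::CMOV`) that the real predicate compares against `N->getOpcode()`, with `Add` standing for "any other 32-bit operation", and `needs_explicit_movl` is a hypothetical helper naming the choice the `zext def32` pattern makes.

```cpp
// Hypothetical stand-ins for the opcodes named in def32.
enum class Opc { Truncate, ExtractSubreg, CopyFromReg, AssertSext, X86Cmov, Add };

// Mirrors the patched predicate: true when a 32-bit def is known to have
// already zeroed the high half of its 64-bit register. After this change,
// X86ISD::CMOV is no longer in the exclusion list.
bool def32(Opc op) {
  return op != Opc::Truncate && op != Opc::ExtractSubreg &&
         op != Opc::CopyFromReg && op != Opc::AssertSext;
}

// If def32 holds, (zext def32:$src) is matched to SUBREG_TO_REG, which
// emits no instruction; otherwise codegen needs an explicit zero-extend
// (the movl that the old cmov.ll test expected after the cmov).
bool needs_explicit_movl(Opc op) { return !def32(op); }
```

The design keeps the predicate as a deny-list: any opcode not listed is assumed to zero-extend, so removing `X86ISD::CMOV` from the list is the whole fix.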

llvm/trunk/test/CodeGen/X86/cmov.ll

 ; CHECK-NEXT: ret
 %1 = and i32 %0, 1 ; <i32> [#uses=1]
 %toBool = icmp eq i32 %1, 0 ; <i1> [#uses=1]
 %v = load i32, i32* %vp
 %.0 = select i1 %toBool, i32 12, i32 %v ; <i32> [#uses=1]
 ret i32 %.0
 }
 
 
-; x86's 32-bit cmov doesn't clobber the high 32 bits of the destination
-; if the condition is false. An explicit zero-extend (movl) is needed
-; after the cmov.
+; x86's 32-bit cmov zeroes the high 32 bits of the destination. Make
+; sure CodeGen takes advantage of that to avoid an unnecessary
+; zero-extend (movl) after the cmov.
 
 declare void @bar(i64) nounwind
 
 define void @test3(i64 %a, i64 %b, i1 %p) nounwind {
 ; CHECK-LABEL: test3:
 ; CHECK: cmov{{n?}}el %[[R1:e..]], %[[R2:e..]]
-; CHECK-NEXT: movl %[[R2]], %{{e..}}
+; CHECK-NOT: movl
+; CHECK: call
 
 %c = trunc i64 %a to i32
 %d = trunc i64 %b to i32
 %e = select i1 %p, i32 %c, i32 %d
 %f = zext i32 %e to i64
 call void @bar(i64 %f)
 ret void
 }
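For readers less familiar with LLVM IR, `@test3` corresponds roughly to the C++ below: truncate two 64-bit values, select one with a 32-bit operation that x86-64 can lower to a 32-bit cmov, then zero-extend the result for the call. This is an illustrative sketch; the body of `bar` and the `last_arg` variable are hypothetical additions so the example runs standalone (in the test, `@bar` is only declared). The point of the new `CHECK-NOT: movl` line is that the zero extension at the call site should cost nothing, because the cmov has already cleared the high 32 bits.

```cpp
#include <cstdint>

// Hypothetical stand-in for the external @bar: it records its argument
// so the example can be run and checked on its own.
static std::uint64_t last_arg = 0;
void bar(std::uint64_t x) { last_arg = x; }

void test3(std::uint64_t a, std::uint64_t b, bool p) {
  std::uint32_t c = static_cast<std::uint32_t>(a);  // %c = trunc i64 %a to i32
  std::uint32_t d = static_cast<std::uint32_t>(b);  // %d = trunc i64 %b to i32
  std::uint32_t e = p ? c : d;                      // %e = select i1 %p, ...
  bar(static_cast<std::uint64_t>(e));               // %f = zext ...; call @bar
}
```

Calling `test3(0, 0xFFFFFFFFFFFFFFFF, false)` passes `0xFFFFFFFF` to `bar`: the high half of the argument is zero even though both inputs had bits set there.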