This is an archive of the discontinued LLVM Phabricator instance.


Differential D94268

Allow _mm_empty() (via llvm.x86.mmx.emms) to be a no-op without MMX.
Abandoned · Public

Authored by jyknight on Jan 7 2021, 2:27 PM.

Details

Reviewers

craig.topper
spatel
RKSimon
pengfei

Summary

In Clang, the other "MMX" intrinsic functions are being migrated to
SSE2, and will thus be usable even when compiling with -mno-mmx. These
SSE2 implementations don't require the use of _mm_empty(), but
existing (properly-written) code will still have calls to
_mm_empty(). It's therefore desirable to make the function a no-op in
this mode.

The function cannot be made a no-op universally, however, because MMX
may still be used by inline assembly. Therefore, have _mm_empty be
usable both with and without MMX -- and emit the LLVM intrinsic in
both cases, but cause the llvm intrinsic to generate an EMMS
instruction only if MMX is actually enabled.
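
To illustrate the "existing (properly-written) code" mentioned above, here is a minimal sketch, assuming an x86 target and a GCC- or Clang-style <mmintrin.h>. The helper name add_lanes16 is hypothetical and not part of this revision. The _mm_empty() call at the end is the one this revision proposes to turn into a no-op under -mno-mmx.

```c
#include <mmintrin.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: adds four independent 16-bit lanes packed into
 * 64-bit values, using the "MMX" intrinsic _mm_add_pi16. */
static uint64_t add_lanes16(uint64_t a, uint64_t b) {
    __m64 va, vb, vr;
    uint64_t r;

    /* memcpy avoids relying on 64-bit-only conversion intrinsics. */
    memcpy(&va, &a, sizeof va);
    memcpy(&vb, &b, sizeof vb);

    vr = _mm_add_pi16(va, vb);   /* lane-wise add, wraps within each lane */
    memcpy(&r, &vr, sizeof r);

    /* Leave MMX state before any later x87 floating-point code runs. */
    _mm_empty();
    return r;
}
```

Whether the final call emits an EMMS instruction depends on the compiler and the target flags; the sketch only shows where such a call typically sits.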

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

jyknight created this revision. · Jan 7 2021, 2:27 PM

Herald added subscribers: pengfei, hiraditya. · Jan 7 2021, 2:27 PM

jyknight requested review of this revision. · Jan 7 2021, 2:27 PM

Herald added projects: Restricted Project, Restricted Project. · Jan 7 2021, 2:27 PM

Herald added subscribers: llvm-commits, cfe-commits.

Harbormaster completed remote builds in B84394: Diff 315248. · Jan 7 2021, 2:30 PM

craig.topper added a reviewer: pengfei. · Jan 7 2021, 3:47 PM

Is inline assembly the only case emms instruction will be needed? But inline assembly doesn't enable mmx attribute automatically, right? E.g. https://godbolt.org/z/43ases
Analyzing asm block and appending the mmx attribute if we see mmx instructions might be needed. But if we do the analysis, just adding an emms instruction at the end of the block seems better.

In D94268#2485958, @pengfei wrote:

Is inline assembly the only case emms instruction will be needed? But inline assembly doesn't enable mmx attribute automatically, right? E.g. https://godbolt.org/z/43ases

Yes, inline or external asm should be the only reason there should be any MMX register usage when all is done here. After this patch, the default is still to have mmx enabled by default with sse, despite that clang won't use it. But users can pass -mno-mmx if they like. And, yes, clang only requires -mmmx if you use the "y" asm constraint, not if you use mmx instructions inside the asm string.

I wrote this patch because making it a no-op is the same behavior GCC has. However, I'm not sure this is necessarily the right way to go. On the plus side for this patch, it allows intrinsic-using code to stop emitting spurious emms instructions, if compiled with -mno-mmx. However, the negative is that inline-asm code which _doesn't_ use the "y" constraint might still be using MMX within an asm blob, and be depending on calls to _mm_empty() outside of the asm, and such code would be silently broken when compiled with -mno-mmx.

At first, I thought that most uses of inline-asm would be using constraints, but after looking around at existing MMX asm, it seems that nearly all of it does _not_ use a "y" constraint or even clobber any fpu or mmx registers. And they do also depend on _mm_empty() in combination with their unmarked inline asm. Which...now that I think about it more, makes passing -mno-mmx to the compiler almost entirely pointless.

So, now I'm thinking I'll just drop this change, actually.

Analyzing asm block and appending the mmx attribute if we see mmx instructions might be needed. But if we do the analysis, just adding an emms instruction at the end of the block seems better.

Analyzing assembly strings is rather fraught -- I don't think we should be doing that. Having the compiler add emms to the end of the block might be nice -- and if we had a proper clobber for "switched into MMX state" then we could do that, perhaps. But we don't, and designing new features for MMX won't help anyone now, because only legacy code is _knowingly_ using MMX. (The biggest issue with the intrinsics is that people are unknowingly using MMX.)
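
The legacy pattern described in this comment can be sketched as follows, assuming an x86 target and GCC-style extended asm. The function names paddb_unmarked and add_bytes are hypothetical. The asm names %mm0 only inside its string, with no "y" constraint and no MMX or x87 clobber, so the compiler has no indication that MMX state was entered, and the code relies on a separate _mm_empty() call.

```c
#include <mmintrin.h>
#include <stdint.h>

/* Hypothetical legacy-style helper: byte-wise add done in %mm0.
 * Nothing in the constraints or clobbers mentions MMX, mirroring the
 * unmarked inline asm discussed in the thread. */
static uint64_t paddb_unmarked(uint64_t a, uint64_t b) {
    uint64_t out;
    __asm__ volatile(
        "movq  %1, %%mm0\n\t"   /* load a into mm0 */
        "paddb %2, %%mm0\n\t"   /* add b byte by byte, wrapping per byte */
        "movq  %%mm0, %0\n\t"   /* store the result */
        : "=m"(out)
        : "m"(a), "m"(b));
    return out;
}

/* The caller depends on _mm_empty() to leave MMX state afterwards. */
static uint64_t add_bytes(uint64_t a, uint64_t b) {
    uint64_t r = paddb_unmarked(a, b);
    _mm_empty();
    return r;
}
```

If _mm_empty() silently became a no-op under -mno-mmx, code shaped like add_bytes would return with the FPU still in MMX state, which is the breakage the comment warns about.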

In D94268#2487765, @jyknight wrote:

In D94268#2485958, @pengfei wrote:

Is inline assembly the only case emms instruction will be needed? But inline assembly doesn't enable mmx attribute automatically, right? E.g. https://godbolt.org/z/43ases

Yes, inline or external asm should be the only reason there should be any MMX register usage when all is done here. After this patch, the default is still to have mmx enabled by default with sse, despite that clang won't use it. But users can pass -mno-mmx if they like. And, yes, clang only requires -mmmx if you use the "y" asm constraint, not if you use mmx instructions inside the asm string.

I wrote this patch because making it a no-op is the same behavior GCC has. However, I'm not sure this is necessarily the right way to go. On the plus side for this patch, it allows intrinsic-using code to stop emitting spurious emms instructions, if compiled with -mno-mmx. However, the negative is that inline-asm code which _doesn't_ use the "y" constraint might still be using MMX within an asm blob, and be depending on calls to _mm_empty() outside of the asm, and such code would be silently broken when compiled with -mno-mmx.

At first, I thought that most uses of inline-asm would be using constraints, but after looking around at existing MMX asm, it seems that nearly all of it does _not_ use a "y" constraint or even clobber any fpu or mmx registers. And they do also depend on _mm_empty() in combination with their unmarked inline asm. Which...now that I think about it more, makes passing -mno-mmx to the compiler almost entirely pointless.

So, now I'm thinking I'll just drop this change, actually.

Yeah, I forgot the external asm case. The compiler has no way to know whether MMX instructions are used by external code. So I'm in favour of keeping _mm_empty with the mmx attribute and adding a comment that it is only needed for the context switch from inline asm or external code. Then, if users pass -mno-mmx and are sure of it, they should remove the intrinsic from their code.
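
One way a user could express that choice in source, sketched here as an assumption rather than anything this revision adds, is to key off the __MMX__ macro that GCC and Clang predefine when MMX is enabled. The names LEAVE_MMX_STATE and halve_after_simd are hypothetical.

```c
/* Call _mm_empty() only when the translation unit is built with MMX
 * enabled; otherwise compile the call away entirely. */
#if defined(__MMX__)
#include <mmintrin.h>
#define LEAVE_MMX_STATE() _mm_empty()
#else
#define LEAVE_MMX_STATE() ((void)0)
#endif

/* Hypothetical caller: leaves MMX state (if any) before doing
 * long double arithmetic. */
static long double halve_after_simd(long double x) {
    LEAVE_MMX_STATE();
    return x / 2.0L;
}
```

As the thread notes, dropping the call is only safe when no inline or external asm in the program touches MMX registers, since such code can still run when the compiler is invoked with -mno-mmx.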

OK thanks -- abandoning this patch.

In the intrinsics patch, I'll adjust the comment on _mm_empty to mention that it's no longer necessary except with asm.

Revision Contents

Path                                       Size
clang/include/clang/Basic/BuiltinsX86.def  2 lines
clang/lib/Headers/mmintrin.h               7 lines
llvm/lib/Target/X86/X86ISelLowering.cpp    8 lines
llvm/test/CodeGen/X86/mmx-emms.ll          21 lines

Diff 315248

clang/include/clang/Basic/BuiltinsX86.def

 TARGET_BUILTIN(__builtin_ia32_pfpnacc, "V2fV2fV2f", "ncV:64:", "3dnowa")
 TARGET_BUILTIN(__builtin_ia32_pi2fw, "V2fV2i", "ncV:64:", "3dnowa")
 TARGET_BUILTIN(__builtin_ia32_pswapdsf, "V2fV2f", "ncV:64:", "3dnowa")
 TARGET_BUILTIN(__builtin_ia32_pswapdsi, "V2iV2i", "ncV:64:", "3dnowa")

 // MMX usage is no longer supported in Clang; all of the formerly "MMX"
 // intrinsic functions are now expanded into SSE2 code in the headers.

-TARGET_BUILTIN(__builtin_ia32_emms, "v", "n", "mmx")
+TARGET_BUILTIN(__builtin_ia32_emms, "v", "n", "")
 TARGET_BUILTIN(__builtin_ia32_vec_ext_v4hi, "sV4sIi", "ncV:64:", "sse")
 TARGET_BUILTIN(__builtin_ia32_vec_set_v4hi, "V4sV4ssIi", "ncV:64:", "sse")

 // SSE intrinsics.
 TARGET_BUILTIN(__builtin_ia32_comieq, "iV4fV4f", "ncV:128:", "sse")
 TARGET_BUILTIN(__builtin_ia32_comilt, "iV4fV4f", "ncV:128:", "sse")
 TARGET_BUILTIN(__builtin_ia32_comile, "iV4fV4f", "ncV:128:", "sse")
 TARGET_BUILTIN(__builtin_ia32_comigt, "iV4fV4f", "ncV:128:", "sse")

clang/lib/Headers/mmintrin.h

 /* Define the default attributes for the functions in this file. */
 #define __DEFAULT_FN_ATTRS_SSE2 __attribute__((__always_inline__, __nodebug__, __target__("sse2"), __min_vector_width__(64)))

 #define __trunc64(x) (__m64)__builtin_shufflevector((__v2di)(x), __extension__ (__v2di){}, 0)
 #define __anyext128(x) (__m128i)__builtin_shufflevector((__v2si)(x), __extension__ (__v2si){}, 0, 1, -1, -1)
 #define __extract2_32(a) (__m64)__builtin_shufflevector((__v4si)(a), __extension__ (__v4si){}, 0, 2);

-/// Clears the MMX state by setting the state of the x87 stack registers
-/// to empty.
+/// Clears the MMX state by setting the state of the x87 stack registers to
+/// empty. This intrinsic is accepted but emits no instructions if MMX is
+/// disabled at compile-time (e.g. via -mno-mmx).
 ///
 /// \headerfile <x86intrin.h>
 ///
 /// This intrinsic corresponds to the <c> EMMS </c> instruction.
 ///
-static __inline__ void __attribute__((__always_inline__, __nodebug__, __target__("mmx")))
+static __inline__ void __attribute__((__always_inline__, __nodebug__))
 _mm_empty(void)
 {
     __builtin_ia32_emms();
 }

 /// Constructs a 64-bit integer vector, setting the lower 32 bits to the
 /// value of the 32-bit integer parameter and setting the upper 32 bits to 0.
 ///

llvm/lib/Target/X86/X86ISelLowering.cpp

     case Intrinsic::x86_testui: {
       SDLoc dl(Op);
       SDValue Chain = Op.getOperand(0);
       SDVTList VTs = DAG.getVTList(MVT::i32, MVT::Other);
       SDValue Operation = DAG.getNode(X86ISD::TESTUI, dl, VTs, Chain);
       SDValue SetCC = getSETCC(X86::COND_B, Operation.getValue(0), dl, DAG);
       return DAG.getNode(ISD::MERGE_VALUES, dl, Op->getVTList(), SetCC,
                          Operation.getValue(1));
     }
+    case Intrinsic::x86_mmx_emms: {
+      // Emit nothing for the EMMS intrinsic when MMX is disabled.
+      if (!Subtarget.hasMMX()) {
+        SDValue Chain = Op.getOperand(0);
+        return Chain;
+      }
+      return SDValue();
+    }
     }
     return SDValue();
   }

   SDLoc dl(Op);
   switch(IntrData->Type) {
   default: llvm_unreachable("Unknown Intrinsic Type");
   case RDSEED:

llvm/test/CodeGen/X86/mmx-emms.ll

This file was added.

+; RUN: llc -mcpu=i686 -mattr=+mmx < %s | FileCheck %s --check-prefixes=CHECK,CHECK-MMX
+; RUN: llc -mcpu=i686 -mattr=-mmx < %s | FileCheck %s --check-prefixes=CHECK
+; RUN: llc -mcpu=i686 -mattr=+sse2 < %s | FileCheck %s --check-prefixes=CHECK
+
+target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32"
+target triple = "i386-pc-linux-gnu"
+
+;; Verify that the llvm.x86.mmx.emms intrinsic works whether or not
+;; MMX is enabled, but that it doesn't emit any instructions if mmx is
+;; disabled.
+
+;; CHECK-LABEL: mmx_emms:
+;; CHECK: # %bb.0:
+;; CHECK-MMX-NEXT: emms
+;; CHECK-NEXT: retl
+define void @mmx_emms() {
+  tail call void @llvm.x86.mmx.emms() nounwind
+  ret void
+}
+
+declare void @llvm.x86.mmx.emms() nounwind