This is an archive of the discontinued LLVM Phabricator instance.

Allow _mm_empty() (via llvm.x86.mmx.emms) to be a no-op without MMX.
AbandonedPublic

Authored by jyknight on Jan 7 2021, 2:27 PM.

Details

Summary

In Clang, the other "MMX" intrinsic functions are being migrated to
SSE2, and will thus be usable even when compiling with -mno-mmx. These
SSE2 implementations don't require the use of _mm_empty(), but
existing (properly-written) code will still have calls to
_mm_empty(). It's therefore desirable to make the function a no-op in
this mode.

The function cannot be made a no-op universally, however, because MMX
may still be used by inline assembly. Therefore, have _mm_empty be
usable both with and without MMX -- and emit the LLVM intrinsic in
both cases, but cause the llvm intrinsic to generate an EMMS
instruction only if MMX is actually enabled.

Diff Detail

Event Timeline

jyknight created this revision.Jan 7 2021, 2:27 PM
jyknight requested review of this revision.Jan 7 2021, 2:27 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptJan 7 2021, 2:27 PM

Is inline assembly the only case emms instruction will be needed? But inline assembly doesn't enable mmx attribute automatically, right? E.g. https://godbolt.org/z/43ases
Analyzing asm block and appending the mmx attribute if we see mmx instructions might be needed. But if we do the analysis, just adding an emms instruction at the end of the block seems better.

Is inline assembly the only case emms instruction will be needed? But inline assembly doesn't enable mmx attribute automatically, right? E.g. https://godbolt.org/z/43ases

Yes, inline or external asm should be the only reason there should be any MMX register usage when all is done here. After this patch, the default is still to have mmx enabled by default with sse, despite that clang won't use it. But users can pass -mno-mmx if they like. And, yes, clang only requires -mmmx if you use the "y" asm constraint, not if you use mmx instructions inside the asm string.

I wrote this patch because making it a no-op is the same behavior GCC has. However, I'm not sure this is necessarily the right way to go. On the plus side for this patch, it allows intrinsic-using code to stop emitting spurious emms instructions, if compiled with -mno-mmx. However, the negative is that inline-asm code which _doesn't_ use the "y" constraint might still be using MMX within an asm blob, and be depending on calls to _mm_empty() outside of the asm, and such code would be silently broken when compiled with -mno-mmx.

At first, I thought that most uses of inline-asm would be using constraints, but after looking around at existing MMX asm, it seems that nearly all of it does _not_ use a "y" constraint or even clobber any fpu or mmx registers. And they do also depend on _mm_empty() in combination with their unmarked inline asm. Which...now that I think about it more, makes passing -mno-mmx to the compiler almost entirely pointless.

So, now I'm thinking I'll just drop this change, actually.

Analyzing asm block and appending the mmx attribute if we see mmx instructions might be needed. But if we do the analysis, just adding an emms instruction at the end of the block seems better.

Analyzing assembly strings is rather fraught -- I don't think we should be doing that. Having the compiler add emms to the end of the block might be nice -- and if we had a proper clobber for "switched into MMX state" then we could do that, perhaps. But we don't, and designing new features for MMX won't help anyone now, because only legacy code is _knowingly_ using MMX. (The biggest issue with the intrinsics is that people are unknowingly using MMX.)

Is inline assembly the only case emms instruction will be needed? But inline assembly doesn't enable mmx attribute automatically, right? E.g. https://godbolt.org/z/43ases

Yes, inline or external asm should be the only reason there should be any MMX register usage when all is done here. After this patch, the default is still to have mmx enabled by default with sse, despite that clang won't use it. But users can pass -mno-mmx if they like. And, yes, clang only requires -mmmx if you use the "y" asm constraint, not if you use mmx instructions inside the asm string.

I wrote this patch because making it a no-op is the same behavior GCC has. However, I'm not sure this is necessarily the right way to go. On the plus side for this patch, it allows intrinsic-using code to stop emitting spurious emms instructions, if compiled with -mno-mmx. However, the negative is that inline-asm code which _doesn't_ use the "y" constraint might still be using MMX within an asm blob, and be depending on calls to _mm_empty() outside of the asm, and such code would be silently broken when compiled with -mno-mmx.

At first, I thought that most uses of inline-asm would be using constraints, but after looking around at existing MMX asm, it seems that nearly all of it does _not_ use a "y" constraint or even clobber any fpu or mmx registers. And they do also depend on _mm_empty() in combination with their unmarked inline asm. Which...now that I think about it more, makes passing -mno-mmx to the compiler almost entirely pointless.

So, now I'm thinking I'll just drop this change, actually.

Yeah, I forgot the external asm case. Compiler has no way to know if MMX instructions are used by external code. So I'm in favour of keeping _mm_empty with mmx attribute and adding comments it is only used for context switch from inline or external code. Then if user is passing -mno-mmx and sure of it, they should remove the intrinsic from their code.

jyknight abandoned this revision.Jan 8 2021, 7:38 PM

OK thanks -- abandoning this patch.

I'll adjust the comment on _mm_empty to mention that it's no longer necessary except with asm in the intrinsics patch.