This is an archive of the discontinued LLVM Phabricator instance.

[libc] Fix builtin definition for memory functions
ClosedPublic

Authored by michaelrj on Nov 11 2022, 2:13 PM.

Details

Summary

The memory functions are highly performance sensitive and use builtins
where possible, but also need to define those functions names when they
don't exist to avoid compilation errors. Previously all those
redefinitions were behind the SSE2 flag for x86, which caused errors on
CPUs that supported SSE2 but not AVX512. This patch splits the various
CPU extensions out to avoid errors on such CPUs.

Diff Detail

Event Timeline

michaelrj created this revision.Nov 11 2022, 2:13 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptNov 11 2022, 2:13 PM
michaelrj requested review of this revision.Nov 11 2022, 2:13 PM
lntue added a subscriber: lntue.Nov 11 2022, 3:43 PM
lntue added inline comments.
libc/src/string/memory_utils/op_x86.h
36

Can you move __AVX2__ switch to between __AVX512__ and __SSE2__?

michaelrj updated this revision to Diff 475273.Nov 14 2022, 2:16 PM
michaelrj marked an inline comment as done.

add AVX512BW to the list of features to be checked and move to simpler redefinitions for the builtins.

michaelrj marked an inline comment as done.Nov 14 2022, 2:19 PM
michaelrj added inline comments.
libc/src/string/memory_utils/op_x86.h
31

ah, yes. Fixed

lntue added inline comments.Nov 14 2022, 3:30 PM
libc/src/string/memory_utils/op_x86.h
30

How will this work if _mm512_cmpneq_epi8_mask is defined as a function or compiler builtin instead of macro? Will defined(_mm512_cmpneq_epi8_mask) also return false, and the macro will overwrite the real function calls?

gchatelet added inline comments.Nov 15 2022, 1:16 AM
libc/cmake/modules/LLVMLibCCheckCpuFeatures.cmake
9
libc/src/string/memory_utils/op_x86.h
30

So technically the right way to do it is to rely on the __has_builtin intrinsic.
It is available on clang forever but is quite recent on gcc, it requires gcc 10 (another bummer to have to support both gcc and clang)
https://gcc.gnu.org/onlinedocs/gcc-10.1.0/cpp/_005f_005fhas_005fbuiltin.html

michaelrj updated this revision to Diff 475629.Nov 15 2022, 4:44 PM
michaelrj marked 3 inline comments as done.

move from checking if macros are defined back to which extensions are defined.

libc/cmake/modules/LLVMLibCCheckCpuFeatures.cmake
9

I switched the memory functions over to AVX512BW instead of AVX512F since it seems like only BW matters for this case.

libc/src/string/memory_utils/op_x86.h
30

__has_builtin doesn't work because this isn't the name of the builtin. From avx512bwintrin.h:

#define _mm512_cmp_epi8_mask(a, b, p) \
  ((__mmask64)__builtin_ia32_cmpb512_mask((__v64qi)(__m512i)(a), \
                                          (__v64qi)(__m512i)(b), (int)(p), \
                                          (__mmask64)-1))

And #ifndef doesn't work either because sometimes these are defined as functions, from avx2intrin.h (on the same system):

static __inline__ int __DEFAULT_FN_ATTRS256
_mm256_movemask_epi8(__m256i __a)
{
  return __builtin_ia32_pmovmskb256((__v32qi)__a);
}

But it appears that the problems I was having trying to build on windows previously can be fixed by using the proper extension name (AVX512BW vs AVX512F), this builds clean on both and (afaict) uses the correct builtins.

gchatelet added inline comments.Nov 16 2022, 12:54 AM
libc/src/string/memory_utils/op_x86.h
30

Ha yes indeed, they are sometimes macros sometimes builtins...

Anyways it's great to know that using the proper preprocessor definitions fixes it!

BTW do we have windows CI machines? I don't think our bazel config currently handles windows build but I can have a look if needed.

michaelrj added inline comments.Nov 16 2022, 10:24 AM
libc/src/string/memory_utils/op_x86.h
30

We don't currently have Windows CI working, and the Windows builds I've tried have all been cmake/ninja.

gchatelet accepted this revision.Nov 16 2022, 12:43 PM

LGTM just make sure to keep AVX512F for memmove, memcpy, memset and bzero.

libc/src/string/CMakeLists.txt
385

Only bcmp and memcmp require AVX512BW, the other ones only require AVX512F

This revision is now accepted and ready to land.Nov 16 2022, 12:43 PM
michaelrj updated this revision to Diff 475900.Nov 16 2022, 1:01 PM
michaelrj marked 3 inline comments as done.

rebase and address comments to fix which CPU extensions are relevant to which operations.

I'm going to land this as is for now, but I have identified a minor issue: When building variants of the memory functions that don't need AVX512BW, it doesn't seem to set the compiler flag for it. This causes a lot of warnings when building memcpy (for example) that boil down to "the macro _mm512_cmpneq_epi8_mask is being redefined". I don't think this will cause any real issues since that builtin isn't used for that function, but it will cause some noise in the logs.

This revision was landed with ongoing or failed builds.Nov 16 2022, 1:03 PM
This revision was automatically updated to reflect the committed changes.