The memory functions are highly performance sensitive and use builtins
where possible, but also need to define those functions names when they
don't exist to avoid compilation errors. Previously all those
redefinitions were behind the SSE2 flag for x86, which caused errors on
CPUs that supported SSE2 but not AVX512. This patch splits the various
CPU extensions out to avoid errors on such CPUs.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
libc/src/string/memory_utils/op_x86.h | ||
---|---|---|
36 | Can you move __AVX2__ switch to between __AVX512__ and __SSE2__? |
libc/src/string/memory_utils/op_x86.h | ||
---|---|---|
31 |
add AVX512BW to the list of features to be checked and move to simpler redefinitions for the builtins.
libc/src/string/memory_utils/op_x86.h | ||
---|---|---|
31 | ah, yes. Fixed |
libc/src/string/memory_utils/op_x86.h | ||
---|---|---|
30 | How will this work if _mm512_cmpneq_epi8_mask is defined as a function or compiler builtin instead of macro? Will defined(_mm512_cmpneq_epi8_mask) also return false, and the macro will overwrite the real function calls? |
libc/cmake/modules/LLVMLibCCheckCpuFeatures.cmake | ||
---|---|---|
9 | Thx! We'd also need to have it added to the CMakefile.txt for the bcmp and memcmp functions | |
libc/src/string/memory_utils/op_x86.h | ||
30 | So technically the right way to do it is to rely on the __has_builtin intrinsic. |
move from checking if macros are defined back to which extensions are defined.
libc/cmake/modules/LLVMLibCCheckCpuFeatures.cmake | ||
---|---|---|
9 | I switched the memory functions over to AVX512BW instead of AVX512F since it seems like only BW matters for this case. | |
libc/src/string/memory_utils/op_x86.h | ||
30 | __has_builtin doesn't work because this isn't the name of the builtin. From avx512bwintrin.h: #define _mm512_cmp_epi8_mask(a, b, p) \ ((__mmask64)__builtin_ia32_cmpb512_mask((__v64qi)(__m512i)(a), \ (__v64qi)(__m512i)(b), (int)(p), \ (__mmask64)-1)) And #ifndef doesn't work either because sometimes these are defined as functions, from avx2intrin.h (on the same system): static __inline__ int __DEFAULT_FN_ATTRS256 _mm256_movemask_epi8(__m256i __a) { return __builtin_ia32_pmovmskb256((__v32qi)__a); } But it appears that the problems I was having trying to build on windows previously can be fixed by using the proper extension name (AVX512BW vs AVX512F), this builds clean on both and (afaict) uses the correct builtins. |
libc/src/string/memory_utils/op_x86.h | ||
---|---|---|
30 | Ha yes indeed, they are sometimes macros sometimes builtins... Anyways it's great to know that using the proper preprocessor definitions fixes it! BTW do we have windows CI machines? I don't think our bazel config currently handles windows build but I can have a look if needed. |
libc/src/string/memory_utils/op_x86.h | ||
---|---|---|
30 | We don't currently have Windows CI working, and the Windows builds I've tried have all been cmake/ninja. |
LGTM just make sure to keep AVX512F for memmove, memcpy, memset and bzero.
libc/src/string/CMakeLists.txt | ||
---|---|---|
385 | Only bcmp and memcmp require AVX512BW, the other ones only require AVX512F |
rebase and address comments to fix which CPU extensions are relevant to which operations.
I'm going to land this as is for now, but I have identified a minor issue: When building variants of the memory functions that don't need AVX512BW, it doesn't seem to set the compiler flag for it. This causes a lot of warnings when building memcpy (for example) that boil down to "the macro _mm512_cmpneq_epi8_mask is being redefined". I don't think this will cause any real issues since that builtin isn't used for that function, but it will cause some noise in the logs.
Thx! We'd also need to have it added to the CMakefile.txt for the bcmp and memcmp functions