Page MenuHomePhabricator

[libc] Add a set of elementary operations
ClosedPublic

Authored by gchatelet on Apr 16 2021, 6:52 AM.

Details

Summary

Each of these elementary operations can be assembled to support higher order constructs (Overlapping access, Loop, Aligned Loop).
The patch does not compile yet as it depends on other ones (D100571, D100631) but it allows to get the conversation started.

A self-contained version of this code is available at https://godbolt.org/z/e1x6xdaxM

Diff Detail

Event Timeline

gchatelet created this revision.Apr 16 2021, 6:52 AM
gchatelet requested review of this revision.Apr 16 2021, 6:52 AM
gchatelet added inline comments.Apr 16 2021, 7:13 AM
libc/src/string/memory_utils/elements.h
155–167

@avieira the generated code is not that bad but maybe you can come up with a better idea?

The challenge here is to keep the semantic of the 3-way comparison that fits in an int while working on types which diff is larger than int.

Generated code:
intel : https://godbolt.org/z/Khxv6b3r9
arm : https://godbolt.org/z/YTah338hY

190–206

@avieira for the vector version of the three way compare we compute the byte-wise not equal mask, convert this mask to GPR and look at the number of trailing zeros (X86 is LittleEndian), this number is the index of the mismatching byte.

After trying to work on the vector directly to extract this byte I figured out it's easier to just reload it, it should be fast and probably faster than a bunch of vector operations.

i suspect the same approach will work for ARM as well, WDYT?

gchatelet updated this revision to Diff 339200.Apr 21 2021, 6:06 AM
  • Added Tail/HeadTail strategies and tests

Once the Loop strategies are added the framework will be useable.

  • Added Tail/HeadTail strategies and tests
  • Added Loop and AlignedLoop
avieira added inline comments.Apr 22 2021, 5:08 AM
libc/src/string/memory_utils/elements.h
155–167

@gchatelet Avoiding the subtraction and comparing the 'la' and 'lb' variables directly lead to better codegen on AArch64.

190–206

@gchatelet I did not look at using NEON. The problem with the mask approach is that NEON does not have movemask-like instructions, so finding the first mismatched byte on a vector would require more work than it is worth it I think. SVE2 might help us here in the future though.

I'm going to try to rewrite my memcmp implementation using these now and report findings as I go through it :).

libc/test/src/string/memory_utils/CMakeLists.txt
13

AArch64 doesn't support -march=native, we use -mcpu=native instead, so it would be good to specify the correct option per target.

I'm going to try to rewrite my memcmp implementation using these now and report findings as I go through it :).

FYI I'll upload a new version in a few minutes that showcase the API. Nothing radically different but it may help to wait a bit.

Keep the comments coming!

Alright I'll give it a wait, I was currently trying to figure out why I can't build the libc_string_unittests, getting a build error around elements_test.cpp saying there's an issue with the 'cstring' header file 'no member named 'strcmp' in the global namespace' which is an odd one ... What libc++ library do you use to build this normally?

gchatelet updated this revision to Diff 340037.Apr 23 2021, 8:04 AM
  • Use new API with memset and memcpy, more tests and addition of an element relying on builtins.

Alright I'll give it a wait, I was currently trying to figure out why I can't build the libc_string_unittests, getting a build error around elements_test.cpp saying there's an issue with the 'cstring' header file 'no member named 'strcmp' in the global namespace' which is an odd one ... What libc++ library do you use to build this normally?

hmm this is weird, elements_test.cpp does not rely on strcmp. That said it shouldn't even depend on "cstring", I'll update the patch to remove it.

gchatelet updated this revision to Diff 340045.Apr 23 2021, 8:09 AM
  • Remove unneeded cstring header from elements_test.cpp
gchatelet updated this revision to Diff 340055.Apr 23 2021, 8:46 AM
  • Better selection of buffer alignment for AlignedLoop
gchatelet updated this revision to Diff 340540.Apr 26 2021, 8:46 AM
  • Align on destination buffer for x86 and use 32 byte alignment.
gchatelet updated this revision to Diff 341198.Apr 28 2021, 7:31 AM
  • Changed AlignedLoop to Align::Then<Loop>
  • Updated 3-way cmp for scalar 32/64 bits
  • Added disclaimer for builtin operations
  • Updated tests and implementations
gchatelet marked 3 inline comments as done.Apr 28 2021, 7:44 AM
gchatelet added a subscriber: sivachandra.
gchatelet added inline comments.
libc/src/string/memory_utils/elements.h
155–167

Indeed it's better, thank you.

190–206

We can add an arm specialization as I did for x86 if needed, let's keep it as-is for now.

libc/test/src/string/memory_utils/CMakeLists.txt
13

Thx I wasn't aware of this. I found this article, it's helpful and confusing at the same time :-D
https://community.arm.com/developer/tools-software/tools/b/tools-software-ides-blog/posts/compiler-flags-across-architectures-march-mtune-and-mcpu

@sivachandra should we define a new CMake macro or do you think this should be part of add_libc_unittest?
The aim is to use whatever is available on the host machine when performing the tests.

sivachandra added inline comments.Apr 28 2021, 8:21 AM
libc/test/src/string/memory_utils/CMakeLists.txt
13

@sivachandra should we define a new CMake macro or do you think this should be part of add_libc_unittest?
The aim is to use whatever is available on the host machine when performing the tests.

I suppose you are asking about adding -march=native and -mcpu=native? If yes, then I think we should add a top-level var like LLVM_LIBC_COMMON_COMPILE_OPTIONS which get set to the appropriate value based on the target machine. And then, both add_entrypoint_object and add_libc_unittest should use it.

avieira added inline comments.Apr 28 2021, 9:13 AM
libc/test/src/string/memory_utils/CMakeLists.txt
13

@gchatelet I didn't quite understand why only use -march/-mcpu during the tests and not for the benchmarking for instance. @sivachandra approach sounds better to me.

As for the options, I think the simple (but maybe somewhat naive) explanation is '-march': determines what features are available, '-mtune': sets the micro-architecture tuning strategy and '-mcpu' set's both for the micro-architecture you pass it. When you pass native it tries to autodetect the micro-architecture.

gchatelet marked 3 inline comments as done.Apr 29 2021, 2:29 AM
gchatelet added inline comments.
libc/test/src/string/memory_utils/CMakeLists.txt
13

@gchatelet I didn't quite understand why only use -march/-mcpu during the tests and not for the benchmarking for instance.

We should. All of this is work in progress. I'll start a patch to address this.

  1. If llvm-libc source code is part of the code base being compiled it definitely makes sense to set the micro-architecture and leverage available features.
  2. In the case of shared library though we may have to provide several implementations and do runtime dispatch, if so, the tests must run all configurations up to what's available on the host.

I believe that we can stick to 1 for now. WDYT @sivachandra ?

gchatelet updated this revision to Diff 351151.Jun 10 2021, 6:12 AM
  • rebase and reorder a few lines
gchatelet updated this revision to Diff 351154.Jun 10 2021, 6:23 AM
  • Remove explicit -march from unittests
gchatelet marked 2 inline comments as done.Jun 10 2021, 6:27 AM

I think this is ready to land.
Concerns about march/mcpu have been addressed as separate patches D101524, D101991, D102233.

Let me know if you have any other questions/suggestions before I submit.

gchatelet updated this revision to Diff 351434.Jun 11 2021, 6:32 AM
  • Fix missing namespace qualifiers
This revision was not accepted when it landed; it landed in state Needs Review.Jun 14 2021, 2:01 AM
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
lebedev.ri added inline comments.
libc/src/string/CMakeLists.txt
197

Passing-by comment: why is this here?
This enforces clang/LLVM as the compiler used to build this library.

gchatelet marked an inline comment as done.Jul 5 2021, 3:57 AM
gchatelet added inline comments.
libc/src/string/CMakeLists.txt
197

Passing-by comment: why is this here?
This enforces clang/LLVM as the compiler used to build this library.

Thx for the comment, yes this should be guarded somehow. It's in my TODO list.