This is an archive of the discontinued LLVM Phabricator instance.

[libc] Simplifies multi implementations
ClosedPublic

Authored by gchatelet on May 5 2021, 4:40 AM.

Details

Summary

This is a follow up on D101524 which:

  • simplifies cpu features detection and usage,
  • flattens target dependent optimizations so it's obvious which implementations are generated,
  • provides an implementation targeting the host (march/mtune=native) for the mem* functions,
  • makes sure all implementations are unittested (provided the host can run them).

Diff Detail

Event Timeline

gchatelet created this revision.May 5 2021, 4:40 AM
gchatelet requested review of this revision.May 5 2021, 4:40 AM
gchatelet updated this revision to Diff 342999.

use D101524 as base revision

gchatelet updated this revision to Diff 343001.May 5 2021, 4:44 AM
  • revert local change
gchatelet updated this revision to Diff 343071.May 5 2021, 9:04 AM
  • Differ changes to the benchmarking system to a later patch
gchatelet updated this revision to Diff 343072.May 5 2021, 9:06 AM
  • Differ changes to the benchmarking system to a later patch (for real)
gchatelet updated this revision to Diff 343073.May 5 2021, 9:07 AM
  • Amending commit to reflect the change (hopefully)
gchatelet updated this revision to Diff 343076.May 5 2021, 9:11 AM
  • One more time
gchatelet edited the summary of this revision. (Show Details)May 5 2021, 9:12 AM

LGTM otherwise.

libc/cmake/modules/LLVMLibCCheckCpuFeatures.cmake
9

I'd really use -mcpu here though, as it will also enable available features.

This is now ready for review (sorry for the many updates)

@sivachandra with this patch we now produce a memcpy_opt_host (resp. memset_opt_host, bzero_opt_host) variant that compiles to the host native architecture in addition to the memcpy, memset, bzero functions that compiles to the compiler default architecture.

I think we have the following scenarios:

  • we compile a release and we want to specify which target to compile for (right now this is what the compiler decide is the default).
  • we compile a release via cross compilation (ARM compiled on x86 or vice versa), so we want to explicitly specify which arch to target.
  • we want to run benchmarks on a specific machine and want it to leverage -march/-mtune = native.
gchatelet updated this revision to Diff 343333.May 6 2021, 2:40 AM
  • Use mcpu instead of mtune for ARM
gchatelet retitled this revision from [libc] Simplifies multi implementations and benchmarks to [libc] Simplifies multi implementations.May 6 2021, 2:41 AM
avieira added inline comments.May 6 2021, 6:33 AM
libc/src/string/CMakeLists.txt
234

During some of my experiments I learned that CMAKE prefers if you pass such options with "SHELL:-mllvm --tail-mege-threshold=0" just in case there is another '-mllvm opt' or '-A B' like option.

Having said this, I am not seeing (with either SHELL or not) this option being passed when I do `ninja libc-benchmark-main -v -v', but don't quote me on this just yet, I might need to update all the patches you've put up for review.

Any chance you can verify you are seeing the same on your side?

gchatelet marked an inline comment as done.May 6 2021, 7:29 AM
gchatelet added inline comments.
libc/src/string/CMakeLists.txt
234

I confirm that it does not work : D
For one, I need to remove the double quotes, then SHELL does not seem to work.
I'm looking into it and will provide an update shortly.

gchatelet added inline comments.May 6 2021, 7:45 AM
libc/src/string/CMakeLists.txt
234

So the proper way to do it is to double quote AND use SHELL at the same time, e.g.

add_memcpy(memcpy COMPILE_OPTIONS "SHELL:-mllvm --tail-merge-threshold=0"
                  COMPILE_OPTIONS "SHELL:-mllvm --combiner-global-alias-analysis")

If SHELL is not provided we should not double quote. I'll update the patch soonish.

gchatelet updated this revision to Diff 343406.May 6 2021, 7:48 AM
  • Fix CMake flag deduplication issue

Overall LGTM. I have few questions I want to get clarity on.

I think we have the following scenarios:

  • we compile a release and we want to specify which target to compile for (right now this is what the compiler decide is the default).

In other words, when compiling for the host, we will let the compiler decide the best options?

  • we compile a release via cross compilation (ARM compiled on x86 or vice versa), so we want to explicitly specify which arch to target.

In other words, when cross-compiling, we will have to specify certain CMake vars explicitly?

  • we want to run benchmarks on a specific machine and want it to leverage -march/-mtune = native.

Does it mean that benchmarks are always compiled for the host?

I think we have the following scenarios:

  • we compile a release and we want to specify which target to compile for (right now this is what the compiler decide is the default).

In other words, when compiling for the host, we will let the compiler decide the best options?

Compiling for the host means using -march=native / -mcpu=native. Here I'm just talking about compiling llvm-libc without any options.
That is: when doing a full build, what is the targeted architecture?
AFAIU right now, the compiler decides. With a new version of the compiler the default may change and the targeted architecture as well.
It may be best to explicitly say "I want to compile llvm-libc for haswell when on x86".
AFAICT compilers are being very conservative and by default you get SSE2 only for x86 (check predefined macros without specifying march).

This is the rationale behind https://reviews.llvm.org/D101991

  • we compile a release via cross compilation (ARM compiled on x86 or vice versa), so we want to explicitly specify which arch to target.

In other words, when cross-compiling, we will have to specify certain CMake vars explicitly?

Yes as for the previous bullet point I believe.
You always want to specify what you are targeting.

  • we want to run benchmarks on a specific machine and want it to leverage -march/-mtune = native.

Does it mean that benchmarks are always compiled for the host?

We want to be able to benchmark all the implementations, presumably the host one will perform best.
I removed this part from the patch but we will provide one benchmark per function and per implementation so people can easily test and compare them out.

sivachandra accepted this revision.May 6 2021, 9:27 AM

Thanks for explaining.

This revision is now accepted and ready to land.May 6 2021, 9:27 AM

I'll wait for D101991 before submitting this one.

This revision was automatically updated to reflect the committed changes.

Unfortunately, I had to revert this as the bots were failing like this: https://lab.llvm.org/buildbot/#/builders/78/builds/11551.
I looked around for a bit to see if I can fix forward, but I do not think I know enough about the various x86_64 arch names. May be -march=k8 is all we need, but I will let you decide what the best course of action is.

This change also triggers a failure on the aarch64 bots like this: https://lab.llvm.org/buildbot/#/builders/138/builds/4761

Unfortunately, I had to revert this as the bots were failing like this: https://lab.llvm.org/buildbot/#/builders/78/builds/11551.
I looked around for a bit to see if I can fix forward, but I do not think I know enough about the various x86_64 arch names. May be -march=k8 is all we need, but I will let you decide what the best course of action is.

Thx for the revert Siva. I was AFK when I saw the failure.
These pseudo archs are available from Clang 12 on. Do you happen to know which clang version is being used by the buildbot?
We can target more genuine architectures instead, that said those pseudo arch better capture the requirements.

Do you happen to know which clang version is being used by the buildbot?

The builders use clang-7 but I am working on upgrading them to clang-11. If you think clang-12 would be ideal, I can try getting it up to clang-12. I will try now and report back here.

This change also triggers a failure on the aarch64 bots like this: https://lab.llvm.org/buildbot/#/builders/138/builds/4761

Where can I see the real error? I can't find it on the page you linked in. All I can find is the python executor error which is not helpful :-/
Also, I'm puzzled as to why the failures triggers right now, the patch has been sent more than 9h ago.

Where can I see the real error? I can't find it on the page you linked in. All I can find is the python executor error which is not helpful :-/
Also, I'm puzzled as to why the failures triggers right now, the patch has been sent more than 9h ago.

Yes, the timing of the x86_64 failures confused me as well. The aarch64 bot dropped off the network over the weekend so I had to bring it back up. I noticed the failure as soon as I brought it up. About the full failure, if you click on the stdio links on the build page, it will take you to the full log like this: https://lab.llvm.org/buildbot/#/builders/138/builds/4761/steps/4/logs/stdio.

The builders use clang-7 but I am working on upgrading them to clang-11. If you think clang-12 would be ideal, I can try getting it up to clang-12. I will try now and report back here.

There is no easy way to go to clang-12. Can we work with clang-11? I will try to move to clang-11 now.

Where can I see the real error? I can't find it on the page you linked in. All I can find is the python executor error which is not helpful :-/
Also, I'm puzzled as to why the failures triggers right now, the patch has been sent more than 9h ago.

Yes, the timing of the x86_64 failures confused me as well. The aarch64 bot dropped off the network over the weekend so I had to bring it back up. I noticed the failure as soon as I brought it up. About the full failure, if you click on the stdio links on the build page, it will take you to the full log like this: https://lab.llvm.org/buildbot/#/builders/138/builds/4761/steps/4/logs/stdio.

Thank you! I'll create a separate patch with the roll forward and the fixes for both aarch64 and x86.

The builders use clang-7 but I am working on upgrading them to clang-11. If you think clang-12 would be ideal, I can try getting it up to clang-12. I will try now and report back here.

There is no easy way to go to clang-12. Can we work with clang-11? I will try to move to clang-11 now.

Yes it's fine, I'll use a different scheme, I think it's better to stick to what's broadly available. thx 👍

libc/src/string/CMakeLists.txt