Page MenuHomePhabricator

[Sanitizers][Test] If the default linker is LLD, disable threads
AcceptedPublic

Authored by nemanjai on Mar 16 2020, 11:08 AM.

Details

Reviewers
kcc
eugenis
dvyukov
samsonov
efriedma
Group Reviewers
Restricted Project
Summary

When running ninja check-all on a platform that includes sanitizer tests and the default linker is LLD, the machine can easily be overwhelmed if threads are not disabled in the linker. This is because LLD uses effectively unrestricted parallelism and llvm-lit will use as many processes as there are virtual CPUs on the system.

Of course, this issue can be worked around by specifying a lower number of threads to llvm-lit, but that has the undesired effect of reducing parallelism for the entire test run.

This patch therefore adds -Wl,--no-threads to the cflags for invocations of clang in the sanitizers.

Please keep in mind that this is not at all my area of expertise so if there is a better/simpler/cleaner way to accomplish this, please suggest one.

Diff Detail

Event Timeline

nemanjai created this revision.Mar 16 2020, 11:08 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 16 2020, 11:08 AM
Herald added a subscriber: Restricted Project. · View Herald Transcript

Shouldn't -DLLVM_PARALLEL_LINK_JOBS=1 be used to avoid the described condition?

The -Wl,--no-threads logic seems test machine specific. Shouldn't a site-wise configuration file added somewhere on the test machine to adapt the behavior of lit substitutions? I don't know how to do that, though. lit.site.cfg.py files are generated in the build directory. They are not used for customization.

"overwhelmed" by what, exactly? I mean, I guess there's fundamentally some overhead if we end up with N^2 active threads (where N is the number of cores), but threads don't have that much memory overhead on their own; it shouldn't be high in absolute terms unless your test machine somehow has hundreds of cores. And I'm not sure what else could cause an issue; I guess if the kernel scheduler decided to do something weird, context switches might hurt performance?

Also, what OS are you using?

grimar added a subscriber: grimar.Mar 17 2020, 1:59 AM

"overwhelmed" by what, exactly? I mean, I guess there's fundamentally some overhead if we end up with N^2 active threads (where N is the number of cores), but threads don't have that much memory overhead on their own; it shouldn't be high in absolute terms unless your test machine somehow has hundreds of cores. And I'm not sure what else could cause an issue; I guess if the kernel scheduler decided to do something weird, context switches might hurt performance?

Also, what OS are you using?

$ cat /etc/lsb-release 
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.6 LTS"

$ lscpu
Architecture:          ppc64le
Byte Order:            Little Endian
CPU(s):                192

The errors reported by the system is that it is unable to spawn any more threads because the resource is unavailable. This hardly seems unique to my machine. I imagine most machines with hyperthreading and large core counts will be in a similar situation.

BTW. I am not even convinced that the number of threads is n^2 in the number of CPU's. I didn't really dig into the implementation of the parallel support library, but the following comment in include/llvm/Support/Parallel.h certainly seems a bit concerning:

// TaskGroup has a relatively high overhead, so we want to reduce
// the number of spawn() calls. We'll create up to 1024 tasks here.
// (Note that 1024 is an arbitrary number. This code probably needs
// improving to take the number of available cores into account.)

Shouldn't -DLLVM_PARALLEL_LINK_JOBS=1 be used to avoid the described condition?

Nope. The issue is not that the build fails because of excessive parallelism when doing a wide build. The issue is when running ninja check-all (i.e. the lit tests). The default for lit is to run as many test cases at a time as there are virtual cpu's on the system. This is good because it gets through the very lightweight lit tests very quickly. And if the comment from Parallel.h I mentioned above is the number of threads lld will use, each instance of lld will spawn up to 1024 threads.

The -Wl,--no-threads logic seems test machine specific. Shouldn't a site-wise configuration file added somewhere on the test machine to adapt the behavior of lit substitutions? I don't know how to do that, though. lit.site.cfg.py files are generated in the build directory. They are not used for customization.

I don't really understand this comment. In what way is this machine-specific? If each instance of lld will spawn n threads and lit will fork off as many processes as there are CPU's reported by the system - call this number m, then there will be up to n * m threads spawned at the same time. On my machine (which I don't imagine is unique), m == 192 and if I am interpreting the comment in Parallel.h correctly, n == 1024 which makes the upper limit on any similar machine 196608 threads.

CPU(s): 192

I didn't realize we were at the point where machines like this are commonly available. I guess in that case, we end up with 40,000 threads, which is probably pushing the limit of what the machine can reasonably support. I mean, I think it should be possible to fit that many threads onto a computer with a reasonable amount of memory, but it's at the order of magnitude where the amount of memory each thread is using starts to matter. I think we might need to come up with some strategy to dynamically scale the number of threads each process is using for very parallel builds.

I mean, yes, we can do something like this patch for now, but users won't be happy when their builds start blowing up the same way.

I didn't really dig into the implementation of the parallel support library, but the following comment in include/llvm/Support/Parallel.h certainly seems a bit concerning:

The number of "tasks" is higher than the number of threads; the number of threads is controlled by the thread pool, and shouldn't exceed the number of cores, I think. You should be able to check this using "ps".

MaskRay added a comment.EditedMar 17 2020, 11:08 AM

The -Wl,--no-threads logic seems test machine specific. Shouldn't a site-wise configuration file added somewhere on the test machine to adapt the behavior of lit substitutions? I don't know how to do that, though. lit.site.cfg.py files are generated in the build directory. They are not used for customization.

I don't really understand this comment. In what way is this machine-specific? If each instance of lld will spawn n threads and lit will fork off as many processes as there are CPU's reported by the system - call this number m, then there will be up to n * m threads spawned at the same time. On my machine (which I don't imagine is unique), m == 192 and if I am interpreting the comment in Parallel.h correctly, n == 1024 which makes the upper limit on any similar machine 196608 threads.

This interpretation is not correct. 1024 is the block size in the parallel loop implementation. 1024 is not the number of threads.

The number of threads on Linux is llvm::hardware_concurrency. It is roughly:

if (sched_getaffinity(0, sizeof(Set), &Set))
    return CPU_COUNT(&Set);

For lld, I think --threads={0,1,2,...} is better than the current --(no-)threads. I can take a stab to implement it.

While I agree that the more general issue of scaling LLVM's (and particularly LLD's) parallelism is something we should consider tackling, I still think there is value in proceeding with something along the lines of this patch in the interim.
For users that have very parallel builds, there are mitigations available:

  • Presumably, most builds do not link such a large number of binaries that this is a serious problem
  • Even for builds that do link a large number of binaries, the cmake/ninja build system does provide the ability to scale down the number of parallel link jobs as someone mentioned in the comments above

However, the only recourse we currently have when it comes to running lit tests when LLD is the default linker is to reduce the number of processes available to lit using something like -DLLVM_LIT_ARGS=--threads=N for some N that won't cause such failures. This will have the undesired effect of reducing parallelism for tests that don't even use the linker (i.e. the vast majority of them). I suppose another workaround would be to test the sanitizers separately and pass in lit arguments accordingly, but that is even more inconvenient.

For lld, I think --threads={0,1,2,...} is better than the current --(no-)threads.

BTW, can't testing with --no-threads hide possible LLDs concurency bugs?
We probably do not want to test LLD with --(no-)threads in most cases because it is not a real world use case of LLD.

For lld, I think --threads={0,1,2,...} is better than the current --(no-)threads.

BTW, can't testing with --no-threads hide possible LLDs concurency bugs?
We probably do not want to test LLD with --(no-)threads in most cases because it is not a real world use case of LLD.

I agree that this reduces the overall use of LLD with threading enabled while testing. However, I wouldn't characterize this as reducing test coverage for LLD because:

  • This does not affect the actual LLD tests
  • I would imagine that having a sanitizer test fail intermittently due to a bug caused by LLD's concurrency would actually be unlikely to lead the developer into an investigation of LLD, but rather to a waste of time investigating the sanitizer

I assume there aren't a large number of developers that build with -DCLANG_DEFAULT_LINKER=lld. For those that do, I think there are plenty of opportunities to test LLD through using the built compiler so that the loss of testing within the highly parallel ninja check-all testing of sanitizers is not that significant.

Seems that people have lost interest in this review, but I still have the problem on my LLD builds. Any further comments from anyone?

Adding a flag like that requires a comment explaining the need each place you're adding it.

Is there some way to refactor this so the code isn't scattered across five Python files?

Adding a flag like that requires a comment explaining the need each place you're adding it.

Is there some way to refactor this so the code isn't scattered across five Python files?

I certainly don't mind adding the comments, that's a good point. I personally wasn't able to find a way to make the change in just one place - and I spent a bunch of time trying before I posted this. :(

nemanjai updated this revision to Diff 252829.Mar 26 2020, 6:59 AM

Added comments explaining why threads are disabled in LLD when it is the default linker.

aganea added a subscriber: aganea.Mar 26 2020, 7:21 AM
This revision is now accepted and ready to land.Mar 26 2020, 12:05 PM

In D76240#1927115, @MaskRay wrote:
For lld, I think --threads={0,1,2,...} is better than the current --(no-)threads.

I have a change to rename lld --no-threads to --threads=N. It takes more efforts to actually work because I need to patch llvm::parallel::for_each_n called by lld/include/lld/Common/Threads.h#L75

Can the lit test run a system lld which may not support --threads= after --no-threads is removed?

In D76240#1927115, @MaskRay wrote:
For lld, I think --threads={0,1,2,...} is better than the current --(no-)threads.

I have a change to rename lld --no-threads to --threads=N. It takes more efforts to actually work because I need to patch llvm::parallel::for_each_n called by lld/include/lld/Common/Threads.h#L75

Can the lit test run a system lld which may not support --threads= after --no-threads is removed?

Yes, they certainly can:

set(CLANG_DEFAULT_LINKER "" CACHE STRING
  "Default linker to use (linker name or absolute path, empty for platform default)")

However, this patch wouldn't make them pass --no-threads for such cases since the string in CLANG_DEFAULT_LINKER won't be exactly lld (but rather a path to some custom version of lld).

Also, thank you for working on the --threads=N option. I think that can be a great usability improvement for LLD. Once this lands, I think it would be useful to add a couple of CMake variables:
LIT_DEFAULT_LINKER_THREADS=N - pass the option -Wl,--threads=N to the linker flags in sanitizer tests (i.e. replace what we check for in this patch)
CLANG_DEFAULT_LINKER_THREADS=N - pass the above option whenever clang is invoking the linker (i.e. a global value for the Clang we are building)

MaskRay added a comment.EditedMar 30 2020, 3:22 PM

Heads-up: I will likely commit D76885 tomorrow if there is no more comments. How shall me handle the situation that a system lld may use --threads/--no-threads while an lld built from master uses --threads={all,1,2,3,...}?

BTW, I guess -Wl,--no-threads may not work on Windows. A system lld-link uses /threads:no.