This is an archive of the discontinued LLVM Phabricator instance.

[llvm-cov] Prevent llvm-cov from using too many threads
Closed · Public

Authored by aganea on Apr 17 2020, 5:33 PM.

Details

Summary

As requested here: https://reviews.llvm.org/D75153#1987272

Before this patch, each instance of llvm-cov created one thread per hardware core, which wasn't needed, probably because the number of inputs was small. This was likely causing a thread rlimit issue on large core-count systems.

After this patch, the previous behavior (from before rG8404aeb5) is restored, as sketched after the list below:

  • If --num-threads is not specified, we create one thread per input, up to the number of cores.
  • When specified, --num-threads can request any number of threads, with no upper limit.
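A minimal sketch of the resulting llvm-cov call site, assuming a ThreadPoolStrategy that carries a ThreadsRequested count plus the Limit flag discussed below (an illustration of the intent, not the committed diff):

  // If --num-threads is given, honor it verbatim; otherwise create one
  // thread per input and let Limit cap that at the number of cores.
  ThreadPoolStrategy S = hardware_concurrency(ViewOpts.NumThreads);
  if (ViewOpts.NumThreads == 0) {
    S = heavyweight_hardware_concurrency(SourceFiles.size());
    S.Limit = true;
  }
  ThreadPool Pool(S);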

Petr, could you please confirm that this patch fixes the issue?

Diff Detail

Event Timeline

aganea created this revision. Apr 17 2020, 5:33 PM
Herald added a project: Restricted Project. Apr 17 2020, 5:33 PM

I think we should respect ThreadsRequested if it is greater than 0.

For the particular llvm-cov issue, we can probably restore the previous behavior:

NumThreads =
    std::max(1U, std::min(llvm::heavyweight_hardware_concurrency(),
                          unsigned(SourceFiles.size())));
NumThreads = SourceFiles.size();
aganea updated this revision to Diff 258765. Apr 20 2020, 9:16 AM
aganea edited the summary of this revision.

I think we should respect ThreadsRequested if it is greater than 0.

Nothing is changed in that regard; the request is only capped when .Limit is set.

For the particular llvm-cov issue, we can probably restore the previous behavior:

I cannot restore the code exactly as it was before, for the reasons mentioned in rG8404aeb5.
I rewrote it in a way that is simpler to understand, and I also fixed the other places that suffered from the same issue.

A better fix could be to lazily create threads in the ThreadPool as jobs are pushed through async(); see the sketch below. The issue here is that we currently create too many threads by default, whereas before we were limited to one thread per input, given a small number of inputs.
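A sketch of that lazy-creation idea (purely illustrative; this is not LLVM's ThreadPool, and all names here are invented for the example): async() only spawns a new worker when no worker is idle and the pool is still under its cap, so a pool with few jobs never creates more threads than jobs.

  #include <condition_variable>
  #include <functional>
  #include <mutex>
  #include <queue>
  #include <thread>
  #include <vector>

  class LazyThreadPool {
  public:
    explicit LazyThreadPool(unsigned Max) : MaxThreads(Max) {}

    void async(std::function<void()> F) {
      std::unique_lock<std::mutex> Lock(Mu);
      Work.push(std::move(F));
      // Spawn a worker only if nobody is idle and we are under the cap.
      if (Idle == 0 && Workers.size() < MaxThreads)
        Workers.emplace_back([this] { run(); });
      else
        Cond.notify_one();
    }

    ~LazyThreadPool() {
      {
        std::lock_guard<std::mutex> Lock(Mu);
        Done = true;
      }
      Cond.notify_all();
      for (std::thread &T : Workers)
        T.join();
    }

  private:
    void run() {
      std::unique_lock<std::mutex> Lock(Mu);
      while (true) {
        while (Work.empty() && !Done) {
          ++Idle;
          Cond.wait(Lock);
          --Idle;
        }
        if (Work.empty())
          return; // Done and drained.
        std::function<void()> Task = std::move(Work.front());
        Work.pop();
        Lock.unlock();
        Task(); // Run the job outside the lock.
        Lock.lock();
      }
    }

    std::mutex Mu;
    std::condition_variable Cond;
    std::queue<std::function<void()>> Work;
    std::vector<std::thread> Workers;
    unsigned Idle = 0;
    unsigned MaxThreads;
    bool Done = false;
  };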

aganea updated this revision to Diff 258773. Apr 20 2020, 9:29 AM
aganea edited the summary of this revision.

Typos.

MaskRay added inline comments. Apr 20 2020, 10:11 AM
llvm/lib/Support/Threading.cpp
88

Note that sys::getHostNumPhysicalCores can return -1 in some cases (on some OS/arch). In such cases we will return 1 if ThreadsRequested == 0.

I still think that if the user asks for more threads than the CPU supports, we should respect that. For one thing, not every task can fully leverage 100% of the computing power of a core.

aganea marked 4 inline comments as done. Apr 20 2020, 1:23 PM
aganea added inline comments.
llvm/lib/Support/Threading.cpp
88

Sorry, I'm a bit slow. I completely agree with "if the user asks for more threads than the CPU supports, we should respect that", but I don't understand in which case this isn't satisfied as it stands.

ThreadsRequested == 0 means 'use the default', which is MaxThreadCount. Should we return std::thread::hardware_concurrency() if MaxThreadCount == 1?

If the user asks for anything > 0, we currently satisfy that request, except when we don't want to automatically spawn more than MaxThreadCount (hence the new Limit variable). Note that Limit = false by default.
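Putting those rules together, the selection could look roughly like this, assuming the strategy's UseHyperThreads flag and a computeHostNumHardwareThreads() helper (a sketch reconstructed from this discussion, not necessarily the committed code):

  unsigned ThreadPoolStrategy::compute_thread_count() const {
    int MaxThreadCount = UseHyperThreads ? computeHostNumHardwareThreads()
                                         : sys::getHostNumPhysicalCores();
    // sys::getHostNumPhysicalCores() can return -1 on some OS/arch combos.
    if (MaxThreadCount <= 0)
      MaxThreadCount = 1;
    if (ThreadsRequested == 0)
      return MaxThreadCount;   // 0 means 'use the default'.
    if (!Limit)
      return ThreadsRequested; // Honor an explicit request verbatim.
    // Limit: honor the request, but never exceed the hardware default.
    return std::min((unsigned)MaxThreadCount, ThreadsRequested);
  }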

llvm/tools/llvm-cov/CodeCoverage.cpp
953

I could also do S.ThreadsRequested = std::min(SourceFiles.size(), heavyweight_hardware_concurrency().compute_thread_count()); here instead of the .Limit logic, but then I find the code becomes more difficult to read.
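For comparison, the two options being weighed (illustrative sketches under the same assumptions as above):

  // Option A: clamp at the call site, no Limit flag needed.
  S.ThreadsRequested = std::min(
      unsigned(SourceFiles.size()),
      heavyweight_hardware_concurrency().compute_thread_count());

  // Option B: state the intent; the strategy clamps via Limit.
  S = heavyweight_hardware_concurrency(SourceFiles.size());
  S.Limit = true;

Option B keeps the clamping policy inside ThreadPoolStrategy rather than at each call site, which is the readability argument being made here.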

The dsymutil part looks fine, I don't expect there to be many cases where we have more architectures than cores :-)

aganea marked an inline comment as done. Apr 22 2020, 5:59 AM

Ping! This seemed to be a blocker for @phosek.

phosek accepted this revision. Apr 22 2020, 9:29 AM

This LGTM, but I haven't yet been able to verify it on our side. Getting a toolchain with a custom patch into our coverage infrastructure is something we don't support right now, and I still haven't managed to get that working (the issue only reproduces on our bots; I haven't been able to reproduce it locally on my workstation). I'd be fine landing this and following up if there are any issues once we roll it in.

This revision is now accepted and ready to land. Apr 22 2020, 9:29 AM

[...] and I still haven't managed to get that working (the issue only reproduces on our bots, I haven't managed to reproduce that issue locally on my workstation)

The issue is clearly related to the number of cores in the system, the number of LLVM executables running at once, and rlimits. Does your workstation have the same hardware config as the bots?

MaskRay accepted this revision. (Edited) Apr 22 2020, 4:27 PM

LGTM. I was waiting on people's responses as to whether the added bool Limit = false is necessary (it increases the complexity of the interface a bit). It seems that people are fine with it.

[...] Does your workstation have the same hardware config as the bots?

I have finally managed to do a full run on our bots and I can confirm that this addresses the issue.

Can we land this?

This revision was automatically updated to reflect the committed changes.