Page MenuHomePhabricator

Use sched_getaffinity instead of std::thread::hardware_concurrency when available
ClosedPublic

Authored by rafael on Oct 2 2017, 3:58 PM.

Details

Reviewers
ruiu
Summary

The issue with std::thread::hardware_concurrency is that it forwards to libc and some implementations (like glibc) don't take thread affinity into consideration.

With this change a llvm program that can execute in only 2 cores will use 2 threads, even if the machine has 32 cores.

This makes benchmarking a lot easier, but should also help if someone doesn't want to use all cores for compilation for example.

Diff Detail

Event Timeline

rafael created this revision.Oct 2 2017, 3:58 PM
smeenai added a subscriber: smeenai.Oct 2 2017, 4:05 PM
smeenai added inline comments.
include/llvm/Support/Threading.h
135

Typo: nreturns

ruiu edited edge metadata.Oct 2 2017, 4:11 PM

Thank you for doing this.

Off-topic: there seems to be many things we can do utilize multiple cores more efficiently. One thing I want to fix is the large variance of lld's execution time. Currently, lld sometimes takes unreasonably long time to finish even though that rarely happens. I'd guess the kernel does a bad job at scheduling threads. I wonder if we could do something -- e.g. pinning each thread to an available core.

include/llvm/Support/Threading.h
140

It sounds like it returns a list of std::thread objects or something. How about hardware_concurrency? We have heavyweight_hardware_concurrency function just above this declaration, and compared to that name, I don't think that's a bad name.

rafael updated this revision to Diff 117456.Oct 2 2017, 5:40 PM

Renamed. Fixed typo.

ruiu added inline comments.Oct 2 2017, 5:45 PM
include/llvm/Support/ThreadPool.h
42

hardware_concurrency()

lib/Support/Threading.cpp
82

AFAIK std::thread::hardware_concurrency() is allowed to return 0 if it doesn't know anything about hardware concurrency. I don't know if there's actually such system that the function could return 0, but it is better to not rely on an undefined behavior. I'd explicitly check if std::thread::hardware_concurrency returns 0 and if that's the case, return 1.

rafael updated this revision to Diff 117458.Oct 2 2017, 6:03 PM

Rename the missing places. Guard against std::thread::hardware_concurrency returning 0.

ruiu accepted this revision.Oct 2 2017, 6:07 PM

LGTM

lib/Fuzzer/FuzzerUtil.cpp
199–204

Since you are using your own hardware_concurrency, you don't need to guard against 0, no?

tools/llvm-profdata/llvm-profdata.cpp
214–215

You can remove std::max because std::min(hardware_concurrency(), unsigned(Inputs.size() / 2)) can now never return 0.

This revision is now accepted and ready to land.Oct 2 2017, 6:07 PM
timshen added a subscriber: timshen.Oct 3 2017, 6:13 PM

Hi,

I reverted the change to FuzzerUtil.cpp, as it doesn't seem to depend on LLVM.