No need to roll our own implementation for something that's covered by llvm.
Details
- Reviewers
labath kwk - Group Reviewers
Restricted Project - Commits
- rGe57361c055d7: [lldb/Host] Remove TaskPool and replace its uses with llvm::ThreadPool
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
While getting rid of this would be great, this (the manual dwarf index) is also very performance sensitive code which has been optimized and re-optimized a number of times. So, I would want to see a benchmark to ensure this does not affect performance adversely. A good benchmark would be to take a release (without asserts) build of lldb and debug build of clang (without accelerator tables) and then run something like time opt/lldb dbg/clang -o "br set -n nonexisting_function" -b a couple of times. The test should be run on an elf binary because the mac debug info model is sufficiently different to skew the results, even if you do manage to disable the accelerator tables.
lldb/source/Plugins/SymbolFile/DWARF/ManualDWARFIndex.cpp | ||
---|---|---|
24–30 | This *is* simpler than the previous implementation, but it may have different performance characteristics. The previous implementation was optimized for small workloads where the overhead of enqueuing a task was not negligible compared to the size of the job itself. | |
86–112 | All of this should use the same ThreadPool instance to avoid re-creating the pool threads a couple of times. |
All tests pass. I first thought that the lldb/test/Shell/Reproducer/TestGDBRemoteRepro.test test didn't work but it seems to be flaky.
lldb/source/Plugins/SymbolFile/DWARF/ManualDWARFIndex.cpp | ||
---|---|---|
86–88 | @JDevlieghere I assume you've removed the first of the three parameters to TaskMapOverInt because it was always 0 anyways? If not, shouldn't this be easily changeable? static void TaskMapOverInt(size_t idx, size_t end, const llvm::function_ref<void(size_t)> &func) { llvm::ThreadPool pool; for (; idx < end; idx++) pool.async(func, idx); pool.wait(); } |
lldb/source/Plugins/SymbolFile/DWARF/ManualDWARFIndex.cpp | ||
---|---|---|
24–30 | (I'm not saying this should be changed back. llvm's thread pool creates threads differently, and that is likely to have been the main cause of overhead, so this optimization may not matter anymore. However, it would be good to verify that.) |
Below are the results for the situation you suggested: release (no-asserts) lldb, debug clang (no debug names). Results were collected in a Ubuntu 18.04 docker image running on CentOS.
$ for i in {1..10}; do time ./build-ra/bin/lldb ./build-debug/bin/clang -o "br set -n nonexisting_function" -b > /dev/null ;done ----------------- before ----------------- real 0m11.613s real 0m11.527s real 0m11.547s real 0m11.677s real 0m11.614s real 0m11.627s real 0m11.882s real 0m11.468s real 0m11.682s real 0m11.562s ----------------- avg 0m11.620s ----------------- after ----------------- real 0m11.887s real 0m11.614s real 0m11.812s real 0m11.723s real 0m11.605s real 0m12.080s real 0m11.673s real 0m11.751s real 0m11.660s real 0m11.647s ----------------- avg 0m11.745s
Thanks. The observed difference appears to be (borderline) statistically significant, but the difference (~1%) is not very big, so I think that's worth reusing the llvm class (and any future performance improvements). I think the difference will go down even more if you avoid recreating the thread pool three times.
Cool. I'm not sure how useful is the TaskMapOverInt function in this new world (if I was writing this from scratch, I don't think I would create a function for that), but it doesn't hurt that much either.
This *is* simpler than the previous implementation, but it may have different performance characteristics. The previous implementation was optimized for small workloads where the overhead of enqueuing a task was not negligible compared to the size of the job itself.