A simple optimization to avoid unnecessary copies of BoundNodesTreeBuilders when querying the cache for memoized matchers.
This hasn't been benchmarked as I'm unsure how the memoized matchers were first benchmarked so it shouldn't be merged unless there is a performance improvement.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
clang/lib/ASTMatchers/ASTMatchFinder.cpp | ||
---|---|---|
84 | note that making fields const tends to disable various move optimisations, so better to mark the whole struct const in code that uses it |
Given the extra complexity I'd like to see that it matters - bound nodes tend to be small.
I put that in the description, but this is where i need help. whats the best way to benchmark the matchers?
Also, do you know how it was benchmarked when MaxMemoizationEntries was decided upon?
There was also comments about some making some micro benchmarks but I don't think that was acted upon.
I do remember benchmarking it when I wrote it, but that was 10 years ago :)
I mainly did benchmarks on large ASTs, trying various matchers.
Today we can benchmark for example whether running clang-tidy on all of LLVM makes a difference.
Micro-BMs would be nice to add, but are somewhat hard to get right.
Running most of the clang tidy checks on the clang-tidy folder yields these results
=================================BeforePatch=================================== RUN1: 4045.17user 83.93system 11:28.80elapsed 599%CPU (0avgtext+0avgdata 534024maxresident)k 0inputs+0outputs (0major+27584683minor)pagefaults 0swaps RUN2: 4078.06user 84.99system 11:35.99elapsed 598%CPU (0avgtext+0avgdata 506912maxresident)k 55312inputs+0outputs (663major+27661947minor)pagefaults 0swaps RUN3: 4040.77user 86.02system 11:28.85elapsed 599%CPU (0avgtext+0avgdata 547096maxresident)k 0inputs+0outputs (0major+27698937minor)pagefaults 0swaps
==================================AfterPatch=================================== RUN1: 4025.33user 83.32system 11:27.00elapsed 598%CPU (0avgtext+0avgdata 530568maxresident)k 0inputs+0outputs (0major+27689512minor)pagefaults 0swaps RUN2: 4056.93user 83.36system 11:32.31elapsed 598%CPU (0avgtext+0avgdata 529120maxresident)k 3752inputs+0outputs (19major+27794845minor)pagefaults 0swaps RUN3: 4029.05user 85.45system 11:26.31elapsed 599%CPU (0avgtext+0avgdata 533508maxresident)k 0inputs+0outputs (0major+27730918minor)pagefaults 0swaps
Shows a consistent improvement but there tests are very noisy and dont focus on just the matching, they also include all the other boilderplate when running clang-tidy over a database. not to mention a small sample size
I decided to do some more stringent benchmarking, just focusing on the matching, not the boilerplate of running clang-tidy.
=================================BeforePatch=================================== Matcher Timings: 116.0756 user 29.1397 system 145.2154 process 145.2168 wall Matcher Timings: 117.7205 user 29.2475 system 146.9681 process 146.9158 wall Matcher Timings: 116.8313 user 29.5170 system 146.3483 process 146.2655 wall Matcher Timings: 117.9491 user 29.0969 system 147.0459 process 146.9678 wall Matcher Timings: 117.6309 user 29.1864 system 146.8173 process 146.7687 wall user: 117.2+-0.753 system: 29.24+-0.166 process: 146.5+-0.760
==================================AfterPatch=================================== Matcher Timings: 110.5497 user 28.3316 system 138.8813 process 138.7960 wall Matcher Timings: 112.5151 user 28.8616 system 141.3767 process 141.3003 wall Matcher Timings: 116.1578 user 28.9472 system 145.1049 process 145.0785 wall Matcher Timings: 107.1089 user 27.2752 system 134.3841 process 134.3459 wall Matcher Timings: 105.9242 user 27.0338 system 132.9580 process 132.9010 wall user: 110.4+-4.159 system: 28.09+-0.890 process: 138.6+-4.979
This is showing ~5% improvement when running every clang-tidy check on a translation unit (Specifically clang-tools-extra/clang-tidy/modernize/LoopConvertCheck.cpp)
+Sam for additional sanity checking.
clang/lib/ASTMatchers/ASTMatchFinder.cpp | ||
---|---|---|
904 | Ok, if this actually matters, we should also not use a std::map here, but a llvm::DenseMap (or do we rely on iteration invalidation semantics here?). |
clang/lib/ASTMatchers/ASTMatchFinder.cpp | ||
---|---|---|
904 | Belated +1. 5% seems like this performance *does* matter, so DenseMap::find_as should be yet faster. |
note that making fields const tends to disable various move optimisations, so better to mark the whole struct const in code that uses it