This is an archive of the discontinued LLVM Phabricator instance.

[NFC] Optimize FoldingSet usage where it matters
AbandonedPublic

Authored by yurai007 on Jan 27 2022, 9:03 AM.

Details

Summary

While building huge code bases it's not uncommon to see perf reports with following FoldingSet items:

1.56%     0.47%  clang  clang-14  [.] llvm::FoldingSetBase::FindNodeOrInsertPos
0.30%     0.01%  clang  clang-14  [.] llvm::ContextualFoldingSet<clang::FunctionProtoType, clang::ASTContext&>::NodeEquals
0.25%     0.02%  clang  clang-14  [.] llvm::FoldingSetBase::InsertNode
0.23%     0.12%  clang  clang-14  [.] llvm::FoldingSetBase::GrowBucketCount
0.22%     0.21%  clang  clang-14  [.] llvm::FoldingSetNodeID::AddPointer
0.47%     0.06%  clang  clang-14  [.] llvm::FoldingSetBase::InsertNode

or

1.12%     0.75%  clang++       libLLVM-13.so        [.] llvm::FoldingSetBase::GrowBucketCount
0.49%     0.48%  clang++       libLLVM-13.so        [.] llvm::FoldingSetNodeID::AddPointer
0.41%     0.09%  clang++       libLLVM-13.so        [.] llvm::FoldingSetNodeID::operator==

etc.

Among many FoldingSet users most notable seem to be ASTContext and CodeGenTypes.
The reasons that we spend not-so-tiny amount of time in FoldingSet calls from there, are following:

  1. Default FoldingSet capacity for 2^6 items very often is not enough. For PointerTypes/ElaboratedTypes/ParenTypes it's not unlikely to observe growing it to 256 or 512 items. FunctionProtoTypes can easily exceed 1k items capacity growing up to 4k or even 8k size.
  1. FoldingSetBase::GrowBucketCount cost itself is not very bad (pure reallocations are rather cheap thanks to BumpPtrAllocator) What matters is high collision rate when lot of items end up in same bucket slowing down FoldingSetBase::FindNodeOrInsertPos and trashing CPU cache (as items with same hash are organized in intrusive linked list which need to be traversed).
  1. Lack of AddInteger/AddPointer and computeHash inlining slows down NodeEquals/Profile/:operator== calls. Inlining makes FunctionProtoTypes/PointerTypes/ElaboratedTypes/ParenTypes Profile functions faster but since NodeEquals is still called indirectly through function pointer from FindNodeOrInsertPos there is room for further inlining improvements.

After addressing above issues I built Linux (with default config) on isolated CPU cores in silent x86-64 Linux environment.
Compile time statistics diff produced by perf before and after change are following:
instructions -0.4%, cycles -0.9%
size-text change of output Clang binary is below +0.1%.

Similarly like in: https://reviews.llvm.org/D118169 for code bases containing smaller translation units
it's expected to get less significant speedup with this patch.

Diff Detail

Event Timeline

yurai007 created this revision.Jan 27 2022, 9:03 AM
yurai007 requested review of this revision.Jan 27 2022, 9:03 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptJan 27 2022, 9:03 AM
yurai007 edited the summary of this revision. (Show Details)Jan 27 2022, 9:04 AM
yurai007 edited the summary of this revision. (Show Details)
yurai007 edited the summary of this revision. (Show Details)Jan 27 2022, 9:07 AM
clang/include/clang/AST/ASTContext.h
214

It's probably good to give that value a meaningful ( and constexpr) variable name as it's used at several point.

clang/lib/AST/ASTContext.cpp
976–977

same here, that's a lot of magic values :-)

clang/lib/CodeGen/CodeGenTypes.h
79

And here

llvm/include/llvm/ADT/FoldingSet.h
328–354

Concerning that inlined part, I expect LTO to close the gap instead of moving everything to headers. Do we have a policy on that topic?

yurai007 updated this revision to Diff 403951.Jan 28 2022, 3:25 AM
yurai007 marked 3 inline comments as done.
yurai007 added inline comments.
llvm/include/llvm/ADT/FoldingSet.h
328–354

I'm not aware of any LLVM Coding Guidlines policy, probably most related is just general rule: https://llvm.org/docs/CodingStandards.html#include-as-little-as-possible I agree LTO is great when enabled. I just tried to move only small part which matters.
Could you please elaborate what are you concerning about?

The only potential risks which come to my mind right now are:

  • increased binary size, however I noticed Clang binary grows only below +0.1% which is acceptable I think.
  • moving to header part of implementation which is often changed, however AddPointer/AddInteger/ComputeHash were touched last time in 2012.
  • compile time impact on Clang build time. I confess I didn't compare Clang build times before and after change, but if you like I can.
  • reduced I-cache hit rate. This is something also I didn't check under perf but I can if you like (not sure how important it is given that we get drops of other metrics).
xbolva00 added inline comments.Jan 28 2022, 3:34 AM
llvm/include/llvm/ADT/FoldingSet.h
328–354

Is LLVM / Clang in distro releases even built with LTO / LTO + PGO?

llvm/include/llvm/ADT/FoldingSet.h
328–354

In fedora, LLVM is compiled with LTO. But I agree that's not an assumption we should make, and thus moving some functions to the header looks ok.

yurai007 added inline comments.Jan 29 2022, 2:12 AM
llvm/include/llvm/ADT/FoldingSet.h
328–354

Yep, recently there is growing interest in using LTO among distros. I heard about at least Fedora, Ubuntu, Red Hat, Debian and Arch interested in enabling it by default. Except Fedora not sure how it looks like with Clang package, though.

nikic added a comment.Jan 29 2022, 3:16 AM

It might make sense to split this into individual changes, so it's clearer what impact each of them has.

I tested just moving the AddXYZ methods into the header, which had a large positive impact: https://llvm-compile-time-tracker.com/compare.php?from=784e01abca65722df8969b56d2d240cf9ced9c85&to=179ee195b8ce9f483827f843fc063388aed7f0d1&stat=instructions

Moving hashing into the header has smaller impact: https://llvm-compile-time-tracker.com/compare.php?from=179ee195b8ce9f483827f843fc063388aed7f0d1&to=5735a8981d5cf00281490989d02d7771b95cda51&stat=instructions

It might make sense to split this into individual changes, so it's clearer what impact each of them has.

I tested just moving the AddXYZ methods into the header, which had a large positive impact: https://llvm-compile-time-tracker.com/compare.php?from=784e01abca65722df8969b56d2d240cf9ced9c85&to=179ee195b8ce9f483827f843fc063388aed7f0d1&stat=instructions

Moving hashing into the header has smaller impact: https://llvm-compile-time-tracker.com/compare.php?from=179ee195b8ce9f483827f843fc063388aed7f0d1&to=5735a8981d5cf00281490989d02d7771b95cda51&stat=instructions

Thanks for extra measurements. Yes, splitting it make sense. I will do it.

yurai007 abandoned this revision.Feb 3 2022, 8:02 AM