This is an archive of the discontinued LLVM Phabricator instance.

[NFC] Increase initial size of FoldingSets used in ASTContext and CodeGenTypes
ClosedPublic

Authored by yurai007 on Jan 31 2022, 7:39 AM.

Details

Summary

Among many FoldingSet users most notable seem to be ASTContext and CodeGenTypes.
The reasons that we spend not-so-tiny amount of time in FoldingSet calls from there, are following:

  1. Default FoldingSet capacity for 2^6 items very often is not enough. For PointerTypes/ElaboratedTypes/ParenTypes it's not unlikely to observe growing it to 256 or 512 items. FunctionProtoTypes can easily exceed 1k items capacity growing up to 4k or even 8k size.
  1. FoldingSetBase::GrowBucketCount cost itself is not very bad (pure reallocations are rather cheap thanks to BumpPtrAllocator). What matters is high collision rate when lot of items end up in same bucket slowing down FoldingSetBase::FindNodeOrInsertPos and trashing CPU cache (as items with same hash are organized in intrusive linked list which need to be traversed).

This change address both issues by increasing initial size of FoldingSets used in ASTContext and CodeGenTypes.

Extracted from: https://reviews.llvm.org/D118385

Diff Detail

Event Timeline

yurai007 requested review of this revision.Jan 31 2022, 7:39 AM
yurai007 created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptJan 31 2022, 7:39 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
yurai007 edited the summary of this revision. (Show Details)Jan 31 2022, 7:40 AM

Thanks, that's helpful. Any hint about why this particular values?

yurai007 added a comment.EditedFeb 3 2022, 3:31 AM

Thanks, that's helpful. Any hint about why this particular values?

Sure, let me elaborate on this. Initial capacities were empirically chosen to reduce number of internal hashtable growths (rehashings) in FoldingSet.
FoldingSet has initial size set to 6 by default which corresponds to 2^6 buckets in internal hashtable. In most cases it's ok and doesn't really matter.
However while building huge codebase (like Linux) I find PointerTypes/ElaboratedTypes/ParenTypes being hot and growing very often to 256 (2^8) or 512 buckets (2^9).
That would explains choosing GeneralTypesLog2InitSize and ConstantArrayTypesLog2InitSize as more appropriate initial values.
For FunctionProtoTypes initial 2^6 capacity is way too conservative as its ContextualFoldingSet easly grows to 2^10/2^11 buckets.
It's not only about "empirical proof". For example from frontend perspective FunctionProtoTypes keeps all found function prototypes in compiling module.
Since it's reasonable to assume that average non-trivial module has rather hundreds or thousands functions than 64 using 2^12 as initial capacity should be enough.
Also it's worth to mention that with such approach we don't pay much in terms of memory footprint because FoldingSet internally in hashtable stores just pointers.

ChuanqiXu accepted this revision.Feb 7 2022, 6:09 PM

LGTM.

This revision is now accepted and ready to land.Feb 7 2022, 6:09 PM
This revision was landed with ongoing or failed builds.Feb 8 2022, 8:54 AM
This revision was automatically updated to reflect the committed changes.