This is an archive of the discontinued LLVM Phabricator instance.

[Preprocessor] Reduce the memory overhead of `#define` directives

Authored by arphaman on Jan 14 2022, 11:14 AM.



Recently we observed high memory pressure caused by clang during some parallel builds. We discovered that we have several projects that have a large number of #define directives in their TUs (on the order of millions), which caused huge memory consumption in clang due to a lot of allocations for MacroInfo. We would like to reduce the memory overhead of clang for a single #define to reduce the memory overhead for these files, to allow us to reduce the memory pressure on the system during highly parallel builds. This change achieves that by removing the SmallVector in MacroInfo and instead storing the tokens in an array allocated using the bump pointer allocator, after all tokens are lexed.

The added unit test with 1000000 #define directives illustrates the problem. Prior to this change, on arm64 macOS, clang's PP bump pointer allocator allocated 272007616 bytes, and used roughly 272 bytes per #define. After this change, clang's PP bump pointer allocator allocates 120002016 bytes, and uses only roughly 120 bytes per #define.

For an example test file that we have internally with 7.8 million #define directives, this change produces the following improvement on arm64 macOS: Persistent allocation footprint for this test case file as it's being compiled to LLVM IR went down 22% from 5.28 GB to 4.07 GB and the total allocations went down 14% from 8.26 GB to 7.05 GB. Furthermore, this change reduced the total number of allocations made by the system for this clang invocation from 1454853 to 133663, an order of magnitude improvement.

Diff Detail

Event Timeline

arphaman created this revision.Jan 14 2022, 11:14 AM
arphaman requested review of this revision.Jan 14 2022, 11:14 AM
This revision is now accepted and ready to land.Jan 14 2022, 2:41 PM

Just some minor nits from me, but generally LG.


I think this should be a const_tokens_iterator instead (and it's fine that we don't expose a non-const interface for the iterator).


Should we assert that we've not already allocated tokens before?


Should we do this dance for 32-bit systems as well?


Please spell out the type.

dexonsmith added inline comments.

Do I remember correctly that SourceLocation's size recently became configurable? Or maybe it will be soon? Should that be factored in somehow?

aaron.ballman added inline comments.Jan 19 2022, 7:25 AM

Are you thinking about this review or something else?

dexonsmith added inline comments.Jan 19 2022, 1:35 PM

Yes, I think that's the one.

aaron.ballman added inline comments.Jan 20 2022, 8:16 AM

Yeah, it's probably not a bad idea to use sizeof(SourceLocation) instead of calculating the size manually for that bit.

Thanks, that feedback makes sense. I'll update the patch today.

arphaman updated this revision to Diff 405103.Feb 1 2022, 2:35 PM
arphaman marked 4 inline comments as done.

Update to address review feedback, remove appendToken which is not needed as we're can just setTokens instead (it's a new macro info)


Good idea. Done.

aaron.ballman accepted this revision.Feb 10 2022, 7:16 AM

Thanks, this LGTM as well! I don't think the precommit CI pipeline failures are related from what I can tell.

This revision was landed with ongoing or failed builds.Feb 11 2022, 3:01 PM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptFeb 11 2022, 3:01 PM
thakis added a subscriber: thakis.Feb 11 2022, 3:51 PM

Very cool! Looks like it broke lldb builds though:

Yep, I just noticed. Reverting for now and will fix LLDB before recommitting.



bdf573652138..3f05192c4c40  main -> main