Summary: Keep history of macro definitions and #undefs with corresponding source locations, so that we can later find out all macros active in a specified source location. We don't save the history in PCH (no need currently). Memory overhead is about sizeof(void*)*3*<number of macro definitions and #undefs>+<in-memory size of all #undef'd macros>
I've run a test on a file composed of 109 .h files from boost 1.49 on x86-64 linux.
Stats before this patch:
- Preprocessor Stats:
73222 directives found:
19171 #define. 4345 #undef. #include/#include_next/#import: 5233 source files entered. 27 max include stack depth 19210 #if/#ifndef/#ifdef. 2384 #else/#elif. 6891 #endif. 408 #pragma.
14466 #if/#ifndef#ifdef regions skipped
80023/451669/1270 obj/fn/builtin macros expanded, 85724 on the fast path.
127145 token paste (##) operations performed, 11008 on the fast path.
Preprocessor Memory: 5874615B total
BumpPtr: 4399104 Macro Expanded Tokens: 417768 Predefines Buffer: 8135 Macros: 1048576 #pragma push_macro Info: 0 Poison Reasons: 1024 Comment Handlers: 8
Stats with this patch:
...
Preprocessor Memory: 7541687B total
BumpPtr: 6066176 Macro Expanded Tokens: 417768 Predefines Buffer: 8135 Macros: 1048576 #pragma push_macro Info: 0 Poison Reasons: 1024 Comment Handlers: 8
In my test increase in memory usage is about 1.7Mb, which is ~28% of initial preprocessor's memory usage and about 0.8% of clang's total VMM allocation.
As for CPU overhead, it should only be noticeable when iterating over all macros, and should mostly consist of couple extra dereferences and one comparison per macro + skipping of #undef'd macros. It's less trivial to measure, though, as the preprocessor consumes a very small fraction of compilation time.
Why do we need the definition location in the MacroInfoList? For #defines, the location is already in the MacroInfo. For #undefs, it would seem to be more valuable to simply model the #undef as a
within the MacroInfo itself. An active macro will have an invalid UndefLocation.