Page MenuHomePhabricator

[clang][lex] Include tracking: simplify and move to preprocessor
Needs ReviewPublic

Authored by jansvoboda11 on Nov 17 2021, 8:20 AM.

Details

Summary

This patch replaces the exact include count of each file in HeaderFileInfo with a set of included files in Preprocessor.

The number of includes isn't a property of a header file but rather a preprocessor state. The exact number of includes is not used anywhere except statistic tracking.

Diff Detail

Event Timeline

jansvoboda11 created this revision.Nov 17 2021, 8:20 AM
jansvoboda11 requested review of this revision.Nov 17 2021, 8:20 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 17 2021, 8:20 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

I'm not sure this commit stands on its own, since it introduces a (temporary) regression in .pcm size; I suggest squashing with the follow-up, https://reviews.llvm.org/D114096.

clang/include/clang/Lex/HeaderSearch.h
55

Note that clang-format is complaining about a trailing space / here.

jansvoboda11 retitled this revision from [clang][lex][modules] Move number of includes from HeaderFileInfo to Preprocessor to [clang][lex] Include tracking: simplify and move to preprocessor.Nov 18 2021, 9:10 AM
jansvoboda11 edited the summary of this revision. (Show Details)

Haven't checked the details of serialization/deserialization. High-level question about different serialization approaches (bitvector vs list of file ids) is in D112915.

clang/lib/Lex/Preprocessor.cpp
552

Why do you need to getFileInfo but don't use it? I have no objections but it looks like it deserves a comment because it's not obvious.

vsapsai added inline comments.Nov 18 2021, 4:01 PM
clang/include/clang/Serialization/ASTBitCodes.h
700

Small detail. Do you need to add the new record to ASTWriter::WriteBlockInfoBlock? It's not critical but you are one of those who might be grateful later seeing PP_INCLUDED_FILES instead of UnknownCode66 in .pcm dump.

Add new record into block info.

jansvoboda11 marked an inline comment as done.Nov 19 2021, 12:48 AM
jansvoboda11 added inline comments.
clang/lib/Lex/Preprocessor.cpp
552

Without the call, I'm hitting some assertions when running C++20 modules tests:

Assertion failed: (CurDiagID == std::numeric_limits<unsigned>::max() && "Multiple diagnostics in flight at once!"), function Report, file Diagnostic.h, line 1526.
fatal error: error in backend: -verify directives found after rather than during normal parsing of <llvm>/clang/test/CXX/modules-ts/basic/basic.def.odr/p6/global-vs-module.cpp

Might need to investigate more to be able to write up a reasonable comment here.

vsapsai added inline comments.Nov 19 2021, 11:12 AM
clang/lib/Serialization/ASTWriter.cpp
874

I believe PP_INCLUDED_FILES is located in AST_BLOCK. Yes, you write it in ASTWriter::WritePreprocessor but after Stream.ExitBlock() together with MACRO_OFFSET. And then you read it in ASTReader::ReadASTBlock at the top level and not inside case PREPROCESSOR_BLOCK_ID or from ModuleFile::MacroCursor.

jansvoboda11 added inline comments.Nov 22 2021, 12:09 AM
clang/lib/Serialization/ASTWriter.cpp
874

That's right. This is modeled after PP_CONDITIONAL_STACK and PP_COUNTER_VALUE.

The problem is that the whole PREPROCESSOR_BLOCK_ID is treated as "macros only" block that's not being split into individual records right away and is instead deserialized lazily in ReadMacroRecord, ReadDefinedMacros, resolvePendingMacro. I think this should be fixed eventually, but I didn't want to expand the scope of my changes, since it's already somewhat complex already.

Move record ID from PREPROCESSOR_BLOCK to AST_BLOCK.

This LGTM by the way (not sure if that was already clear, since you abandoned a different review than I anticipated when squashing, and I'd "accepted" the other one); also see my suggestion to move the call to getFileInfo inside of markIncluded (might be error prone not to make that change).

Haven't clicked "accept" here in case @vsapsai has more comments.

clang/lib/Lex/Preprocessor.cpp
552

Not sure precisely why that assertion fires, but the call to getFileInfo makes sense since it's mutating -- it adds a HeaderFileInfo entry (and IncrementIncludeCount() used to call it). There are code paths that call getExistingFileInfo that will expect it to have been created on inclusion.

I suggest moving the getFileInfo() call to inside markIncluded() and adding a comment that says "Create the HeaderFileInfo if it doesn't already exist" or something. Parting of marking it included should be to ensure that callers of HeaderSearch::getExistingFileInfo get something back.

(This'd be more obvious (and the comment unnecessary) if getFileInfo were renamed to getOrCreateFileInfo, after which getExistingFileInfo could be renamed to getFileInfo.)

I've mentioned it in D112915 as we've discussed the stored data format there. But my concern was that bitvector packing might be not the most space-efficient encoding. I haven't done proper testing, just off-the-cuff comparison and it looks like for the most of frameworks in iOS SDK storing included headers per submodule takes less space than encoding them as a bitvector. I have an idea why that might be happening but I haven't checked it in debugger, so'll keep it to myself to avoid derailing the discussion.

I've mentioned it in D112915 as we've discussed the stored data format there. But my concern was that bitvector packing might be not the most space-efficient encoding. I haven't done proper testing, just off-the-cuff comparison and it looks like for the most of frameworks in iOS SDK storing included headers per submodule takes less space than encoding them as a bitvector. I have an idea why that might be happening but I haven't checked it in debugger, so'll keep it to myself to avoid derailing the discussion.

Let's bring the conversation over here. I ran the same UIKit test you did and compared the following:

  • current trunk
  • current trunk with this patch
  • current trunk with this patch, with bitvector replaced by vector of IDs (32-bit integers).

The following table shows sizes of .pcm files in bytes and their delta compared to trunk:

+----------+-----------------+-----------------+
|   trunk  |    bit vector   |    ID vector    |
+----------+-----------------+-----------------+
|   281932 |   281944    +12 |   281988    +56 |
|   989840 |   989784    -56 |   989968   +128 |
|   837116 |   837084    -32 |   837212    +96 |
|   899924 |   899912    -12 |   900004    +80 |
|   710296 |   710296     +0 |   710376    +80 |
|   273140 |   273144     +4 |   273196    +56 |
|  3649856 |  3649024   -832 |  3650804   +948 |
|   207676 |   207692    +16 |   207740    +64 |
|   342792 |   342804    +12 |   342860    +68 |
|  4137660 |  4137460   -200 |  4137940   +280 |
|   173536 |   173564    +28 |   173580    +44 |
|   787120 |   787144    +24 |   787180    +60 |
|  1260652 |  1260596    -56 |  1260804   +152 |
|   255072 |   255092    +20 |   255128    +56 |
|   973204 |   973228    +24 |   973268    +64 |
|   398952 |   398940    -12 |   399036    +84 |
|   631516 |   631516     +0 |   631588    +72 |
|  5252932 |  5252348   -584 |  5253612   +680 |
|   230160 |   230168     +8 |   230228    +68 |
|    24460 |    24500    +40 |    24500    +40 |
|    53244 |    53280    +36 |    53288    +44 |
|    75932 |    75952    +20 |    75972    +40 |
|    32840 |    32876    +36 |    32884    +44 |
+----------+-----------------+-----------------+
| 22479852 | 22478348  -1504 | 22483156  +3304 |
+----------+-----------------+-----------------+

Used command:

echo '#import <UIKit/UIKit.h>' | ./bin/clang -fsyntax-only -isysroot "$(xcrun --sdk iphoneos --show-sdk-path)" -target arm64-apple-ios -fmodules -fmodules-cache-path=modules.noindex -x objective-c -

Patch that I applied on top of the one under review to get vector of IDs:

I see how the bitvector could explode for large fine-grained modules. They have lots of input files (-> large bitvectors in each submodule), but each submodule only includes a handful files (-> bitvectors are sparse). It seems like this doesn't actually happen, at least in our SDK.

@vsapsai Do you think this warrants more thorough investigation?

Thanks for measuring the .pcm impact! And I appreciate including the trunk baseline.

The results aren't exactly the same I've seen before, so please forgive me my suspiciousness. When you mention

current trunk with this patch

do you mean "just this D114095 patch"? Or were you testing with D112915 applied? Because I was testing with per-submodule tracking applied. I am going to check again but it would be useful to know if you were comparing with per-submodule tracking (D112915) or not.

You're right, I measured only this patch, not per-submodule include tracking (D112915).

With per-submodule tracking, the results look like this:

+----------+------------------+------------------+-------------------+------------------+
| original |     ID vector    |    bit vector    | subm. w incl. [%] | 1 in bitvec. [%] |
+----------+------------------+------------------+-------------------+------------------+
|    23348 |    23380     +32 |    23380     +32 |       100.0       |       33.3       |
|    52188 |    52224     +36 |    52224     +36 |       100.0       |        6.3       |
|    74808 |    74856     +48 |    74840     +32 |       100.0       |        9.4       |
|   171772 |   171836     +64 |   171840     +68 |        40.0       |        5.7       |
|   206524 |   206584     +60 |   206540     +16 |       100.0       |       32.5       |
|   227716 |   227904    +188 |   227856    +140 |         6.3       |       33.3       |
|   253656 |   253812    +156 |   253788    +132 |        90.0       |        7.6       |
|   271332 |   271584    +252 |   271524    +192 |        85.7       |        8.5       |
|   280280 |   280460    +180 |   280428    +148 |        91.7       |        5.6       |
|   340024 |   340176    +152 |   340144    +120 |        21.4       |       10.9       |
|   394692 |   394928    +236 |   394872    +180 |        25.0       |       18.1       |
|   629740 |   630028    +288 |   629940    +200 |        83.3       |       20.0       |
|   707456 |   707732    +276 |   707676    +220 |        85.7       |       13.9       |
|   785508 |   785632    +124 |   785616    +108 |        85.7       |       18.1       |
|   835204 |   835824    +620 |   835616    +412 |        93.8       |       12.5       |
|   887764 |   888004    +240 |   887940    +176 |         8.7       |       19.1       |
|   971352 |   971504    +152 |   971500    +148 |        66.7       |        6.0       |
|   994112 |   995048    +936 |   994672    +560 |        93.5       |        9.1       |
|  1248888 |  1249408    +520 |  1249352    +464 |        29.6       |        5.0       |
|  3642908 |  3650076   +7168 |  3652668   +9760 |        74.5       |        2.8       |
|  4112848 |  4114016   +1168 |  4113780    +932 |        17.7       |        5.3       |
|  5213344 |  5216228   +2884 |  5216552   +3208 |        22.0       |        2.2       |
+----------+------------------+------------------+-------------------+------------------+
| 22325464 | 22341244  +15780 | 22342748  +17284 |                   |                  |
+----------+------------------+------------------+-------------------+------------------+

The subm. w incl. [%] column shows the percentage of submodules that include any headers and for the ones that do, 1 in bitvec. [%] shows how sparse are the bitvectors on average (what percentage of 1 bits they contain).

It seems like smaller modules are generally better off with bitvectors, but for larger modules with greater cumulative number of includes, the bitvectors get long and sparse. And it's the larger modules whose size ends up impacting the overall size of module cache. I think that matches my intuition and roughly corresponds to your own measurements.

(Note that in any case, the module cache growth is negligible: .071% for ID vector and .077% for bitvector.)

Given that, I think we should commit this patch with ID vectors, even though in isolation (without D112915) it's the worse solution. WDYT?

Given that, I think we should commit this patch with ID vectors, even though in isolation (without D112915) it's the worse solution. WDYT?

I have to admit that I think ID vector code is easier to understand and the format is easier to understand. And if bitvector approach is more complicated but doesn't gain us much in efficiency, then I think it's not worth pursuing. Though I'm glad and grateful that you've tried bitvector approach, that's very helpful.

My preference is to use ID vectors but I'm open to other opinions as I might be missing other trade-offs besides complexity and .pcm file size.

And I want to thank you for checking the sparseness of submodules and how with the bigger modules the impact of the sparseness becomes more pronounced.