This is an archive of the discontinued LLVM Phabricator instance.

Differential D112915

[clang][modules] Track included files per submodule
Needs Review · Public

Authored by jansvoboda11 on Nov 1 2021, 2:23 AM.

Details

Reviewers

Bigcheese
dexonsmith
vsapsai
rsmith

Summary

When building a module consisting of submodules, the preprocessor keeps a "global" state that for each header file tracks (amongst other things) the number of times it was included/imported. This information is serialized into the PCM file.

When importing anything from such a module (either the top-level module or any of its submodules), this number is merged into the state of the importing preprocessor.

This can incorrectly prevent imports of textual headers (see attached tests). This patch fixes the bug by tracking the number of times a header was included at a finer granularity, on a per-submodule basis. This information is lazily deserialized when first importing each (sub)module.

(This patch is an alternative approach to the same issue addressed in D104344.)

Depends on D114095.
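
The difference between merging the include state of a whole module and merging only the imported submodule can be sketched as follows. This is a minimal model, not the patch itself: `BuiltModule`, `ImporterState`, and the string keys are hypothetical stand-ins (Clang works with `Module *` and `const FileEntry *`), and `std::unordered_set` replaces the LLVM containers.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical stand-ins: a submodule is identified by its full name and a
// header by its path.
using SubmoduleName = std::string;
using HeaderPath = std::string;
using IncludedFilesSet = std::unordered_set<HeaderPath>;

// What a built module would record: one set of included headers per submodule.
struct BuiltModule {
  std::unordered_map<SubmoduleName, IncludedFilesSet> IncludesPerSubmodule;
};

// The importing preprocessor's view of which headers count as already included.
struct ImporterState {
  IncludedFilesSet Included;

  // Behaviour described in the summary as the bug: importing anything from
  // the module merges the include state of every submodule.
  void importWholeModuleState(const BuiltModule &M) {
    for (const auto &Entry : M.IncludesPerSubmodule)
      Included.insert(Entry.second.begin(), Entry.second.end());
  }

  // Behaviour the patch aims for: merge only the submodule actually imported.
  void importSubmodule(const BuiltModule &M, const SubmoduleName &Name) {
    auto It = M.IncludesPerSubmodule.find(Name);
    if (It != M.IncludesPerSubmodule.end())
      Included.insert(It->second.begin(), It->second.end());
  }

  bool wasIncluded(const HeaderPath &H) const { return Included.count(H) != 0; }
};
```

With the per-submodule variant, a textual header included only by a submodule that was never imported does not look "already included" to the importer.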

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

jansvoboda11 requested review of this revision. Nov 1 2021, 2:23 AM

jansvoboda11 created this revision.

Herald added a project: Restricted Project. Nov 1 2021, 2:23 AM

Herald added a subscriber: cfe-commits.

jansvoboda11 edited the summary of this revision. Nov 1 2021, 2:23 AM

jansvoboda11 added a project: Restricted Project.

I'm interested in hearing some feedback whether the direction I'm taking here makes sense.

There are a couple of TODOs (mostly on optimizing away unnecessary maps) and a few modules-ts tests are failing.

@rsmith left a suggestion on D104344 to track this information in Preprocessor::SubmoduleState, which has similar semantics to what I'm doing with Preprocessor::IncludeMap (dividing the state during submodule compilation). The issue is that it (Preprocessor::Submodules) is really only enabled when Clang gets invoked with -fmodules-local-submodule-visibility. To keep its current semantics and make it usable for our use-case (which needs to work even without the flag), I think we'd need to always track the submodule state and conditionally merge the outer and local macro states as we now do for IncludeMap in PPLexerChange.cpp. I'm not really sure this is correct/feasible.

Any opinions on this?

jansvoboda11 edited the summary of this revision. Nov 1 2021, 2:54 AM

jansvoboda11 edited the summary of this revision.

Harbormaster completed remote builds in B131704: Diff 383750. Nov 1 2021, 3:08 AM

I'm not going to cover the entire change, some parts I need to consider more carefully.

There can be other reasons to keep IncludeMap out of SubmoduleState but I'm not sure the local submodule visibility is the right reason. I might be reading the code incorrectly but looks like CurSubmoduleState is used when local submodule visibility is disabled. The difference is it's tracking the aggregate state instead of per-submodule state. Need to experiment more but so far tracking includes in SubmoduleState follows the spirit of local submodule visibility. Though it's not guaranteed it'll work perfectly from the technical perspective.

Also I think we'll need to increase granularity to track other HeaderFileInfo attributes, not just NumIncludes. Don't have a test case to illustrate that right now and no need to change that now but something to keep in mind.

clang/lib/Lex/PPLexerChange.cpp
706	How many includes are expected to be here? Are these only immediate includes or also transitive? Asking to evaluate how expensive iterating through the includes can get.

Avoid copying data between submodules

In D112915#3104873, @vsapsai wrote:

There can be other reasons to keep IncludeMap out of SubmoduleState but I'm not sure the local submodule visibility is the right reason. I might be reading the code incorrectly but looks like CurSubmoduleState is used when local submodule visibility is disabled. The difference is it's tracking the aggregate state instead of per-submodule state. Need to experiment more but so far tracking includes in SubmoduleState follows the spirit of local submodule visibility. Though it's not guaranteed it'll work perfectly from the technical perspective.

Yes, CurSubmoduleState is being used unconditionally. However, without local submodule visibility enabled, it always points to NullSubmoduleState. Only with the feature enabled does it point to the current submodule state (stored in Submodules). The change happens in Preprocessor::{Enter,Leave}Submodule.
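
The pointer switching described above can be modelled roughly like this. It is a simplified sketch under stated assumptions: `MiniPreprocessor` and the string-keyed `Submodules` map are hypothetical, `SubmoduleState` is reduced to a placeholder, and leaving a submodule simply returns to the null state instead of restoring an outer state as a real nested build would.

```cpp
#include <cassert>
#include <map>
#include <string>

// Placeholder for the per-submodule preprocessor state (macros, etc.).
struct SubmoduleState {
  int Placeholder = 0;
};

// Hypothetical miniature of the state switching in
// Preprocessor::{Enter,Leave}Submodule.
struct MiniPreprocessor {
  bool LocalSubmoduleVisibility;
  SubmoduleState NullSubmoduleState;
  std::map<std::string, SubmoduleState> Submodules;
  SubmoduleState *CurSubmoduleState = &NullSubmoduleState;

  explicit MiniPreprocessor(bool LSV) : LocalSubmoduleVisibility(LSV) {}

  void EnterSubmodule(const std::string &Name) {
    // Without local submodule visibility the aggregate (null) state stays
    // current, so nothing is tracked per submodule.
    if (!LocalSubmoduleVisibility)
      return;
    CurSubmoduleState = &Submodules[Name];
  }

  // Simplification: no nesting, so leaving always returns to the null state.
  void LeaveSubmodule() { CurSubmoduleState = &NullSubmoduleState; }
};
```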

Also I think we'll need to increase granularity to track other HeaderFileInfo attributes, not just NumIncludes. Don't have a test case to illustrate that right now and no need to change that now but something to keep in mind.

That's interesting. I think HeaderFileInfo::isImport should definitely be tracked in the preprocessor, not in HeaderFileInfo. The fact that the header was #imported is not an intrinsic property of the file itself, but rather a preprocessor state. Can you think of other fields that don't really belong to HeaderFileInfo?

Based on your feedback, I simplified the patch quite a bit. We're no longer copying the include state between submodules. In its current form, this patch essentially moves HeaderFileInfo::NumIncludes into Preprocessor::NumIncludes and still uses it as the source of truth.
However, we're also tracking NumIncludes separately in each submodule and serializing this into the PCM. Instead of merging NumIncludes of the whole module when loading it (which we did before), we can merge NumIncludes only of the modules we actually import.

clang/lib/Lex/Preprocessor.cpp
1330	Iterating over all FileEntries is probably not very efficient, as Volodymyr mentioned. Thinking about how to make this more efficient...

Harbormaster completed remote builds in B132250: Diff 384484. Nov 3 2021, 10:05 AM

Just a few comments on implementation details. The only high-level piece to call out is that I wonder if NumIncludes could/should be simplified (semantically) to a Boolean in a prep commit.

clang/include/clang/Lex/Preprocessor.h
722–725	I'm not sure about this choice. UIDs are unlikely to be adjacent and in SubmoduleIncludeState, since a given submodule is unlikely to have "most" files. Also, memory usage characteristics could be "weird" if the FileManager is being reused between different compilations. `DenseMap<unsigned, unsigned>` will have the same number of allocations (i.e., just 1). If UIDs really are dense and near each other in one of the submodules then that map will be a little bigger than necessary (~2x), but it should be better for the rest of them.
clang/lib/Lex/Preprocessor.cpp
1330	My suggestion above to drop FileEntryMap in favour of a simple DenseMap would help a bit, just iterating through the files actually included by the submodules. Further, I wonder if "num-includes"/file/submodule (`unsigned`) is actually useful, vs. "was-included"/file/submodule (`bool`). The only observer I see is `HeaderSearch::PrintStats()` but maybe I missed something? If I'm right and we can switch to `bool`, then NumIncludes becomes a `DenseSet<FileEntry *> IncludedFiles` (or `DenseSet<unsigned>` for UIDs). (BTW, I also wondered if you could rotate the map, using File as the outer key, and then use bitsets for the submodules, but I doubt this is better, since most files are only included by a few submodules, right?) Then you can just do a set union here. Also simplifies bitcode serialization. (If a `bool`/set is sufficient, then I'd suggest landing that first/separately, before adding the per-submodule granularity in this patch.)
clang/lib/Serialization/ASTWriter.cpp
2496–2497	A vector of maps would be an improvement, but that'll still be a lot of allocations. Since insertion/lookup/deletion aren't intermingled, the simplest way to avoid adding unnecessary overhead is a sorted vector (https://llvm.org/docs/ProgrammersManual.html#dss-sortedvectormap). With no lookups (at all), there's no benefit to a tiered data structure (vs flat). Leading me toward a simple flat vector + sort.

struct IncludeToSerialize { // Probably more straightforward than a std::tuple...
  StringRef Filename;
  unsigned SMID;
  unsigned NumIncludes;
  bool operator<(const IncludeToSerialize &RHS) const {
    if (SMID != RHS.SMID)
      return SMID < RHS.SMID;
    int Diff = Filename.compare(RHS.Filename);
    assert(Diff && "Expected unique SMID / Filename pairs");
    return Diff < 0;
  }
};
SmallVector<IncludeToSerialize> IncludesToSerialize;
// ...
IncludesToSerialize.push_back({Filename, LocalSMID, NumIncludes});
// ...
llvm::sort(IncludesToSerialize);
for (const IncludeToSerialize &SI : IncludesToSerialize) {
  // emit record
}

(Or if there are duplicates expected to be encountered and ignored, you can remove the assertion and use stable_sort + unique + erase.)
2522	I wonder, will the `Filename` already be serialized elsewhere? Could an ID from that be reused here, rather than writing the filename again? (Maybe that'd need a bigger refactor of some sort to create a filename table?) Stepping back, it looks like this is always eagerly loaded. Could it be lazily-loaded by submodule? Could it be lazily-loaded by filename? In the former case, seems like a single record per submodule makes sense, with a single blob that can be decoded on-demand. In the latter case, maybe it should be rotated, and stored a single record per filename as a blob that can be lazily decoded: <filename-size> <filename> <num-submodules> (<smid> <num-includes>)+
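
The per-filename record layout suggested above, `<filename-size> <filename> <num-submodules> (<smid> <num-includes>)+`, could be encoded and decoded along these lines. This is only a sketch: the fixed 32-bit little-endian fields, the function names, and the use of `std::string` as the blob are assumptions, not the actual PCM bitcode encoding, and the decoder assumes a well-formed blob (no bounds checks).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// One (smid, num-includes) pair per submodule that included the file.
using SubmoduleCounts = std::vector<std::pair<uint32_t, uint32_t>>;

// Append a 32-bit value in little-endian byte order (an assumed encoding).
static void appendU32(std::string &Blob, uint32_t V) {
  for (int I = 0; I < 4; ++I)
    Blob.push_back(static_cast<char>((V >> (8 * I)) & 0xFF));
}

// Read a 32-bit little-endian value and advance Pos past it.
static uint32_t readU32(const std::string &Blob, size_t &Pos) {
  uint32_t V = 0;
  for (int I = 0; I < 4; ++I)
    V |= static_cast<uint32_t>(static_cast<unsigned char>(Blob[Pos++]))
         << (8 * I);
  return V;
}

std::string encodeRecord(const std::string &Filename,
                         const SubmoduleCounts &Counts) {
  std::string Blob;
  appendU32(Blob, static_cast<uint32_t>(Filename.size()));
  Blob += Filename;
  appendU32(Blob, static_cast<uint32_t>(Counts.size()));
  for (const auto &C : Counts) {
    appendU32(Blob, C.first);  // smid
    appendU32(Blob, C.second); // num-includes
  }
  return Blob;
}

// Decoding can be deferred until the file is first looked up.
SubmoduleCounts decodeRecord(const std::string &Blob, std::string &Filename) {
  size_t Pos = 0;
  uint32_t NameSize = readU32(Blob, Pos);
  Filename = Blob.substr(Pos, NameSize);
  Pos += NameSize;
  uint32_t NumSubmodules = readU32(Blob, Pos);
  SubmoduleCounts Counts;
  for (uint32_t I = 0; I < NumSubmodules; ++I) {
    uint32_t SMID = readU32(Blob, Pos);
    uint32_t NumIncludes = readU32(Blob, Pos);
    Counts.emplace_back(SMID, NumIncludes);
  }
  return Counts;
}
```

Because the record is self-delimiting, a reader can keep the raw bytes around and only call `decodeRecord` when a particular filename is queried.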

In D112915#3106472, @jansvoboda11 wrote:

That's interesting. I think HeaderFileInfo::isImport should definitely be tracked in the preprocessor, not in HeaderFileInfo. The fact that the header was #imported is not an intrinsic property of the file itself, but rather a preprocessor state. Can you think of other fields that don't really belong to HeaderFileInfo?

After checking HeaderFileInfo, looks like isImport is the only other field that should be tracked in the preprocessor. I had in mind a case where a hidden submodule imports a file with x-macros and then a visible submodule includes this header twice with different macros. First include would go through because NumIncludes == 0, and the second one shouldn't because NumIncludes == 1 && isImport == true. The import in the hidden submodule is incorrect but errors in unused headers shouldn't break actually used headers.
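
The x-macro scenario above can be replayed with a toy model. It encodes only the condition stated in the comment (a header is skipped once `NumIncludes == 1 && isImport == true`); it is not Clang's actual `ShouldEnterIncludeFile` logic, and `HeaderState` and `shouldEnter` are hypothetical names.

```cpp
#include <cassert>

// Toy per-header state: NumIncludes is assumed to be tracked per submodule,
// while isImport is still shared globally (the situation under discussion).
struct HeaderState {
  unsigned NumIncludes = 0;
  bool isImport = false;
};

// Returns true if the header should be entered, using only the condition
// quoted in the comment above.
bool shouldEnter(HeaderState &S) {
  if (S.NumIncludes >= 1 && S.isImport)
    return false; // treated as an already-imported header
  ++S.NumIncludes;
  return true;
}
```

If a hidden submodule's `#import` leaks `isImport = true` into the shared state, the visible submodule's second `#include` of the x-macro header is rejected even though it never imported the file itself.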

Call getFileInfo in Preprocessor::EnterMainSourceFile. This ensures deserialization of HeaderFileInfo, which seems to be necessary with modules-ts enabled.

Harbormaster completed remote builds in B132483: Diff 384791. Nov 4 2021, 11:23 AM

Fix deserialization, improve (sub)module state tracking

Herald added a subscriber: mgrang. Nov 8 2021, 2:59 AM

Harbormaster completed remote builds in B132974: Diff 385434. Nov 8 2021, 4:06 AM

Make loading of (sub)module includes lazy

jansvoboda11 retitled this revision from WIP: [clang][modules] Granular tracking of includes to [clang][modules] Track number of includes per submodule. Nov 9 2021, 1:31 AM

jansvoboda11 edited the summary of this revision.

Thanks for your feedback @dexonsmith. I reworked the patch to use more sensible data structures as suggested, and made the AST deserialization lazy (on (sub)module import).

I think the only thing to figure out is the exact structure of the serialized information - whether we're fine with serializing all transitively included files in each AST file, or whether we'd like to fetch that information from other AST files instead.

I removed the WIP tag and would be happy to gather more feedback.

clang/include/clang/Lex/Preprocessor.h
722–725	You're right that UIDs of files included in a single submodule are unlikely to have "most" files. Thanks for pointing that out. In the latest revision, I switched to `llvm::DenseMap<const FileEntry *, unsigned>`.
clang/lib/Lex/Preprocessor.cpp
1330	For each file, we need to have three distinct states: not included at all, included exactly once (`firstTimeLexingFile`), included more than once. This means we can't use a single `DenseSet`. But we could use a `DenseMap<Key, bool>`, where "not included at all" can be expressed as being absent from the map, exactly once as having `true` in the map and more than once as having `false` in the map. Alternatively, we could use two `DenseSet` instances to encode the same, but I don't think having two lookups per file to determine stuff is efficient. I can look into this in a follow-up patch.
clang/lib/Serialization/ASTWriter.cpp
2496–2497	In the latest revision, I ended up sorting just based on `Filename`, since this is now explicitly stored per submodule.
2522	We already serialize `Filename` elsewhere, but only for local input files. Here we need transitive closure of all included input files. I'm still unsure whether it's fine to store transitively included files here or if we should look that information up in the respective AST files. The current solution looks like it will bloat sizes of the AST files, but I think the transitive closure is already being stored for `HeaderFileInfo` anyways, so it shouldn't be that big of a deal?
2522	Thinking about it some more, the current implementation will be duplicating a lot of filenames between submodules of the same module. We might need to extract that to some common storage that we can refer to with simple integer offsets...
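
The three-state encoding described in the reply on Preprocessor.cpp line 1330 (absent from the map, `true`, `false`) might look like the following. This is a sketch with hypothetical names, keyed by file path and using `std::unordered_map` in place of `llvm::DenseMap<const FileEntry *, bool>`.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

enum class IncludeState { NotIncluded, IncludedOnce, IncludedMoreThanOnce };

// Hypothetical tracker keyed by file path.
struct ThreeStateIncludes {
  // Absent: never included. true: included exactly once. false: more than once.
  std::unordered_map<std::string, bool> SeenOnce;

  void noteInclude(const std::string &File) {
    auto Result = SeenOnce.emplace(File, true);
    if (!Result.second) // already present, so this is a repeat include
      Result.first->second = false;
  }

  IncludeState stateOf(const std::string &File) const {
    auto It = SeenOnce.find(File);
    if (It == SeenOnce.end())
      return IncludeState::NotIncluded;
    return It->second ? IncludeState::IncludedOnce
                      : IncludeState::IncludedMoreThanOnce;
  }
};
```

A single lookup answers all three questions, which is the advantage over two separate sets mentioned in the reply.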

Harbormaster completed remote builds in B133196: Diff 385730. Nov 9 2021, 2:20 AM

Make clang-format happy.

Harbormaster completed remote builds in B133211: Diff 385751. Nov 9 2021, 3:59 AM

Store only direct includes in the PCM (compared to transitive stored previously), use InputFile ID (instead of full filesystem path).

Also: add test of transitive includes, fix bug in VisibleModuleSet::setVisible.

Harbormaster completed remote builds in B133467: Diff 386139. Nov 10 2021, 7:05 AM

Implementation looks a lot cleaner!

I'd still like to drop NumIncludes first if possible because I think it'll be easier to reason about without this extra layer of complexity. Also, that'd mitigate the potential regression in .pcm size.

(Note: I'd be more comfortable with @vsapsai and/or @rsmith reviewing the overall direction; I just jumped in for the implementation details.)

clang/include/clang/Lex/HeaderSearch.h
133–139 ↗	(On Diff #386139)	Looks like this is already dead code? If so, please separate out and commit ahead of time (e.g., now).
clang/include/clang/Serialization/ModuleFile.h
400	Each StringMapEntry is going to have a pretty big allocation here, for a 512B vector. Given that it doesn't need to be after creation, it'd be more efficient to use this pattern:

llvm::StringMap<ArrayRef<uint64_t>> SubmoduleIncludedFiles;
SpecificBumpPtrAllocator<uint64_t> SubmoduleIncludedFilesAlloc;

// later
MutableArrayRef<uint64_t> Files(SubmoduleIncludedFilesAlloc.Allocate(Record.size()), Record.size());
llvm::copy(Record, Files.begin());
SubmoduleIncludedFiles[Key] = Files;

That said, I feel like this should just be:

DenseMap<StringRef, StringRef> SubmoduleIncludedFiles;

The key can point at the name of the submodule, which must already exist somewhere without needing a StringMap to create a new copy of it. The value is a lazily deserialized blob.
clang/lib/Lex/Preprocessor.cpp
1330	Seems like a DenseSet could still be used by having HeaderInfo pass back the WasInserted bit from the insertion to the preprocessor, and threading it through to Preprocessor::HandleEndOfFile (the only caller of FirstTimeLexingFile):

bool IsFirst = Set.insert(Key).second;

The threading doesn't seem too hard. Looking at main:

Preprocessor::HandleHeaderIncludeOrImport calls HeaderInfo::ShouldEnterIncludeFile. This does the `++FI.NumIncludes` (going from 0 to 1). Instead, it could be `IsFirst = !FI.WasIncluded; FI.WasIncluded = true;`, then return `IsFirst` somehow. (Then your patch can pull `IsFirst` from the `insert().second`).

Preprocessor::HandleHeaderIncludeOrImport calls Preprocessor::EnterSourceFile. This creates a new Lexer for that file. `IsFirst` can be stored on that Lexer.

Preprocessor::HandleEndOfFile calls FirstTimeLexingFile. Instead, it can check the new accessor `CurLexer->isFirstTimeLexing()`.

"I can look into this in a follow-up patch."

Follow-up might be okay, but it'd be nice to remove an axis of complexity before adding a new one if it's reasonable. E.g., it'll be easier to debug emergent issues from changing it to a simple set since there's less machinery to worry about.
clang/lib/Serialization/ASTReader.cpp
5736–5738	This looks lazy, but a bunch of work was just done to decode the `Record` from bitcode. To make this actually lazy, you can encode the data in a blob, which doesn't have to be decoded until it's used.
clang/lib/Serialization/ASTWriter.cpp
2257–2261	Why does the count need to be encoded? The only observer is `Preprocessor::HandleEndOfFile`. If it gets called again for this file, it'll be after `++NumIncludes`.
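
The `insert().second` threading suggested for Preprocessor.cpp line 1330 can be sketched as below. `SketchLexer` and `SketchPreprocessor` are hypothetical miniatures, and `std::unordered_set<std::string>` stands in for `llvm::DenseSet<const FileEntry *>`; only the idea of capturing the "was inserted" bit at entry and reading it back at end of file is taken from the comment.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// Hypothetical lexer that remembers whether its file is being lexed for the
// first time, mirroring the proposed CurLexer->isFirstTimeLexing() accessor.
struct SketchLexer {
  std::string File;
  bool FirstTime;
  bool isFirstTimeLexing() const { return FirstTime; }
};

struct SketchPreprocessor {
  std::unordered_set<std::string> IncludedFiles;

  // The set insertion both records the include and reports whether the file
  // was new, so no separate counter is needed.
  SketchLexer enterSourceFile(const std::string &File) {
    bool IsFirst = IncludedFiles.insert(File).second;
    return SketchLexer{File, IsFirst};
  }
};
```

At end of file, the preprocessor would consult the lexer's stored bit rather than asking the header info whether the include count equals one.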

As we've discussed earlier, tracking isImport shouldn't be done per .pcm and here is the test case https://gist.github.com/vsapsai/a2d2bd19c54c24540495fd9b262106aa I'm not sure it is worth adding the second #include as the test fails just with one.

Overall, the change seems more complicated than it has to be. I need to check it carefully to see what can be simplified. And I need to check in debugger how and when AST reading is triggered. Looks like not all of that is lazy but I need to check the compiled code, not my guesses.

vsapsai added a child revision: D114051: Illustrate an alternative for tracking includes per submodule. Nov 16 2021, 8:03 PM

Didn't go in-depth for serialization/deserialization. When we add tracking isImport on per-submodule basis, do you think AST reading/writing would change significantly?

clang/include/clang/Lex/ExternalPreprocessorSource.h
47	I think it is better for understanding and more convenient to use some `using` instead of duplicating `llvm::DenseMap<const FileEntry *, unsigned>` in multiple places.
clang/include/clang/Lex/Preprocessor.h
771–777	I think the interplay between `CurSubmoduleIncludeState`, `IncludedFiles`, and `CurSubmoduleState` is pretty complicated. Recently I've realized that it can be beneficial to distinguish include tracking for the purpose of serializing per submodule and for the purpose of deciding whether we should enter a file. In D114051 I've tried to illustrate this approach. There are some tests failing but I hope the main idea is still understandable. One of my big concerns is tracking `VisibleModules` in multiple places. D114051 demonstrates one of the ways to deal with it but I think it is more important for you to know the problem I was trying to solve, not the solution I came up with.
clang/lib/Basic/Module.cpp
653	Was meaning to make this fix for a long time but couldn't test it. Thanks for finally fixing it!
clang/lib/Lex/Preprocessor.cpp
1329	If I drop checking `getLocalSubmoduleIncludes`, no tests are failing. But it seems like this call is required. How can we test it?

Rebase on top of extracted patches.

Harbormaster completed remote builds in B134748: Diff 387952. Nov 17 2021, 8:24 AM

In D112915#3122340, @dexonsmith wrote:

Implementation looks a lot cleaner!

I'd still like to drop NumIncludes first if possible because I think it'll be easier to reason about without this extra layer of complexity. Also, that'd mitigate the potential regression in .pcm size.

Done in D114096. Thanks for the feedback.

clang/include/clang/Lex/HeaderSearch.h
133–139 ↗	(On Diff #386139)	Done in D114092.
clang/include/clang/Serialization/ModuleFile.h
400	I switched to `StringRef` value in the latest revision. Unfortunately, had to use `std::string` as the key instead of `StringRef`, since `getFullModuleName` constructs the string on heap. That forced me to use `std::map`, too. I'll explore using something else entirely as the key.
clang/lib/Lex/Preprocessor.cpp
1330	Extracted into D114093.

jansvoboda11 edited the summary of this revision. Nov 17 2021, 12:15 PM

jansvoboda11 added a parent revision: D114096: [clang][lex][modules] Stop tracking number of includes.

In D112915#3136492, @vsapsai wrote:

Didn't go in-depth for serialization/deserialization. When we add tracking isImport on per-submodule basis, do you think AST reading/writing would change significantly?

I think moving isImport into Preprocessor can be done in a similar way to how we handle the number of includes (or rather "has been included/imported" behavior), so we should be able to reuse some parts of this patch there.

One question that remains to be answered is whether to keep two separate DenseSet<const FileEntry *> instances, or merge them into something like:

struct IncludeInfo {
  bool IsImportedOrIncluded; // currently Preprocessor::IncludedFiles
  bool IsImported;           // currently HeaderFileInfo::isImport
};
llvm::DenseMap<const FileEntry *, IncludeInfo> IncludedFilesInfo;

Thanks a lot for the feedback, I appreciate it.

clang/include/clang/Lex/ExternalPreprocessorSource.h
47	Yeah, spelling the type out everywhere is a bit unwieldy. It's a bit better with `llvm::DenseSet<const FileEntry *>` in the latest revision, but still not pretty. I wanted to avoid pulling `Preprocessor.h` everywhere, but will probably reconsider doing that.
clang/include/clang/Lex/Preprocessor.h
771–777	Thanks a ton, this must've taken quite a bit of time to put together. I agree your approach is much simpler. I'll investigate how it behaves on larger projects, but probably will end up adopting it in this patch. I have done some minor tweaks locally to fix the failing PCH tests, `Modules/import-textual-noguard.mm` doesn't make much sense to me, I think I'll end up updating that test.
clang/lib/Lex/Preprocessor.cpp
1329	I think this should kick in when importing a submodule from the same module. I'll try to come up with a test case.

jansvoboda11 marked an inline comment as done. Nov 17 2021, 12:28 PM

jansvoboda11 marked an inline comment as done.

dexonsmith added inline comments. Nov 17 2021, 1:05 PM

clang/include/clang/Serialization/ModuleFile.h
400	Oh, if the key isn't being kept alive elsewhere, you can/should use `StringMap<StringRef>`, which is strictly better than `std::map<std::string, StringRef>` (except for non-deterministic iteration order). I suggested otherwise because I assumed the key already existed.

Also, if the map isn't being deleted from (much), might as well bump-ptr-allocate it:

BumpPtrAllocator SubmoduleIncludedFilesAlloc;
StringMap<StringRef, BumpPtrAllocator> SubmoduleIncludedFiles; // initialized with the Alloc above.

"I'll explore using something else entirely as the key."

Not necessarily important; just if the key was already (e.g., stored in its `Module` or something) you might as well use it. Unless, maybe if the `Module` is known to be constructed already, you could use its address?
400	I switched to `StringRef` value in the latest revision. Unfortunately, had to use `std::string` as the key instead of `StringRef`, since `getFullModuleName` constructs the string on heap. That forced me to use `std::map`, too. I'll explore using something else entirely as the key.
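
The idea running through this thread, a map from submodule name to an undecoded blob that is only expanded on first use, can be sketched as follows. Everything here is an assumption made for illustration: `LazySubmoduleIncludes` is hypothetical, `std::string_view` and `std::unordered_map` stand in for `StringRef` and `StringMap`, and the blob is taken to be a packed array of host-endian 32-bit input-file IDs.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <string_view>
#include <unordered_map>
#include <utility>
#include <vector>

struct LazySubmoduleIncludes {
  // Raw, undecoded bytes per submodule; each view points into a buffer that
  // must outlive this object (in Clang, the loaded PCM file).
  std::unordered_map<std::string, std::string_view> Blobs;
  // Decoded results, filled in on demand.
  std::unordered_map<std::string, std::vector<uint32_t>> Decoded;
  unsigned NumDecodes = 0; // only here to make the laziness observable

  // Returns the decoded IDs for Submodule, or nullptr if none were recorded.
  // References into std::unordered_map stay valid across later insertions,
  // so the returned pointer does not dangle when other submodules are decoded.
  const std::vector<uint32_t> *get(const std::string &Submodule) {
    auto Cached = Decoded.find(Submodule);
    if (Cached != Decoded.end())
      return &Cached->second;
    auto It = Blobs.find(Submodule);
    if (It == Blobs.end())
      return nullptr;
    std::vector<uint32_t> IDs(It->second.size() / sizeof(uint32_t));
    if (!IDs.empty())
      std::memcpy(IDs.data(), It->second.data(),
                  IDs.size() * sizeof(uint32_t));
    ++NumDecodes;
    return &Decoded.emplace(Submodule, std::move(IDs)).first->second;
  }
};
```

Registering a blob costs nothing beyond storing the view; the decoding work happens once, on the first import of that submodule.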

jansvoboda11 retitled this revision from [clang][modules] Track number of includes per submodule to [clang][modules] Track included files per submodule. Nov 18 2021, 9:37 AM

jansvoboda11 edited the summary of this revision.

jansvoboda11 added a parent revision: D114095: [clang][lex] Include tracking: simplify and move to preprocessor.

jansvoboda11 removed a parent revision: D114096: [clang][lex][modules] Stop tracking number of includes.

Rebase, add type alias, improve map allocation in ASTReader.

Harbormaster completed remote builds in B134930: Diff 388242. Nov 18 2021, 9:41 AM

Apply part of Volodymyr's cleanup.

Add missed newline.

jansvoboda11 marked 5 inline comments as done. Nov 18 2021, 10:16 AM

jansvoboda11 added a child revision: D114173: [clang][modules] Apply local submodule visibility to includes. Nov 18 2021, 10:27 AM

Harbormaster completed remote builds in B134938: Diff 388253. Nov 18 2021, 10:48 AM

I think AST format for IncludedFiles was discussed here, so I'll continue here though the bulk of implementation is in D114095 now. Have you compared the size of resulting .pcm files when you are using a bitvector compared to a list of included headers? In my quick check (which is not a perfect comparison, to be honest) bitvector approach takes more space. For example, Darwin.pcm is 7320 bytes bigger, UIKit.pcm is 2388 bytes bigger, and entire modules cache is 14KB bigger. I haven't checked the details of the discrepancies, so curious if you have some insights already. For the record, I was testing with

echo '#import <UIKit/UIKit.h>' | path/to/built/bin/clang -fsyntax-only -isysroot "$(xcrun --sdk iphoneos --show-sdk-path)" -target arm64-apple-ios -fmodules -fmodules-cache-path=modules.noindex -x objective-c -

vsapsai mentioned this in D114095: [clang][lex] Include tracking: simplify and move to preprocessor. Nov 18 2021, 3:41 PM

Rebase

Harbormaster completed remote builds in B135056: Diff 388409. Nov 19 2021, 1:29 AM

In D112915#3141417, @vsapsai wrote:
I think AST format for IncludedFiles was discussed here, so I'll continue here though the bulk of implementation is in D114095 now. Have you compared the size of resulting .pcm files when you are using a bitvector compared to a list of included headers? In my quick check (which is not a perfect comparison, to be honest) bitvector approach takes more space. For example, Darwin.pcm is 7320 bytes bigger, UIKit.pcm is 2388 bytes bigger, and entire modules cache is 14KB bigger. I haven't checked the details of the discrepancies, so curious if you have some insights already. For the record, I was testing with
echo '#import <UIKit/UIKit.h>' | path/to/built/bin/clang -fsyntax-only -isysroot "$(xcrun --sdk iphoneos --show-sdk-path)" -target arm64-apple-ios -fmodules -fmodules-cache-path=modules.noindex -x objective-c -

Replied in D114095.

Can you please rebase this change after D114095 lands? Overall looks good but I want to take one final look and triggering the pre-merge checks will be useful.

Rebase, update unnecessary auto, run clang-format

Harbormaster completed remote builds in B145743: Diff 403263. Jan 27 2022, 2:42 AM

vsapsai added inline comments. Jan 27 2022, 5:10 PM

clang/include/clang/Lex/Preprocessor.h
1251–1260	Was curious why `getNullSubmoduleIncludes` isn't `getLocalSubmoduleIncludes(nullptr)`? If we want to have separate methods and implementations, it might be useful to assert `M` isn't null in `getLocalSubmoduleIncludes`.
clang/lib/Serialization/ASTReader.cpp
8631	Can you please check again the returned pointer doesn't end up as a dangling pointer? I don't think we store the pointer anywhere, which is good. My bigger concern is if we can invalidate `SubmoduleIncludedFiles` iterator while working with the returned pointer. I haven't found any indication of that but would like somebody else to check that.

jansvoboda11 mentioned this in D155131: [clang][modules] Deserialize included files lazily. Jul 12 2023, 3:33 PM

jansvoboda11 mentioned this in rG6504d87fc0c8: [clang][modules] Deserialize included files lazily. Jul 13 2023, 3:00 PM

jansvoboda11 mentioned this in D155503: [clang][modules] Track included files per submodule. Jul 17 2023, 11:39 AM

jansvoboda11 removed a child revision: D114173: [clang][modules] Apply local submodule visibility to includes. Jul 17 2023, 1:38 PM

jansvoboda11 added a child revision: D114173: [clang][modules] Apply local submodule visibility to includes. Jul 17 2023, 1:39 PM

jansvoboda11 removed a child revision: D114173: [clang][modules] Apply local submodule visibility to includes. Jul 17 2023, 1:40 PM

Revision Contents

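Path                                                    Size
clang/include/clang/Lex/ExternalPreprocessorSource.h    5 lines
clang/include/clang/Lex/Preprocessor.h                  34 lines
clang/include/clang/Serialization/ASTBitCodes.h         3 lines
clang/include/clang/Serialization/ASTReader.h           10 lines
clang/include/clang/Serialization/ASTWriter.h           4 lines
clang/include/clang/Serialization/ModuleFile.h          9 lines
clang/lib/Basic/Module.cpp                              2 lines
clang/lib/Lex/PPLexerChange.cpp                         4 lines
clang/lib/Lex/Preprocessor.cpp                          36 lines
clang/lib/Serialization/ASTReader.cpp                   38 lines
clang/lib/Serialization/ASTWriter.cpp                   29 lines
clang/test/Modules/import-submodule-visibility.c        99 lines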

Diff 403263

clang/include/clang/Lex/ExternalPreprocessorSource.h

 //===- ExternalPreprocessorSource.h - Abstract Macro Interface --*- C++ -*-===//
 //
 // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
 // See https://llvm.org/LICENSE.txt for license information.
 // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 //
 //===----------------------------------------------------------------------===//
 //
 // This file defines the ExternalPreprocessorSource interface, which enables
 // construction of macro definitions from some external source.
 //
 //===----------------------------------------------------------------------===//
 #ifndef LLVM_CLANG_LEX_EXTERNALPREPROCESSORSOURCE_H
 #define LLVM_CLANG_LEX_EXTERNALPREPROCESSORSOURCE_H

+#include "clang/Lex/Preprocessor.h"

 namespace clang {

 class IdentifierInfo;
 class Module;

 /// Abstract interface for external sources of preprocessor
 /// information.
 ///
... (11 lines not shown) ...
 public:

   /// Return the identifier associated with the given ID number.
   ///
   /// The ID 0 is associated with the NULL identifier.
   virtual IdentifierInfo *GetIdentifier(unsigned ID) = 0;

   /// Map a module ID to a module.
   virtual Module *getModule(unsigned ModuleID) = 0;

+  /// Return the set of files directly included in the given (sub)module.
+  virtual const Preprocessor::IncludedFilesSet *getIncludedFiles(Module *M) = 0;
 };

 }

 #endif

clang/include/clang/Lex/Preprocessor.h

Show First 20 Lines • Show All 713 Lines • ▼ Show 20 Lines	private:

/// For each IdentifierInfo that was associated with a macro, we		/// For each IdentifierInfo that was associated with a macro, we
/// keep a mapping to the history of all macro definitions and #undefs in		/// keep a mapping to the history of all macro definitions and #undefs in
/// the reverse order (the latest one is in the head of the list).		/// the reverse order (the latest one is in the head of the list).
///		///
/// This mapping lives within the \p CurSubmoduleState.		/// This mapping lives within the \p CurSubmoduleState.
using MacroMap = llvm::DenseMap<const IdentifierInfo *, MacroState>;		using MacroMap = llvm::DenseMap<const IdentifierInfo *, MacroState>;

struct SubmoduleState;		struct SubmoduleState;

/// Information about a submodule that we're currently building.		/// Information about a submodule that we're currently building.
struct BuildingSubmoduleInfo {		struct BuildingSubmoduleInfo {
		dexonsmith: I'm not sure about this choice. UIDs are unlikely to be adjacent in `SubmoduleIncludeState`, since a given submodule is unlikely to have "most" files. Also, memory usage characteristics could be "weird" if the FileManager is being reused between different compilations. `DenseMap<unsigned, unsigned>` will have the same number of allocations (i.e., just 1). If UIDs really are dense and near each other in one of the submodules, then that map will be a little bigger than necessary (~2x), but it should be better for the rest of them.
		jansvoboda11 (Author): You're right that a single submodule is unlikely to include "most" files. Thanks for pointing that out. In the latest revision, I switched to `llvm::DenseMap<const FileEntry *, unsigned>`.
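The trade-off discussed in this thread can be illustrated with a small standalone sketch. It uses `std::vector` and `std::unordered_map` as stand-ins for a UID-indexed table and `llvm::DenseMap`; the function names `buildUIDIndexed` and `buildSparse` are hypothetical and are not part of the patch.

```cpp
#include <algorithm>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Stand-in for a UID-indexed table: one slot per possible UID up to the
// largest one seen, regardless of how few files were actually included.
std::vector<unsigned> buildUIDIndexed(const std::vector<unsigned> &IncludedUIDs) {
  unsigned MaxUID = 0;
  for (unsigned UID : IncludedUIDs)
    MaxUID = std::max(MaxUID, UID);
  std::size_t Size = IncludedUIDs.empty() ? 0 : std::size_t(MaxUID) + 1;
  std::vector<unsigned> Counts(Size, 0u);
  for (unsigned UID : IncludedUIDs)
    ++Counts[UID];
  return Counts;
}

// Stand-in for a hash map keyed by UID: one entry per included file only.
std::unordered_map<unsigned, unsigned>
buildSparse(const std::vector<unsigned> &IncludedUIDs) {
  std::unordered_map<unsigned, unsigned> Counts;
  for (unsigned UID : IncludedUIDs)
    ++Counts[UID];
  return Counts;
}
```

With two included files whose UIDs are 3 and 1000, the indexed table needs 1001 slots while the map holds two entries, which is the sparsity concern raised above.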
/// The module that we are building.		/// The module that we are building.
Module *M;		Module *M;

/// The location at which the module was included.		/// The location at which the module was included.
SourceLocation ImportLoc;		SourceLocation ImportLoc;

/// Whether we entered this submodule via a pragma.		/// Whether we entered this submodule via a pragma.
bool IsPragma;		bool IsPragma;
Show All 28 Lines	private:

/// The preprocessor state for preprocessing outside of any submodule.		/// The preprocessor state for preprocessing outside of any submodule.
SubmoduleState NullSubmoduleState;		SubmoduleState NullSubmoduleState;

/// The current submodule state. Will be \p NullSubmoduleState if we're not		/// The current submodule state. Will be \p NullSubmoduleState if we're not
/// in a submodule.		/// in a submodule.
SubmoduleState *CurSubmoduleState;		SubmoduleState *CurSubmoduleState;

/// The files that have been included.		/// The set of files that have been included in each submodule.
		/// Files included outside of any module (e.g. in PCH) have nullptr key.
		llvm::DenseMap<Module *, IncludedFilesSet> IncludedFilesPerSubmodule;

		/// The global set of files that have been included.
		// TODO: Move this into SubmoduleState.
IncludedFilesSet IncludedFiles;		IncludedFilesSet IncludedFiles;

		vsapsai: I think the interplay between `CurSubmoduleIncludeState`, `IncludedFiles`, and `CurSubmoduleState` is pretty complicated. Recently I've realized that it can be beneficial to distinguish include tracking for the purpose of serializing per submodule from include tracking for the purpose of deciding whether we should enter a file. In D114051 I've tried to illustrate this approach. Some tests are failing, but I hope the main idea is still understandable. One of my big concerns is tracking `VisibleModules` in multiple places. D114051 demonstrates one way to deal with it, but I think it is more important for you to know the problem I was trying to solve than the solution I came up with.
		jansvoboda11 (Author): Thanks a ton, this must've taken quite a bit of time to put together. I agree your approach is much simpler. I'll investigate how it behaves on larger projects, but I will probably end up adopting it in this patch. I have made some minor tweaks locally to fix the failing PCH tests. `Modules/import-textual-noguard.mm` doesn't make much sense to me; I think I'll end up updating that test.
/// The set of known macros exported from modules.		/// The set of known macros exported from modules.
llvm::FoldingSet<ModuleMacro> ModuleMacros;		llvm::FoldingSet<ModuleMacro> ModuleMacros;

/// The names of potential module macros that we've not yet processed.		/// The names of potential module macros that we've not yet processed.
llvm::SmallVector<const IdentifierInfo *, 32> PendingModuleMacroNames;		llvm::SmallVector<const IdentifierInfo *, 32> PendingModuleMacroNames;

/// The list of module macros, for each identifier, that are not overridden by		/// The list of module macros, for each identifier, that are not overridden by
/// any other module macro.		/// any other module macro.
▲ Show 20 Lines • Show All 139 Lines • ▼ Show 20 Lines	private:
};		};

/// MacroInfos are managed as a chain for easy disposal. This is the head		/// MacroInfos are managed as a chain for easy disposal. This is the head
/// of that list.		/// of that list.
MacroInfoChain *MIChainHead = nullptr;		MacroInfoChain *MIChainHead = nullptr;

void updateOutOfDateIdentifier(IdentifierInfo &II) const;		void updateOutOfDateIdentifier(IdentifierInfo &II) const;

		/// Get the external include information for the given (sub)module.
		const IncludedFilesSet *getExternalSubmoduleIncludes(Module *M) const;

public:		public:
Preprocessor(std::shared_ptr<PreprocessorOptions> PPOpts,		Preprocessor(std::shared_ptr<PreprocessorOptions> PPOpts,
DiagnosticsEngine &diags, LangOptions &opts, SourceManager &SM,		DiagnosticsEngine &diags, LangOptions &opts, SourceManager &SM,
HeaderSearch &Headers, ModuleLoader &TheModuleLoader,		HeaderSearch &Headers, ModuleLoader &TheModuleLoader,
IdentifierInfoLookup *IILookup = nullptr,		IdentifierInfoLookup *IILookup = nullptr,
bool OwnsHeaderSearch = false,		bool OwnsHeaderSearch = false,
TranslationUnitKind TUKind = TU_Complete);		TranslationUnitKind TUKind = TU_Complete);

▲ Show 20 Lines • Show All 290 Lines • ▼ Show 20 Lines	macros(bool IncludeExternalMacros = true) const {
macro_iterator end = macro_end(IncludeExternalMacros);		macro_iterator end = macro_end(IncludeExternalMacros);
return llvm::make_range(begin, end);		return llvm::make_range(begin, end);
}		}

/// \}		/// \}

/// Mark the file as included.		/// Mark the file as included.
/// Returns true if this is the first time the file was included.		/// Returns true if this is the first time the file was included.
bool markIncluded(const FileEntry *File) {		bool markIncluded(const FileEntry *File);
HeaderInfo.getFileInfo(File);
return IncludedFiles.insert(File).second;		/// Mark the file as transitively included.
}		void markTransitivelyIncluded(const FileEntry *File);

/// Return true if this header has already been included.		/// Return true if this header has already been included.
bool alreadyIncluded(const FileEntry *File) const {		bool alreadyIncluded(const FileEntry *File) const;
return IncludedFiles.count(File);
		/// Get the set of files included outside of any (sub)module.
		const IncludedFilesSet *getNullSubmoduleIncludes() const {
		auto It = IncludedFilesPerSubmodule.find(nullptr);
		return It == IncludedFilesPerSubmodule.end() ? nullptr : &It->second;
}		}

/// Get the set of included files.		/// Get the set of files included in the given (sub)module.
IncludedFilesSet &getIncludedFiles() { return IncludedFiles; }		const IncludedFilesSet *getLocalSubmoduleIncludes(Module *M) const {
const IncludedFilesSet &getIncludedFiles() const { return IncludedFiles; }		auto It = IncludedFilesPerSubmodule.find(M);
		return It == IncludedFilesPerSubmodule.end() ? nullptr : &It->second;
		}
		vsapsai: I was curious why `getNullSubmoduleIncludes` isn't implemented as `getLocalSubmoduleIncludes(nullptr)`. If we want separate methods and implementations, it might be useful to assert that `M` isn't null in `getLocalSubmoduleIncludes`.
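A minimal sketch of the lookup pattern under discussion, assuming standard containers in place of `llvm::DenseMap` and file names in place of `FileEntry` pointers. The `IncludeTracker` class, its private `lookup` helper, and the `Module` stand-in are hypothetical; the sketch only shows one way the two accessors could share an implementation while still asserting on a null module.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <unordered_map>

struct Module { std::string Name; };            // hypothetical stand-in
using IncludedFilesSet = std::set<std::string>; // stand-in for a FileEntry set

class IncludeTracker {
  // Files included outside of any module are stored under the nullptr key.
  std::unordered_map<const Module *, IncludedFilesSet> PerSubmodule;

  // Shared lookup: returns nullptr when nothing was recorded for the key.
  const IncludedFilesSet *lookup(const Module *M) const {
    auto It = PerSubmodule.find(M);
    return It == PerSubmodule.end() ? nullptr : &It->second;
  }

public:
  void markIncluded(const Module *M, const std::string &File) {
    PerSubmodule[M].insert(File);
  }

  const IncludedFilesSet *getNullSubmoduleIncludes() const {
    return lookup(nullptr);
  }

  const IncludedFilesSet *getLocalSubmoduleIncludes(const Module *M) const {
    assert(M && "use getNullSubmoduleIncludes() outside of any module");
    return lookup(M);
  }
};
```

Keeping both public accessors makes the "no module" case explicit at call sites, while the private helper avoids duplicating the find-or-null logic.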

/// Return the name of the macro defined before \p Loc that has		/// Return the name of the macro defined before \p Loc that has
/// spelling \p Tokens. If there are multiple macros with same spelling,		/// spelling \p Tokens. If there are multiple macros with same spelling,
/// return the last one defined.		/// return the last one defined.
StringRef getLastMacroWithSpelling(SourceLocation Loc,		StringRef getLastMacroWithSpelling(SourceLocation Loc,
ArrayRef<TokenValue> Tokens) const;		ArrayRef<TokenValue> Tokens) const;

const std::string &getPredefines() const { return Predefines; }		const std::string &getPredefines() const { return Predefines; }
▲ Show 20 Lines • Show All 1,281 Lines • Show Last 20 Lines

clang/include/clang/Serialization/ASTBitCodes.h

Show First 20 Lines • Show All 822 Lines • ▼ Show 20 Lines	enum SubmoduleRecordTypes {

/// Specifies some declarations with initializers that must be		/// Specifies some declarations with initializers that must be
/// emitted to initialize the module.		/// emitted to initialize the module.
SUBMODULE_INITIALIZERS = 16,		SUBMODULE_INITIALIZERS = 16,

/// Specifies the name of the module that will eventually		/// Specifies the name of the module that will eventually
/// re-export the entities in this module.		/// re-export the entities in this module.
SUBMODULE_EXPORT_AS = 17,		SUBMODULE_EXPORT_AS = 17,

		/// Specifies files included in this module.
		SUBMODULE_INCLUDED_FILES = 18,
};		};

/// Record types used within a comments block.		/// Record types used within a comments block.
enum CommentRecordTypes { COMMENTS_RAW_COMMENT = 0 };		enum CommentRecordTypes { COMMENTS_RAW_COMMENT = 0 };

/// \defgroup ASTAST AST file AST constants		/// \defgroup ASTAST AST file AST constants
///		///
/// The constants in this group describe various components of the		/// The constants in this group describe various components of the
▲ Show 20 Lines • Show All 1,310 Lines • Show Last 20 Lines

clang/include/clang/Serialization/ASTReader.h

Show First 20 Lines • Show All 919 Lines • ▼ Show 20 Lines	struct ImportedSubmodule {
ImportedSubmodule(serialization::SubmoduleID ID, SourceLocation ImportLoc)		ImportedSubmodule(serialization::SubmoduleID ID, SourceLocation ImportLoc)
: ID(ID), ImportLoc(ImportLoc) {}		: ID(ID), ImportLoc(ImportLoc) {}
};		};

private:		private:
/// A list of modules that were imported by precompiled headers or		/// A list of modules that were imported by precompiled headers or
/// any other non-module AST file.		/// any other non-module AST file.
SmallVector<ImportedSubmodule, 2> ImportedModules;		SmallVector<ImportedSubmodule, 2> ImportedModules;

		/// Mapping between a (sub)module and deserialized set of included files.
		llvm::DenseMap<Module *, Preprocessor::IncludedFilesSet>
		SubmoduleIncludedFiles;
//@}		//@}

/// The system include root to be used when loading the		/// The system include root to be used when loading the
/// precompiled header.		/// precompiled header.
std::string isysroot;		std::string isysroot;

/// Whether to disable the normal validation performed on precompiled		/// Whether to disable the normal validation performed on precompiled
/// headers and module files when they are loaded.		/// headers and module files when they are loaded.
▲ Show 20 Lines • Show All 388 Lines • ▼ Show 20 Lines	private:

llvm::Error ReadASTBlock(ModuleFile &F, unsigned ClientLoadCapabilities);		llvm::Error ReadASTBlock(ModuleFile &F, unsigned ClientLoadCapabilities);
llvm::Error ReadExtensionBlock(ModuleFile &F);		llvm::Error ReadExtensionBlock(ModuleFile &F);
void ReadModuleOffsetMap(ModuleFile &F) const;		void ReadModuleOffsetMap(ModuleFile &F) const;
void ParseLineTable(ModuleFile &F, const RecordData &Record);		void ParseLineTable(ModuleFile &F, const RecordData &Record);
llvm::Error ReadSourceManagerBlock(ModuleFile &F);		llvm::Error ReadSourceManagerBlock(ModuleFile &F);
llvm::BitstreamCursor &SLocCursorForID(int ID);		llvm::BitstreamCursor &SLocCursorForID(int ID);
SourceLocation getImportLocation(ModuleFile *F);		SourceLocation getImportLocation(ModuleFile *F);
void readIncludedFiles(ModuleFile &F, StringRef Blob, Preprocessor &PP);		Preprocessor::IncludedFilesSet readIncludedFiles(ModuleFile &F,
		StringRef Blob);
ASTReadResult ReadModuleMapFileBlock(RecordData &Record, ModuleFile &F,		ASTReadResult ReadModuleMapFileBlock(RecordData &Record, ModuleFile &F,
const ModuleFile *ImportedBy,		const ModuleFile *ImportedBy,
unsigned ClientLoadCapabilities);		unsigned ClientLoadCapabilities);
llvm::Error ReadSubmoduleBlock(ModuleFile &F,		llvm::Error ReadSubmoduleBlock(ModuleFile &F,
unsigned ClientLoadCapabilities);		unsigned ClientLoadCapabilities);
static bool ParseLanguageOptions(const RecordData &Record, bool Complain,		static bool ParseLanguageOptions(const RecordData &Record, bool Complain,
ASTReaderListener &Listener,		ASTReaderListener &Listener,
bool AllowCompatibleDifferences);		bool AllowCompatibleDifferences);
▲ Show 20 Lines • Show All 752 Lines • ▼ Show 20 Lines	public:
/// number.		/// number.
serialization::SubmoduleID		serialization::SubmoduleID
getGlobalSubmoduleID(ModuleFile &M, unsigned LocalID);		getGlobalSubmoduleID(ModuleFile &M, unsigned LocalID);

/// Retrieve the submodule that corresponds to a global submodule ID.		/// Retrieve the submodule that corresponds to a global submodule ID.
///		///
Module *getSubmodule(serialization::SubmoduleID GlobalID);		Module *getSubmodule(serialization::SubmoduleID GlobalID);

		/// Return the set of files directly included in the given (sub)module.
		const Preprocessor::IncludedFilesSet *getIncludedFiles(Module *M) override;

/// Retrieve the module that corresponds to the given module ID.		/// Retrieve the module that corresponds to the given module ID.
///		///
/// Note: overrides method in ExternalASTSource		/// Note: overrides method in ExternalASTSource
Module *getModule(unsigned ID) override;		Module *getModule(unsigned ID) override;

/// Retrieve the module file with a given local ID within the specified		/// Retrieve the module file with a given local ID within the specified
/// ModuleFile.		/// ModuleFile.
ModuleFile *getLocalModuleFile(ModuleFile &M, unsigned ID);		ModuleFile *getLocalModuleFile(ModuleFile &M, unsigned ID);
▲ Show 20 Lines • Show All 210 Lines • Show Last 20 Lines

clang/include/clang/Serialization/ASTWriter.h

Show All 13 Lines
#ifndef LLVM_CLANG_SERIALIZATION_ASTWRITER_H		#ifndef LLVM_CLANG_SERIALIZATION_ASTWRITER_H
#define LLVM_CLANG_SERIALIZATION_ASTWRITER_H		#define LLVM_CLANG_SERIALIZATION_ASTWRITER_H

#include "clang/AST/ASTMutationListener.h"		#include "clang/AST/ASTMutationListener.h"
#include "clang/AST/Decl.h"		#include "clang/AST/Decl.h"
#include "clang/AST/Type.h"		#include "clang/AST/Type.h"
#include "clang/Basic/LLVM.h"		#include "clang/Basic/LLVM.h"
#include "clang/Basic/SourceLocation.h"		#include "clang/Basic/SourceLocation.h"
		#include "clang/Lex/Preprocessor.h"
#include "clang/Sema/Sema.h"		#include "clang/Sema/Sema.h"
#include "clang/Sema/SemaConsumer.h"		#include "clang/Sema/SemaConsumer.h"
#include "clang/Serialization/ASTBitCodes.h"		#include "clang/Serialization/ASTBitCodes.h"
#include "clang/Serialization/ASTDeserializationListener.h"		#include "clang/Serialization/ASTDeserializationListener.h"
#include "clang/Serialization/PCHContainerOperations.h"		#include "clang/Serialization/PCHContainerOperations.h"
#include "llvm/ADT/ArrayRef.h"		#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/DenseMap.h"		#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/DenseSet.h"		#include "llvm/ADT/DenseSet.h"
▲ Show 20 Lines • Show All 430 Lines • ▼ Show 20 Lines	private:
/// Calculate hash of the pcm content.		/// Calculate hash of the pcm content.
static std::pair<ASTFileSignature, ASTFileSignature>		static std::pair<ASTFileSignature, ASTFileSignature>
createSignature(StringRef AllBytes, StringRef ASTBlockBytes);		createSignature(StringRef AllBytes, StringRef ASTBlockBytes);

void WriteInputFiles(SourceManager &SourceMgr, HeaderSearchOptions &HSOpts,		void WriteInputFiles(SourceManager &SourceMgr, HeaderSearchOptions &HSOpts,
std::set<const FileEntry *> &AffectingModuleMaps);		std::set<const FileEntry *> &AffectingModuleMaps);
void WriteSourceManagerBlock(SourceManager &SourceMgr,		void WriteSourceManagerBlock(SourceManager &SourceMgr,
const Preprocessor &PP);		const Preprocessor &PP);
void writeIncludedFiles(raw_ostream &Out, const Preprocessor &PP);		void writeIncludedFiles(raw_ostream &Out,
		const Preprocessor::IncludedFilesSet &Files);
void WritePreprocessor(const Preprocessor &PP, bool IsModule);		void WritePreprocessor(const Preprocessor &PP, bool IsModule);
void WriteHeaderSearch(const HeaderSearch &HS);		void WriteHeaderSearch(const HeaderSearch &HS);
void WritePreprocessorDetail(PreprocessingRecord &PPRec,		void WritePreprocessorDetail(PreprocessingRecord &PPRec,
uint64_t MacroOffsetsBase);		uint64_t MacroOffsetsBase);
void WriteSubmodules(Module *WritingModule);		void WriteSubmodules(Module *WritingModule);

void WritePragmaDiagnosticMappings(const DiagnosticsEngine &Diag,		void WritePragmaDiagnosticMappings(const DiagnosticsEngine &Diag,
bool isModule);		bool isModule);
▲ Show 20 Lines • Show All 311 Lines • Show Last 20 Lines

clang/include/clang/Serialization/ModuleFile.h

Show First 20 Lines • Show All 105 Lines • ▼ Show 20 Lines
/// Each instance of the Module class corresponds to a single AST file, which		/// Each instance of the Module class corresponds to a single AST file, which
/// may be a precompiled header, precompiled preamble, a module, or an AST file		/// may be a precompiled header, precompiled preamble, a module, or an AST file
/// of some sort loaded as the main file, all of which are specific formulations		/// of some sort loaded as the main file, all of which are specific formulations
/// of the general notion of a "module". A module may depend on any number of		/// of the general notion of a "module". A module may depend on any number of
/// other modules.		/// other modules.
class ModuleFile {		class ModuleFile {
public:		public:
ModuleFile(ModuleKind Kind, unsigned Generation)		ModuleFile(ModuleKind Kind, unsigned Generation)
: Kind(Kind), Generation(Generation) {}		: Kind(Kind), Generation(Generation),
		SubmoduleIncludedFiles(SubmoduleIncludedFilesAlloc) {}
~ModuleFile();		~ModuleFile();

// === General information ===		// === General information ===

/// The index of this module in the list of modules.		/// The index of this module in the list of modules.
unsigned Index = 0;		unsigned Index = 0;

/// The type of this module.		/// The type of this module.
▲ Show 20 Lines • Show All 266 Lines • ▼ Show 20 Lines	public:
unsigned LocalNumSubmodules = 0;		unsigned LocalNumSubmodules = 0;

/// Base submodule ID for submodules local to this module.		/// Base submodule ID for submodules local to this module.
serialization::SubmoduleID BaseSubmoduleID = 0;		serialization::SubmoduleID BaseSubmoduleID = 0;

/// Remapping table for submodule IDs in this module.		/// Remapping table for submodule IDs in this module.
ContinuousRangeMap<uint32_t, int, 2> SubmoduleRemap;		ContinuousRangeMap<uint32_t, int, 2> SubmoduleRemap;

		/// Allocator for the serialized set of included files.
		llvm::BumpPtrAllocator SubmoduleIncludedFilesAlloc;
		/// Mapping between (sub)module names and the serialized set of included
		dexonsmith: Each StringMapEntry is going to have a pretty big allocation here, for a 512B vector. Given that it doesn't need to be modified after creation, it'd be more efficient to use this pattern:
		    llvm::StringMap<ArrayRef<uint64_t>> SubmoduleIncludedFiles;
		    SpecificBumpPtrAlloc<uint64_t> SubmoduleIncludedFilesAlloc;
		    // later
		    MutableArrayRef<uint64_t> Files(SubmoduleIncludedFilesAlloc.Allocate(Record.size()), Record.size());
		    llvm::copy(Record, Files.begin());
		    SubmoduleIncludedFiles[Key] = Files;
		That said, I feel like this should just be:
		    DenseMap<StringRef, StringRef> SubmoduleIncludedFiles;
		The key can point at the name of the submodule, which must already exist somewhere without needing a StringMap to create a new copy of it. The value is a lazily deserialized blob.
		jansvoboda11 (Author): I switched to a `StringRef` value in the latest revision. Unfortunately, I had to use `std::string` as the key instead of `StringRef`, since `getFullModuleName` constructs the string on the heap. That forced me to use `std::map`, too. I'll explore using something else entirely as the key.
		dexonsmith: Oh, if the key isn't being kept alive elsewhere, you can/should use `StringMap<StringRef>`, which is strictly better than `std::map<std::string, StringRef>` (except for non-deterministic iteration order). I suggested otherwise because I assumed the key already existed. Also, if the map isn't being deleted from (much), you might as well bump-ptr-allocate it:
		    BumpPtrAllocator SubmoduleIncludedFilesAlloc;
		    StringMap<StringRef, BumpPtrAllocator> SubmoduleIncludedFiles; // initialized with the Alloc above.
		> I'll explore using something else entirely as the key.
		Not necessarily important; just if the key already existed (e.g., stored in its `Module` or something) you might as well use it. Unless, maybe, if the `Module` is known to be constructed already, you could use its address?
		dexonsmith: > I switched to `StringRef` value in the latest revision. Unfortunately, had to use `std::string` as the key instead of `StringRef`, since `getFullModuleName` constructs the string on heap. That forced me to use `std::map`, too. I'll explore using something else entirely as the key.
		/// files. Initialized by the allocator above.
		llvm::StringMap<StringRef, llvm::BumpPtrAllocator> SubmoduleIncludedFiles;
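The "lazily deserialized blob" idea from the thread above could look roughly like the following standalone sketch. It is not the patch's implementation: `LazyIncludedFiles` is a hypothetical class, `std::unordered_map` and `std::string_view` stand in for `llvm::StringMap` and `StringRef`, and the blob layout (a packed array of native-endian 32-bit IDs) is an assumption made only for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <string_view>
#include <unordered_map>
#include <utility>
#include <vector>

class LazyIncludedFiles {
  // Undecoded bytes per (sub)module name; the views must outlive this object.
  std::unordered_map<std::string, std::string_view> Blobs;
  // Decoded results, filled in on first request only.
  std::unordered_map<std::string, std::vector<std::uint32_t>> Decoded;
  unsigned NumDecodes = 0;

public:
  void registerBlob(const std::string &Name, std::string_view Blob) {
    Blobs[Name] = Blob;
  }

  // Returns the decoded IDs for Name, decoding at most once per name.
  const std::vector<std::uint32_t> *get(const std::string &Name) {
    auto Cached = Decoded.find(Name);
    if (Cached != Decoded.end())
      return &Cached->second;
    auto It = Blobs.find(Name);
    if (It == Blobs.end())
      return nullptr;
    std::vector<std::uint32_t> IDs(It->second.size() / sizeof(std::uint32_t));
    if (!IDs.empty())
      std::memcpy(IDs.data(), It->second.data(),
                  IDs.size() * sizeof(std::uint32_t));
    ++NumDecodes;
    return &Decoded.emplace(Name, std::move(IDs)).first->second;
  }

  unsigned numDecodes() const { return NumDecodes; }
};
```

Storing only a view at load time keeps the cost of reading a module file independent of how many submodules are later imported; decoding is paid for only by the submodules that are actually requested.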

// === Selectors ===		// === Selectors ===

/// The number of selectors new to this file.		/// The number of selectors new to this file.
///		///
/// This is the number of entries in SelectorOffsets.		/// This is the number of entries in SelectorOffsets.
unsigned LocalNumSelectors = 0;		unsigned LocalNumSelectors = 0;

/// Offsets into the selector lookup table's data array		/// Offsets into the selector lookup table's data array
▲ Show 20 Lines • Show All 114 Lines • Show Last 20 Lines

clang/lib/Basic/Module.cpp

Show First 20 Lines • Show All 644 Lines • ▼ Show 20 Lines	std::function<void(Visiting)> VisitModule = [&](Visiting V) {
// Nothing to do for a module that's already visible.		// Nothing to do for a module that's already visible.
unsigned ID = V.M->getVisibilityID();		unsigned ID = V.M->getVisibilityID();
if (ImportLocs.size() <= ID)		if (ImportLocs.size() <= ID)
ImportLocs.resize(ID + 1);		ImportLocs.resize(ID + 1);
else if (ImportLocs[ID].isValid())		else if (ImportLocs[ID].isValid())
return;		return;

ImportLocs[ID] = Loc;		ImportLocs[ID] = Loc;
Vis(M);		Vis(V.M);
		vsapsai: I had been meaning to make this fix for a long time but couldn't test it. Thanks for finally fixing it!

// Make any exported modules visible.		// Make any exported modules visible.
SmallVector<Module *, 16> Exports;		SmallVector<Module *, 16> Exports;
V.M->getExportedModules(Exports);		V.M->getExportedModules(Exports);
for (Module *E : Exports) {		for (Module *E : Exports) {
// Don't import non-importable modules.		// Don't import non-importable modules.
if (!E->isUnimportable())		if (!E->isUnimportable())
VisitModule({E, &V});		VisitModule({E, &V});
Show All 28 Lines

clang/lib/Lex/PPLexerChange.cpp

Show First 20 Lines • Show All 682 Lines • ▼ Show 20 Lines	void Preprocessor::HandleMicrosoftCommentPaste(Token &Tok) {
// didn't find an explicit \n. This can only happen if there was no lexer		// didn't find an explicit \n. This can only happen if there was no lexer
// active (an active lexer would return EOD at EOF if there was no \n in		// active (an active lexer would return EOD at EOF if there was no \n in
// preprocessor directive mode), so just return EOF as our token.		// preprocessor directive mode), so just return EOF as our token.
assert(!FoundLexer && "Lexer should return EOD before EOF in PP mode");		assert(!FoundLexer && "Lexer should return EOD before EOF in PP mode");
}		}

void Preprocessor::EnterSubmodule(Module *M, SourceLocation ImportLoc,		void Preprocessor::EnterSubmodule(Module *M, SourceLocation ImportLoc,
bool ForPragma) {		bool ForPragma) {
		// Ensure that even if this submodule doesn't include anything, it's present
		// in the map.
		IncludedFilesPerSubmodule[M];

if (!getLangOpts().ModulesLocalVisibility) {		if (!getLangOpts().ModulesLocalVisibility) {
// Just track that we entered this submodule.		// Just track that we entered this submodule.
BuildingSubmoduleStack.push_back(		BuildingSubmoduleStack.push_back(
BuildingSubmoduleInfo(M, ImportLoc, ForPragma, CurSubmoduleState,		BuildingSubmoduleInfo(M, ImportLoc, ForPragma, CurSubmoduleState,
PendingModuleMacroNames.size()));		PendingModuleMacroNames.size()));
if (Callbacks)		if (Callbacks)
Callbacks->EnteredSubmodule(M, ImportLoc, ForPragma);		Callbacks->EnteredSubmodule(M, ImportLoc, ForPragma);
return;		return;
}		}

// Resolve as much of the module definition as we can now, before we enter		// Resolve as much of the module definition as we can now, before we enter
// one of its headers.		// one of its headers.
		vsapsai: How many includes are expected to be here? Are these only immediate includes, or also transitive ones? I'm asking in order to evaluate how expensive iterating through the includes can get.
// FIXME: Can we enable Complain here?		// FIXME: Can we enable Complain here?
// FIXME: Can we do this when local visibility is disabled?		// FIXME: Can we do this when local visibility is disabled?
ModuleMap &ModMap = getHeaderSearchInfo().getModuleMap();		ModuleMap &ModMap = getHeaderSearchInfo().getModuleMap();
ModMap.resolveExports(M, /*Complain=*/false);		ModMap.resolveExports(M, /*Complain=*/false);
ModMap.resolveUses(M, /*Complain=*/false);		ModMap.resolveUses(M, /*Complain=*/false);
ModMap.resolveConflicts(M, /*Complain=*/false);		ModMap.resolveConflicts(M, /*Complain=*/false);

// If this is the first time we've entered this module, set up its state.		// If this is the first time we've entered this module, set up its state.
▲ Show 20 Lines • Show All 169 Lines • Show Last 20 Lines

clang/lib/Lex/Preprocessor.cpp

Show First 20 Lines • Show All 1,298 Lines • ▼ Show 20 Lines	if (!Suffix.empty()) {
EnterTokens(Suffix);		EnterTokens(Suffix);
return false;		return false;
}		}
return true;		return true;
}		}

void Preprocessor::makeModuleVisible(Module *M, SourceLocation Loc) {		void Preprocessor::makeModuleVisible(Module *M, SourceLocation Loc) {
CurSubmoduleState->VisibleModules.setVisible(		CurSubmoduleState->VisibleModules.setVisible(
M, Loc, [](Module *) {},		M, Loc,
		[&](Module *M) {
		const Preprocessor::IncludedFilesSet *Includes =
		getLocalSubmoduleIncludes(M);
		if (!Includes)
		Includes = getExternalSubmoduleIncludes(M);
		if (Includes)
		for (const FileEntry *E : *Includes)
		markTransitivelyIncluded(E);
		},
[&](ArrayRef<Module *> Path, Module *Conflict, StringRef Message) {		[&](ArrayRef<Module *> Path, Module *Conflict, StringRef Message) {
// FIXME: Include the path in the diagnostic.		// FIXME: Include the path in the diagnostic.
// FIXME: Include the import location for the conflicting module.		// FIXME: Include the import location for the conflicting module.
Diag(ModuleImportLoc, diag::warn_module_conflict)		Diag(ModuleImportLoc, diag::warn_module_conflict)
<< Path[0]->getFullModuleName()		<< Path[0]->getFullModuleName()
<< Conflict->getFullModuleName()		<< Conflict->getFullModuleName()
<< Message;		<< Message;
});		});

// Add this module to the imports list of the currently-built submodule.		// Add this module to the imports list of the currently-built submodule.
if (!BuildingSubmoduleStack.empty() && M != BuildingSubmoduleStack.back().M)		if (!BuildingSubmoduleStack.empty() && M != BuildingSubmoduleStack.back().M)
BuildingSubmoduleStack.back().M->Imports.insert(M);		BuildingSubmoduleStack.back().M->Imports.insert(M);
}		}
		vsapsaiUnsubmitted Not Done Reply Inline Actions If I drop checking `getLocalSubmoduleIncludes`, no tests are failing. But it seems like this call is required. How can we test it? vsapsai: If I drop checking `getLocalSubmoduleIncludes`, no tests are failing. But it seems like this…
		jansvoboda11AuthorUnsubmitted Done Reply Inline Actions I think this should kick in when importing a submodule from the same module. I'll try to come up with a test case. jansvoboda11: I think this should kick in when importing a submodule from the same module. I'll try to come…

		jansvoboda11AuthorUnsubmitted Done Reply Inline Actions Iterating over all FileEntries is probably not very efficient, as Volodymyr mentioned. Thinking about how to make this more efficient... jansvoboda11: Iterating over all FileEntries is probably not very efficient, as Volodymyr mentioned. Thinking…
		dexonsmithUnsubmitted Done Reply Inline Actions My suggestion above to drop FileEntryMap in favour of a simple DenseMap would help a bit, just iterating through the files actually included by the submodules. Further, I wonder if "num-includes"/file/submodule (`unsigned`) is actually useful, vs. "was-included"/file/submodule (`bool`). The only observer I see is `HeaderSearch::PrintStats()` but maybe I missed something? If I'm right and we can switch to `bool`, then NumIncludes becomes a `DenseSet<FileEntry > IncludedFiles` (or `DenseSet<unsigned>` for UIDs). (BTW, I also wondered if you could rotate the map, using File as the outer key, and then use bitsets for the sbumodules, but I doubt this is better, since most files are only included by a few submodules, right?) Then you can just do a set union here. Also simplifies bitcode serialization. (If a `bool`/set is sufficient, then I'd suggest landing that first/separately, before adding the per-submodule granularity in this patch.) dexonsmith:* My suggestion above to drop FileEntryMap in favour of a simple DenseMap would help a bit, just…
		jansvoboda11AuthorUnsubmitted Done Reply Inline Actions For each file, we need to have three distinct states: not included at all, included exactly once (`firstTimeLexingFile`), included more than once. This means we can't use a single `DenseSet`. But we could use a `DenseMap<Key, bool>`, where "not included at all" can be expressed as being absent from the map, exactly once as having `true` in the map and more than once as having `false` in the map. Alternatively, we could use two `DenseSet` instances to encode the same, but I don't think having two lookups per file to determine stuff is efficient. I can look into this in a follow-up patch. jansvoboda11: For each file, we need to have three distinct states: not included at all, included exactly…
		dexonsmith (inline comment, Done): Seems like a DenseSet could still be used by having HeaderInfo pass back the WasInserted bit from the insertion to the preprocessor, and threading it through to Preprocessor::HandleEndOfFile (the only caller of FirstTimeLexingFile): `bool IsFirst = Set.insert(Key).second;`

		The threading doesn't seem too hard. Looking at main:
		- Preprocessor::HandleHeaderIncludeOrImport calls HeaderInfo::ShouldEnterIncludeFile. This does the `++FI.NumIncludes` (going from 0 to 1). Instead, it could be `IsFirst = !FI.WasIncluded; FI.WasIncluded = true;`, then return `IsFirst` somehow. (Then your patch can pull `IsFirst` from the `insert().second`.)
		- Preprocessor::HandleHeaderIncludeOrImport calls Preprocessor::EnterSourceFile. This creates a new Lexer for that file. `IsFirst` can be stored on that Lexer.
		- Preprocessor::HandleEndOfFile calls FirstTimeLexingFile. Instead, it can check the new accessor `CurLexer->isFirstTimeLexing()`.

		(Quoting jansvoboda11: "I can look into this in a follow-up patch.") Follow-up might be okay, but it'd be nice to remove an axis of complexity before adding a new one if it's reasonable. E.g., it'll be easier to debug emergent issues from changing it to a simple set since there's less machinery to worry about.
		jansvoboda11 (Author, inline comment, Done): Extracted into D114093.
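The exchange above turns on one observation: the "first inclusion" state only has to be known at the moment of insertion, and a set's `insert` already reports it. A minimal sketch of that idiom follows. It is not the code from D114093; `IncludeTracker` is a hypothetical stand-in, filenames are plain strings rather than `FileEntry` pointers, and `std::unordered_set` is used in place of `llvm::DenseSet` so the example builds without LLVM.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// Hypothetical stand-in for the preprocessor's set of included files.
// Assumption: std::unordered_set replaces llvm::DenseSet; both return an
// (iterator, bool) pair from insert(), where the bool is "was inserted".
struct IncludeTracker {
  std::unordered_set<std::string> Included;

  // Records the inclusion and returns true only the first time File is seen.
  bool markIncluded(const std::string &File) {
    return Included.insert(File).second;
  }

  // Pure query: has File been included at least once?
  bool alreadyIncluded(const std::string &File) const {
    return Included.count(File) != 0;
  }
};
```

With this shape, "not included" is absence from the set, "first inclusion" is the `true` returned by the inserting call, and "included again" is a later call returning `false`, so no per-file counter is required.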
bool Preprocessor::FinishLexStringLiteral(Token &Result, std::string &String,		bool Preprocessor::FinishLexStringLiteral(Token &Result, std::string &String,
const char *DiagnosticTag,		const char *DiagnosticTag,
bool AllowMacroExpansion) {		bool AllowMacroExpansion) {
// We need at least one string literal.		// We need at least one string literal.
if (Result.isNot(tok::string_literal)) {		if (Result.isNot(tok::string_literal)) {
Diag(Result, diag::err_expected_string_literal)		Diag(Result, diag::err_expected_string_literal)
<< /*Source='in...'*/0 << DiagnosticTag;		<< /*Source='in...'*/0 << DiagnosticTag;
return false;		return false;
[130 lines not shown]

void Preprocessor::createPreprocessingRecord() {		void Preprocessor::createPreprocessingRecord() {
if (Record)		if (Record)
return;		return;

Record = new PreprocessingRecord(getSourceManager());		Record = new PreprocessingRecord(getSourceManager());
addPPCallbacks(std::unique_ptr<PPCallbacks>(Record));		addPPCallbacks(std::unique_ptr<PPCallbacks>(Record));
}		}

		bool Preprocessor::markIncluded(const FileEntry *File) {
		HeaderInfo.getFileInfo(File);

		Module *CurrentSubmodule = getCurrentModule();
		if (!BuildingSubmoduleStack.empty())
		CurrentSubmodule = BuildingSubmoduleStack.back().M;
		IncludedFilesPerSubmodule[CurrentSubmodule].insert(File);

		return IncludedFiles.insert(File).second;
		}

		void Preprocessor::markTransitivelyIncluded(const FileEntry *File) {
		HeaderInfo.getFileInfo(File);
		IncludedFiles.insert(File);
		}

		bool Preprocessor::alreadyIncluded(const FileEntry *File) const {
		return IncludedFiles.count(File);
		}

		const Preprocessor::IncludedFilesSet *
		Preprocessor::getExternalSubmoduleIncludes(Module *M) const {
		return ExternalSource ? ExternalSource->getIncludedFiles(M) : nullptr;
		}
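The new `Preprocessor` methods above keep two views of the same information: a flat set of files visible in the current translation unit and a per-submodule set used when a module is built. The following sketch models why that split matters for the bug in the summary. It is a simplified illustration under assumptions: module and file names are plain strings, `PreprocessorModel` and `importSubmodule` are hypothetical names, and the merge step performed on import is not part of this excerpt, so its shape here is a guess.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical model of the two pieces of state kept by the patch.
struct PreprocessorModel {
  std::set<std::string> IncludedFiles; // files visible in this TU
  std::map<std::string, std::set<std::string>> IncludedFilesPerSubmodule;

  // Records an inclusion made while building Submodule.
  // Returns true on the first inclusion of File in this preprocessor.
  bool markIncluded(const std::string &Submodule, const std::string &File) {
    IncludedFilesPerSubmodule[Submodule].insert(File);
    return IncludedFiles.insert(File).second;
  }

  bool alreadyIncluded(const std::string &File) const {
    return IncludedFiles.count(File) != 0;
  }
};

// Assumed import step: only the files of the submodule that becomes visible
// are merged into the importer, not those of its sibling submodules.
void importSubmodule(PreprocessorModel &Importer,
                     const PreprocessorModel &Built,
                     const std::string &Submodule) {
  auto It = Built.IncludedFilesPerSubmodule.find(Submodule);
  if (It == Built.IncludedFilesPerSubmodule.end())
    return;
  Importer.IncludedFiles.insert(It->second.begin(), It->second.end());
}
```

In the terms of the attached test, importing only `B2` leaves `Textual.h` unmarked in the importer, so a later `#import "Textual.h"` is still processed textually.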

clang/lib/Serialization/ASTReader.cpp


[2,952 lines not shown]	case INPUT_FILE_OFFSETS:
(const llvm::support::unaligned_uint64_t *)Blob.data();		(const llvm::support::unaligned_uint64_t *)Blob.data();
F.InputFilesLoaded.resize(NumInputs);		F.InputFilesLoaded.resize(NumInputs);
F.NumUserInputFiles = NumUserInputs;		F.NumUserInputFiles = NumUserInputs;
break;		break;
}		}
}		}
}		}

void ASTReader::readIncludedFiles(ModuleFile &F, StringRef Blob,		Preprocessor::IncludedFilesSet ASTReader::readIncludedFiles(ModuleFile &F,
Preprocessor &PP) {		StringRef Blob) {
using namespace llvm::support;		using namespace llvm::support;

		Preprocessor::IncludedFilesSet Result;

const unsigned char *D = (const unsigned char *)Blob.data();		const unsigned char *D = (const unsigned char *)Blob.data();
unsigned FileCount = endian::readNext<uint32_t, little, unaligned>(D);		unsigned FileCount = endian::readNext<uint32_t, little, unaligned>(D);

for (unsigned I = 0; I < FileCount; ++I) {		for (unsigned I = 0; I < FileCount; ++I) {
size_t ID = endian::readNext<uint32_t, little, unaligned>(D);		size_t ID = endian::readNext<uint32_t, little, unaligned>(D);
InputFileInfo IFI = readInputFileInfo(F, ID);		InputFileInfo IFI = readInputFileInfo(F, ID);
if (llvm::ErrorOr<const FileEntry *> File =		if (llvm::ErrorOr<const FileEntry *> File =
PP.getFileManager().getFile(IFI.Filename))		PP.getFileManager().getFile(IFI.Filename))
PP.getIncludedFiles().insert(*File);		Result.insert(*File);
}		}

		return Result;
}		}
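As read by `readIncludedFiles` above, the blob is a 32-bit file count followed by that many 32-bit input-file IDs, each stored little-endian and unaligned. The sketch below decodes that layout without LLVM's `endian::readNext`. The names `readLE32` and `decodeIncludedFileIDs` are hypothetical, and the truncation check is an addition that the code shown in the diff does not perform.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reads one unaligned little-endian uint32_t and advances the cursor.
static uint32_t readLE32(const unsigned char *&D) {
  uint32_t V = uint32_t(D[0]) | (uint32_t(D[1]) << 8) |
               (uint32_t(D[2]) << 16) | (uint32_t(D[3]) << 24);
  D += 4;
  return V;
}

// Decodes "<count> <id>*" into a list of input-file IDs.
// Assumption: a truncated blob yields an empty result instead of reading
// past the end of the buffer.
std::vector<uint32_t>
decodeIncludedFileIDs(const std::vector<unsigned char> &Blob) {
  std::vector<uint32_t> IDs;
  if (Blob.size() < 4)
    return IDs;
  const unsigned char *D = Blob.data();
  uint32_t Count = readLE32(D);
  if ((Blob.size() - 4) / 4 < Count)
    return IDs;
  IDs.reserve(Count);
  for (uint32_t I = 0; I < Count; ++I)
    IDs.push_back(readLE32(D));
  return IDs;
}
```

In the patch, each decoded ID is then resolved to a filename through `readInputFileInfo` and to a `FileEntry` through the file manager; that step is omitted here.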

llvm::Error ASTReader::ReadASTBlock(ModuleFile &F,		llvm::Error ASTReader::ReadASTBlock(ModuleFile &F,
unsigned ClientLoadCapabilities) {		unsigned ClientLoadCapabilities) {
BitstreamCursor &Stream = F.Stream;		BitstreamCursor &Stream = F.Stream;

if (llvm::Error Err = Stream.EnterSubBlock(AST_BLOCK_ID))		if (llvm::Error Err = Stream.EnterSubBlock(AST_BLOCK_ID))
return Err;		return Err;
[725 lines not shown]	case MACRO_OFFSET: {
F.BaseMacroID - LocalBaseMacroID));		F.BaseMacroID - LocalBaseMacroID));

MacrosLoaded.resize(MacrosLoaded.size() + F.LocalNumMacros);		MacrosLoaded.resize(MacrosLoaded.size() + F.LocalNumMacros);
}		}
break;		break;
}		}

case PP_INCLUDED_FILES:		case PP_INCLUDED_FILES:
readIncludedFiles(F, Blob, PP);		if (F.Kind == MK_PCH || F.Kind == MK_Preamble || F.Kind == MK_MainFile)
		for (const FileEntry *File : readIncludedFiles(F, Blob))
		PP.markTransitivelyIncluded(File);
break;		break;

case LATE_PARSED_TEMPLATE:		case LATE_PARSED_TEMPLATE:
LateParsedTemplates.emplace_back(		LateParsedTemplates.emplace_back(
std::piecewise_construct, std::forward_as_tuple(&F),		std::piecewise_construct, std::forward_as_tuple(&F),
std::forward_as_tuple(Record.begin(), Record.end()));		std::forward_as_tuple(Record.begin(), Record.end()));
break;		break;

[1,996 lines not shown]	case SUBMODULE_INITIALIZERS: {
ContextObj->addLazyModuleInitializers(CurrentModule, Inits);		ContextObj->addLazyModuleInitializers(CurrentModule, Inits);
break;		break;
}		}

case SUBMODULE_EXPORT_AS:		case SUBMODULE_EXPORT_AS:
CurrentModule->ExportAsModule = Blob.str();		CurrentModule->ExportAsModule = Blob.str();
ModMap.addLinkAsDependency(CurrentModule);		ModMap.addLinkAsDependency(CurrentModule);
break;		break;

		case SUBMODULE_INCLUDED_FILES:
		F.SubmoduleIncludedFiles.insert(
		{CurrentModule->getFullModuleName(), Blob});
		dexonsmith (inline comment, Done): This looks lazy, but a bunch of work was just done to decode the `Record` from bitcode. To make this actually lazy, you can encode the data in a blob, which doesn't have to be decoded until it's used.
}		}
}		}
}		}

/// Parse the record that corresponds to a LangOptions data		/// Parse the record that corresponds to a LangOptions data
/// structure.		/// structure.
///		///
/// This routine parses the language options from the AST file and then gives		/// This routine parses the language options from the AST file and then gives
[2,876 lines not shown]	ASTReader::getGlobalSubmoduleID(ModuleFile &M, unsigned LocalID) {
ContinuousRangeMap<uint32_t, int, 2>::iterator I		ContinuousRangeMap<uint32_t, int, 2>::iterator I
= M.SubmoduleRemap.find(LocalID - NUM_PREDEF_SUBMODULE_IDS);		= M.SubmoduleRemap.find(LocalID - NUM_PREDEF_SUBMODULE_IDS);
assert(I != M.SubmoduleRemap.end()		assert(I != M.SubmoduleRemap.end()
&& "Invalid index into submodule index remap");		&& "Invalid index into submodule index remap");

return LocalID + I->second;		return LocalID + I->second;
}		}

		const Preprocessor::IncludedFilesSet *ASTReader::getIncludedFiles(Module *M) {
		vsapsai (inline comment, Not Done): Can you please check again the returned pointer doesn't end up as a dangling pointer? I don't think we store the pointer anywhere, which is good. My bigger concern is if we can invalidate `SubmoduleIncludedFiles` iterator while working with the returned pointer. I haven't found any indication of that but would like somebody else to check that.
		ModuleFile *F = getModuleManager().lookup(M->getASTFile());
		if (!F)
		return nullptr;

		auto ResultIt =
		SubmoduleIncludedFiles.insert({M, Preprocessor::IncludedFilesSet{}});
		Preprocessor::IncludedFilesSet &Result = ResultIt.first->second;
		if (!ResultIt.second)
		return &Result;

		auto It = F->SubmoduleIncludedFiles.find(M->getFullModuleName());
		if (It == F->SubmoduleIncludedFiles.end())
		return nullptr;
		StringRef Record = It->second;

		Result = readIncludedFiles(*F, Record);
		return &Result;
		}
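`getIncludedFiles` above decodes a submodule's record on first request and hands out a pointer into a cache, which is what prompts the dangling-pointer question in the inline comment. The sketch below shows the same decode-once pattern with a container whose element addresses stay valid across later insertions. It is an illustration under assumptions: the type of `SubmoduleIncludedFiles` is not visible in this excerpt, `LazyIncludes` is a hypothetical class, the "blob" is a toy comma-separated string, and `std::unordered_map` is chosen because it keeps pointers to elements valid when it rehashes, whereas `llvm::DenseMap` does not.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <unordered_map>
#include <utility>

using FileSet = std::set<std::string>;

// Hypothetical lazy cache: raw records are stored undecoded and turned into
// a FileSet only when a module is first queried.
class LazyIncludes {
  std::map<std::string, std::string> RawRecords;    // module -> raw blob
  std::unordered_map<std::string, FileSet> Decoded; // module -> decoded set
  unsigned DecodeCount = 0;

  // Toy decoder: the blob is a comma-separated list of filenames.
  static FileSet decode(const std::string &Blob) {
    FileSet Result;
    std::size_t Start = 0;
    while (Start <= Blob.size()) {
      std::size_t End = Blob.find(',', Start);
      if (End == std::string::npos)
        End = Blob.size();
      if (End > Start)
        Result.insert(Blob.substr(Start, End - Start));
      Start = End + 1;
    }
    return Result;
  }

public:
  void addRecord(std::string Module, std::string Blob) {
    RawRecords[std::move(Module)] = std::move(Blob);
  }

  // Returns nullptr for modules without a record; otherwise decodes at most
  // once and returns a pointer that stays valid as other modules are added.
  const FileSet *getIncludedFiles(const std::string &Module) {
    auto Cached = Decoded.find(Module);
    if (Cached != Decoded.end())
      return &Cached->second;
    auto Raw = RawRecords.find(Module);
    if (Raw == RawRecords.end())
      return nullptr;
    ++DecodeCount;
    return &Decoded.emplace(Module, decode(Raw->second)).first->second;
  }

  unsigned decodeCount() const { return DecodeCount; }
};
```

One design choice in this sketch is that the raw record is looked up before anything is inserted into the cache, so a module without a record never leaves an empty cached entry behind.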

Module *ASTReader::getSubmodule(SubmoduleID GlobalID) {		Module *ASTReader::getSubmodule(SubmoduleID GlobalID) {
if (GlobalID < NUM_PREDEF_SUBMODULE_IDS) {		if (GlobalID < NUM_PREDEF_SUBMODULE_IDS) {
assert(GlobalID == 0 && "Unhandled global submodule ID");		assert(GlobalID == 0 && "Unhandled global submodule ID");
return nullptr;		return nullptr;
}		}

if (GlobalID > SubmodulesLoaded.size()) {		if (GlobalID > SubmodulesLoaded.size()) {
Error("submodule ID out of range in AST file");		Error("submodule ID out of range in AST file");
[4,408 lines not shown]

clang/lib/Serialization/ASTWriter.cpp

[2,248 lines not shown]	if (Loc.isInvalid())
return true;		return true;
if (PP.getSourceManager().getFileID(Loc) == PP.getPredefinesFileID())		if (PP.getSourceManager().getFileID(Loc) == PP.getPredefinesFileID())
return true;		return true;
}		}

return false;		return false;
}		}

void ASTWriter::writeIncludedFiles(raw_ostream &Out, const Preprocessor &PP) {		void ASTWriter::writeIncludedFiles(
		raw_ostream &Out, const Preprocessor::IncludedFilesSet &Files) {
using namespace llvm::support;		using namespace llvm::support;

const Preprocessor::IncludedFilesSet &IncludedFiles = PP.getIncludedFiles();

std::vector<uint32_t> IncludedInputFileIDs;		std::vector<uint32_t> IncludedInputFileIDs;
		dexonsmith (inline comment, Done): Why does the count need to be encoded? The only observer is `Preprocessor::HandleEndOfFile`. If it gets called again for this file, it'll be after `++NumIncludes`.
IncludedInputFileIDs.reserve(IncludedFiles.size());		IncludedInputFileIDs.reserve(Files.size());

for (const FileEntry *File : IncludedFiles) {		for (const FileEntry *File : Files) {
auto InputFileIt = InputFileIDs.find(File);		auto InputFileIt = InputFileIDs.find(File);
if (InputFileIt == InputFileIDs.end())		if (InputFileIt == InputFileIDs.end())
continue;		continue;
IncludedInputFileIDs.push_back(InputFileIt->second);		IncludedInputFileIDs.push_back(InputFileIt->second);
}		}

llvm::sort(IncludedInputFileIDs);		llvm::sort(IncludedInputFileIDs);

[207 lines not shown]	void ASTWriter::WritePreprocessor(const Preprocessor &PP, bool IsModule) {
unsigned MacroOffsetAbbrev = Stream.EmitAbbrev(std::move(Abbrev));		unsigned MacroOffsetAbbrev = Stream.EmitAbbrev(std::move(Abbrev));
{		{
RecordData::value_type Record[] = {MACRO_OFFSET, MacroOffsets.size(),		RecordData::value_type Record[] = {MACRO_OFFSET, MacroOffsets.size(),
FirstMacroID - NUM_PREDEF_MACRO_IDS,		FirstMacroID - NUM_PREDEF_MACRO_IDS,
MacroOffsetsBase - ASTBlockStartOffset};		MacroOffsetsBase - ASTBlockStartOffset};
Stream.EmitRecordWithBlob(MacroOffsetAbbrev, Record, bytes(MacroOffsets));		Stream.EmitRecordWithBlob(MacroOffsetAbbrev, Record, bytes(MacroOffsets));
}		}

{		if (const Preprocessor::IncludedFilesSet *Includes =
		PP.getNullSubmoduleIncludes()) {
auto Abbrev = std::make_shared<BitCodeAbbrev>();		auto Abbrev = std::make_shared<BitCodeAbbrev>();
Abbrev->Add(BitCodeAbbrevOp(PP_INCLUDED_FILES));		Abbrev->Add(BitCodeAbbrevOp(PP_INCLUDED_FILES));
Abbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Blob));		Abbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Blob));
unsigned IncludedFilesAbbrev = Stream.EmitAbbrev(std::move(Abbrev));		unsigned IncludedFilesAbbrev = Stream.EmitAbbrev(std::move(Abbrev));

SmallString<2048> Buffer;		SmallString<2048> Buffer;
raw_svector_ostream Out(Buffer);		raw_svector_ostream Out(Buffer);
writeIncludedFiles(Out, PP);		writeIncludedFiles(Out, *Includes);
		dexonsmith (inline comment, Done): A vector of maps would be an improvement, but that'll still be a lot of allocations.
		- Since insertion/lookup/deletion aren't intermingled, the simplest way to avoid adding unnecessary overhead is a sorted vector (https://llvm.org/docs/ProgrammersManual.html#dss-sortedvectormap).
		- With no lookups (at all), there's no benefit to a tiered data structure (vs flat).

		Leading me toward a simple flat vector + sort.

		    struct IncludeToSerialize { // Probably more straightforward than a std::tuple...
		      StringRef Filename;
		      unsigned SMID;
		      unsigned NumIncludes;
		      bool operator<(const IncludeToSerialize &RHS) const {
		        if (SMID != RHS.SMID)
		          return SMID < RHS.SMID;
		        int Diff = Filename.compare(RHS.Filename);
		        assert(Diff && "Expected unique SMID / Filename pairs");
		        return Diff < 0;
		      }
		    };
		    SmallVector<IncludeToSerialize> IncludesToSerialize;
		    // ...
		    IncludesToSerialize.push_back({Filename, LocalSMID, NumIncludes});
		    // ...
		    llvm::sort(IncludesToSerialize);
		    for (const IncludeToSerialize &SI : IncludesToSerialize) {
		      // emit record
		    }

		(Or if there are duplicates expected to be encountered and ignored, you can remove the assertion and use stable_sort + unique + erase.)
		jansvoboda11 (Author, inline comment, Done): In the latest revision, I ended up sorting just based on `Filename`, since this is now explicitly stored per submodule.
RecordData::value_type Record[] = {PP_INCLUDED_FILES};		RecordData::value_type Record[] = {PP_INCLUDED_FILES};
Stream.EmitRecordWithBlob(IncludedFilesAbbrev, Record, Buffer.data(),		Stream.EmitRecordWithBlob(IncludedFilesAbbrev, Record, Buffer.data(),
Buffer.size());		Buffer.size());
}		}
}		}
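The middle of `writeIncludedFiles` is elided above, but the reader side fixes the layout it has to produce: a 32-bit count followed by the input-file IDs, little-endian. The sketch below writes that layout after sorting the IDs, mirroring the `llvm::sort(IncludedInputFileIDs)` call in the diff. `writeLE32` and `encodeIncludedFileIDs` are hypothetical names, and the exact emission order in the real writer is an assumption inferred from `readIncludedFiles`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Appends V to Out as four little-endian bytes.
static void writeLE32(std::vector<unsigned char> &Out, uint32_t V) {
  for (int Shift = 0; Shift < 32; Shift += 8)
    Out.push_back(static_cast<unsigned char>((V >> Shift) & 0xFF));
}

// Encodes IDs as "<count> <id>*". Sorting first makes the output independent
// of the iteration order of the set the IDs were collected from.
std::vector<unsigned char> encodeIncludedFileIDs(std::vector<uint32_t> IDs) {
  std::sort(IDs.begin(), IDs.end());
  std::vector<unsigned char> Out;
  Out.reserve(4 * (IDs.size() + 1));
  writeLE32(Out, static_cast<uint32_t>(IDs.size()));
  for (uint32_t ID : IDs)
    writeLE32(Out, ID);
  return Out;
}
```

Sorting matters for a set keyed on pointers, since its iteration order can differ between runs and would otherwise make the serialized file non-deterministic.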

void ASTWriter::WritePreprocessorDetail(PreprocessingRecord &PPRec,		void ASTWriter::WritePreprocessorDetail(PreprocessingRecord &PPRec,
uint64_t MacroOffsetsBase) {		uint64_t MacroOffsetsBase) {
if (PPRec.local_begin() == PPRec.local_end())		if (PPRec.local_begin() == PPRec.local_end())
return;		return;

SmallVector<PPEntityOffset, 64> PreprocessedEntityOffsets;		SmallVector<PPEntityOffset, 64> PreprocessedEntityOffsets;

// Enter the preprocessor block.		// Enter the preprocessor block.
Stream.EnterSubblock(PREPROCESSOR_DETAIL_BLOCK_ID, 3);		Stream.EnterSubblock(PREPROCESSOR_DETAIL_BLOCK_ID, 3);

// If the preprocessor has a preprocessing record, emit it.		// If the preprocessor has a preprocessing record, emit it.
unsigned NumPreprocessingRecords = 0;		unsigned NumPreprocessingRecords = 0;
using namespace llvm;		using namespace llvm;

// Set up the abbreviation for		// Set up the abbreviation for
unsigned InclusionAbbrev = 0;		unsigned InclusionAbbrev = 0;
{		{
auto Abbrev = std::make_shared<BitCodeAbbrev>();		auto Abbrev = std::make_shared<BitCodeAbbrev>();
Abbrev->Add(BitCodeAbbrevOp(PPD_INCLUSION_DIRECTIVE));		Abbrev->Add(BitCodeAbbrevOp(PPD_INCLUSION_DIRECTIVE));
		dexonsmith (inline comment, Done): I wonder, will the `Filename` already be serialized elsewhere? Could an ID from that be reused here, rather than writing the filename again? (Maybe that'd need a bigger refactor of some sort to create a filename table?)

		Stepping back, it looks like this is always eagerly loaded. Could it be lazily-loaded by submodule? Could it be lazily-loaded by filename? In the former case, seems like a single record per submodule makes sense, with a single blob that can be decoded on-demand. In the latter case, maybe it should be rotated, and stored a single record per filename as a blob that can be lazily decoded: `<filename-size> <filename> <num-submodules> (<smid> <num-includes>)+`
		jansvoboda11 (Author, inline comment, Done): We already serialize `Filename` elsewhere, but only for local input files. Here we need transitive closure of all included input files. I'm still unsure whether it's fine to store transitively included files here or if we should look that information up in the respective AST files. The current solution looks like it will bloat sizes of the AST files, but I think the transitive closure is already being stored for `HeaderFileInfo` anyways, so it shouldn't be that big of a deal?
		jansvoboda11 (Author, inline comment, Done): Thinking about it some more, the current implementation will be duplicating a lot of filenames between submodules of the same module. We might need to extract that to some common storage that we can refer to with simple integer offsets...
Abbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 32)); // filename length		Abbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 32)); // filename length
Abbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 1)); // in quotes		Abbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 1)); // in quotes
Abbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 2)); // kind		Abbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 2)); // kind
Abbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 1)); // imported module		Abbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 1)); // imported module
Abbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Blob));		Abbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Blob));
InclusionAbbrev = Stream.EmitAbbrev(std::move(Abbrev));		InclusionAbbrev = Stream.EmitAbbrev(std::move(Abbrev));
}		}
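The inline comments in this hunk raise the cost of repeating filenames across submodules and suggest a shared table addressed by integers. A minimal sketch of such a table is given below. It is purely illustrative: `FilenameTable` is a hypothetical class, it is not part of the patch, and the later revision shown on this page instead refers to files by their existing input-file IDs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical shared filename table: every distinct filename is stored once
// and referred to by a 32-bit index from then on.
class FilenameTable {
  std::vector<std::string> Names;                  // index -> filename
  std::unordered_map<std::string, uint32_t> Index; // filename -> index

public:
  // Returns the existing index for Name, or assigns the next free one.
  uint32_t intern(const std::string &Name) {
    auto It = Index.find(Name);
    if (It != Index.end())
      return It->second;
    uint32_t ID = static_cast<uint32_t>(Names.size());
    Names.push_back(Name);
    Index.emplace(Name, ID);
    return ID;
  }

  const std::string &lookup(uint32_t ID) const { return Names.at(ID); }
  std::size_t size() const { return Names.size(); }
};
```

Each per-submodule record would then hold only a list of indices, so a header included by many submodules costs four bytes per reference instead of a full path.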

[218 lines not shown]	void ASTWriter::WriteSubmodules(Module *WritingModule) {
Abbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Blob)); // Message		Abbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Blob)); // Message
unsigned ConflictAbbrev = Stream.EmitAbbrev(std::move(Abbrev));		unsigned ConflictAbbrev = Stream.EmitAbbrev(std::move(Abbrev));

Abbrev = std::make_shared<BitCodeAbbrev>();		Abbrev = std::make_shared<BitCodeAbbrev>();
Abbrev->Add(BitCodeAbbrevOp(SUBMODULE_EXPORT_AS));		Abbrev->Add(BitCodeAbbrevOp(SUBMODULE_EXPORT_AS));
Abbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Blob)); // Macro name		Abbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Blob)); // Macro name
unsigned ExportAsAbbrev = Stream.EmitAbbrev(std::move(Abbrev));		unsigned ExportAsAbbrev = Stream.EmitAbbrev(std::move(Abbrev));

		Abbrev = std::make_shared<BitCodeAbbrev>();
		Abbrev->Add(BitCodeAbbrevOp(SUBMODULE_INCLUDED_FILES));
		Abbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Blob));
		unsigned IncludedFilesAbbrev = Stream.EmitAbbrev(std::move(Abbrev));

// Write the submodule metadata block.		// Write the submodule metadata block.
RecordData::value_type Record[] = {		RecordData::value_type Record[] = {
getNumberOfModules(WritingModule),		getNumberOfModules(WritingModule),
FirstSubmoduleID - NUM_PREDEF_SUBMODULE_IDS};		FirstSubmoduleID - NUM_PREDEF_SUBMODULE_IDS};
Stream.EmitRecord(SUBMODULE_METADATA, Record);		Stream.EmitRecord(SUBMODULE_METADATA, Record);

// Write all of the submodules.		// Write all of the submodules.
std::queue<Module *> Q;		std::queue<Module *> Q;
[125 lines not shown]	if (!Inits.empty())
Stream.EmitRecord(SUBMODULE_INITIALIZERS, Inits);		Stream.EmitRecord(SUBMODULE_INITIALIZERS, Inits);

// Emit the name of the re-exported module, if any.		// Emit the name of the re-exported module, if any.
if (!Mod->ExportAsModule.empty()) {		if (!Mod->ExportAsModule.empty()) {
RecordData::value_type Record[] = {SUBMODULE_EXPORT_AS};		RecordData::value_type Record[] = {SUBMODULE_EXPORT_AS};
Stream.EmitRecordWithBlob(ExportAsAbbrev, Record, Mod->ExportAsModule);		Stream.EmitRecordWithBlob(ExportAsAbbrev, Record, Mod->ExportAsModule);
}		}

		if (const Preprocessor::IncludedFilesSet *Includes =
		PP->getLocalSubmoduleIncludes(Mod)) {
		SmallString<2048> Buffer;
		raw_svector_ostream Out(Buffer);
		writeIncludedFiles(Out, *Includes);
		RecordData::value_type Record[] = {SUBMODULE_INCLUDED_FILES};
		Stream.EmitRecordWithBlob(IncludedFilesAbbrev, Record, Buffer.data(),
		Buffer.size());
		}

// Queue up the submodules of this module.		// Queue up the submodules of this module.
for (auto *M : Mod->submodules())		for (auto *M : Mod->submodules())
Q.push(M);		Q.push(M);
}		}

Stream.ExitBlock();		Stream.ExitBlock();

assert((NextSubmoduleID - FirstSubmoduleID ==		assert((NextSubmoduleID - FirstSubmoduleID ==
[3,991 lines not shown]

clang/test/Modules/import-submodule-visibility.c

This file was added.

				// This test checks that imports of headers that appeared in a different submodule than
				// what is imported by the current TU don't affect the compilation.

				// RUN: rm -rf %t
				// RUN: split-file %s %t

				//--- A.framework/Headers/A.h
				#include "Textual.h"
				//--- A.framework/Modules/module.modulemap
				framework module A { header "A.h" }

				//--- B.framework/Headers/B1.h
				#include "Textual.h"
				//--- B.framework/Headers/B2.h
				//--- B.framework/Modules/module.modulemap
				framework module B {
				module B1 { header "B1.h" }
				module B2 { header "B2.h" }
				}

				//--- C/C.h
				#include "Textual.h"
				//--- C/module.modulemap
				module C { header "C.h" }

				//--- D/D1.h
				#include "Textual.h"
				//--- D/D2.h
				//--- D/module.modulemap
				module D {
				module D1 { header "D1.h" }
				module D2 { header "D2.h" }
				}

				//--- E/E1.h
				#include "E2.h"
				//--- E/E2.h
				#include "Textual.h"
				//--- E/module.modulemap
				module E {
				module E1 { header "E1.h" }
				module E2 { header "E2.h" }
				}

				//--- Textual.h
				#define MACRO_TEXTUAL 1

				//--- test.c

				#ifdef A
				//
				#endif

				#ifdef B
				#import <B/B2.h>
				#endif

				#ifdef C
				//
				#endif

				#ifdef D
				#import "D/D2.h"
				#endif

				#ifdef E
				#import "E/E1.h"
				#endif

				#import "Textual.h"

				static int x = MACRO_TEXTUAL;

				// Specifying the PCM file on the command line (without actually importing "A") should not
				// prevent "Textual.h" to be included in the TU.
				//
				// RUN: %clang_cc1 -fmodules -I %t -emit-module %t/A.framework/Modules/module.modulemap -fmodule-name=A -o %t/A.pcm
				// RUN: %clang_cc1 -fmodules -I %t -fsyntax-only %t/test.c -DA -fmodule-file=%t/A.pcm

				// Specifying the PCM file on the command line and importing "B2" in the source does not
				// prevent "Textual.h" to be included in the TU.
				//
				// RUN: %clang_cc1 -fmodules -I %t -emit-module %t/B.framework/Modules/module.modulemap -fmodule-name=B -o %t/B.pcm
				// RUN: %clang_cc1 -fmodules -I %t -fsyntax-only %t/test.c -DB -iframework %t -fmodule-file=%t/B.pcm

				// Module-only version of the test with framework A.
				//
				// RUN: %clang_cc1 -fmodules -I %t -emit-module %t/C/module.modulemap -fmodule-name=C -o %t/C.pcm
				// RUN: %clang_cc1 -fmodules -I %t -fsyntax-only %t/test.c -DC -fmodule-file=%t/C.pcm

				// Module-only version of the test with framework B.
				//
				// RUN: %clang_cc1 -fmodules -I %t -emit-module %t/D/module.modulemap -fmodule-name=D -o %t/D.pcm
				// RUN: %clang_cc1 -fmodules -I %t -fsyntax-only %t/test.c -DD -fmodule-file=%t/D.pcm

				// Transitively imported, but not exported.
				//
				// RUN: %clang_cc1 -fmodules -I %t -emit-module %t/E/module.modulemap -fmodule-name=E -o %t/E.pcm
				// RUN: %clang_cc1 -fmodules -I %t -fsyntax-only %t/test.c -DE -fmodule-file=%t/E.pcm

Revision Contents

Diff 403263
