Follow-up on D105426.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
I realize many of the things I'll object to came from my own prototype, sorry about that :-\
I think/hope I gave you some forewarning about this!
clang-tools-extra/clangd/Headers.h | ||
---|---|---|
137 | Hey, I recognize this code :-) I think the key ideas here are that:
I think the second idea is important, but the first one might be a bit naive. I worry it's going to lead to certain rules being hard to implement, or being bundled into Headers.cpp instead of IncludeCleaner. For example:
I think the facts (is a file self-contained) should be part of IncludeStructure, but that we should expose them for IncludeCleaner to deal with, rather than trying to hide them in markUsed. Concretely, I think I'd suggest just extending the public API to expose the "file index" concept:
And implement all of markUsed in IncludeCleaner. | |
clang-tools-extra/clangd/IncludeCleaner.cpp | ||
144 | nit: move this to be a line comment on the sort() call? I think it's sufficiently nonobvious that sorting groups by file ID that it becomes nonobvious exactly what code the comment refers to! | |
clang-tools-extra/clangd/IncludeCleaner.h | ||
50 | so when *do* we perform this expansion? Seems like you've wired this up end-to-end in this patch and we're just going to hit the elog case. I think it's reasonable to put this expansion into findReferencedFiles fwiw, it's a fairly simple second pass and in practice combining them won't interfere with reasonable tests | |
51 | comment just says spelling, not spelling/expansion, not sure if this is significant. Expansion is actually the more obvious, but I do think we need both. |
Hey, sorry for the gigantic turn around. I still need to cover the code with few tests and polish it a bit more but I've updated the majority of it and pushed to get some early feedback before I do that. Please let me know if you have any concerns/see some problems with the approach I went for!
clang-tools-extra/clangd/Headers.h | ||
---|---|---|
61 | Most includes are part of the preamble, so there are two relevant parse actions (preamble, and mainfile-using-preamble). Each has its own SourceManager and therefore namespace of FileIDs. There's no rule that says a header gets the same FileID when a preamble is used. As written, RecordHeaders is assigning Inclusion::ID based on the preamble, and then we end up comparing it to FileIDs from compareUnusedIncludes(). From reading the ASTReader code, I *believe* that there's a simple offset between the two: e.g. that if a preamble uses FileIDs from 1-100, then these might be mapped to FileIDs 1501-1600 when that preamble is reused. We could go down the path of exploiting this. (Though we need to investigate the details and think a little about how it works with modules). The somewhat less-coupled alternative we use today is to use the FileEntry::Name as documented in the private section of IncludeStructure. There are a few ways to build on top of this - basically we're either going to do most calculations in FileID space, or expose a "stable file index" from IncludeStructure and do most calculations in that space... |
Sorry, I thought i'd sent these comments...
clang-tools-extra/clangd/IncludeCleaner.cpp | ||
---|---|---|
158 | Why are we passing around Inclusions by value? | |
clang-tools-extra/clangd/IncludeCleaner.h | ||
50 | This says FIXME but IIUC it's fixed. | |
55 | this function is undocumented, unused and untested :-) What's it for? Why does it not return a set? |
clang-tools-extra/clangd/IncludeCleaner.cpp | ||
---|---|---|
158 | Sorry, should have thought this through more before leaving the comment. There are a couple of questions really: How should we *store* the information about which inclusions are unused?
IMO this is mostly a question of what's the lifecycle of the info, and what's the simplest representation - seems like we should prefer stuff higher on the list if we have a choice. What should the signature of the function be? | |
159 | EntryPoint is unused | |
166 | Handle unresolved case somehow? (Or assert if you're sure it can't happen - I think it can for e.g. pp_file_not_found) | |
166 | doesn't seem like there's any need to go through filenames for this. | |
168 | elog says that: Can this actually happen? My suspicion is no. In which case maybe it should be an assert? | |
173 | I'm fairly (more) certain this one should be an assert | |
193 | assert? | |
196 | this is get, not getOrCreate, so you don't need the mutable reference | |
198 | I'm not totally sure whether this is safe to assert or not. WDYT? In any case, please fix the message (FE -> HeaderID, add more context) |
clang-tools-extra/clangd/IncludeCleaner.cpp | ||
---|---|---|
158 | OK I was confused, nothing is getting stored, but ParsedAST::getUnused() function creates/destroys the analysis data so it needs to run as a batch. I think probably:
We have some circularity between ParsedAST and IncludeCleaner, but I think we're going to have that in any case due to findReferencedLocations() |
clang-tools-extra/clangd/Headers.h | ||
---|---|---|
61 | I don't think we're under any size pressure here - Optional<unsigned>? | |
61 | call the member HeaderID, rather than have this as a comment only? | |
clang-tools-extra/clangd/IncludeCleaner.cpp | ||
161 | SM is unused | |
165 | this can just be a check whether MFI.ID is valid or not | |
171 | ReferencedFiles.contains(IncludeID) and inline into the if? | |
clang-tools-extra/clangd/ParsedAST.h | ||
159 | I think this function belongs in IncludeCleaner.h. (There's a circularity problem between ParsedAST and IncludeCleaner, but putting the function here doesn't fix it) | |
clang-tools-extra/clangd/unittests/IncludeCleanerTests.cpp | ||
165 ↗ | (On Diff #377197) | Don't bother doing this stripping just for the test IMO, it obscures the assertion (more than escaping quotes would) |
Most includes are part of the preamble, so there are two relevant parse actions (preamble, and mainfile-using-preamble). Each has its own SourceManager and therefore namespace of FileIDs.
There's no rule that says a header gets the same FileID when a preamble is used. As written, RecordHeaders is assigning Inclusion::ID based on the preamble, and then we end up comparing it to FileIDs from compareUnusedIncludes().
From reading the ASTReader code, I *believe* that there's a simple offset between the two: e.g. that if a preamble uses FileIDs from 1-100, then these might be mapped to FileIDs 1501-1600 when that preamble is reused. We could go down the path of exploiting this. (Though we need to investigate the details and think a little about how it works with modules).
The somewhat less-coupled alternative we use today is to use the FileEntry::Name as documented in the private section of IncludeStructure. There are a few ways to build on top of this - basically we're either going to do most calculations in FileID space, or expose a "stable file index" from IncludeStructure and do most calculations in that space...