This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Record information about non self-contained headers in IncludeStructure
ClosedPublic

Authored by kbobyrev on Nov 22 2021, 7:54 AM.

Diff Detail

Event Timeline

kbobyrev created this revision.Nov 22 2021, 7:54 AM
kbobyrev requested review of this revision.Nov 22 2021, 7:54 AM

thanks, LG in general, just a couple polishing touches

clang-tools-extra/clangd/IncludeCleaner.cpp
147

this returns true even after inserting mainfile id, is that intended? I suppose it won't hurt too much, but means we'll go one step further for macros.

156

can you lift the isSelfContainedHeader into SourceCode.h and make use of it in both places (while updating the signature to accept a PP, through which you can access the SM).

163

we already have access to SM from the members, can you do the same for PP (stash as a member during construction) rather than plumbing at each call?

clang-tools-extra/clangd/unittests/IncludeCleanerTests.cpp
351

is this part of the check even relevant? let's drop this completely?

356

maybe make this an UnorderedElementsAre, do we have any ordering guarantees in this API's contract?

The ReferencedFiles is designed to make add() as cheap as possible, and do any per-file logic after folding the fileIDs together.
This keeps that loop tighter and also isolates the complexity of the symbol vs file logic.
Any reason we can't do that here?

kbobyrev updated this revision to Diff 389131.Nov 23 2021, 2:19 AM
kbobyrev marked 5 inline comments as done.

Address review comments.

The ReferencedFiles is designed to make add() as cheap as possible, and do any per-file logic after folding the fileIDs together.
This keeps that loop tighter and also isolates the complexity of the symbol vs file logic.
Any reason we can't do that here?

Good point, thanks!

sammccall added inline comments.Nov 23 2021, 2:56 AM
clang-tools-extra/clangd/IncludeCleaner.cpp
252–267

This comment seems a bit unclear:

  • what the problem is you're solving
  • whether you're describing the behavior you *want* rather than the behavior you're trying to avoid.
  • which part of this is "filtering"
  • what the "later" option is you're contrasting to

Maybe something like:

// If a header is not self-contained, we consider its symbols a logical part of the including file.
// Therefore, mark the parents of all used non-self-contained FileIDs as used.
// Perform this on FileIDs rather than HeaderIDs, as each inclusion of a non-self-contained file is distinct.
262

it seems like we'd be better off storing the "is-self-contained" in the IncludeStructure and looking up the HeaderID here rather than asking the preprocessor. That way we rely on info that's better obtained at preamble build time.

clang-tools-extra/clangd/SourceCode.h
331

This helper checks e.g. for "don't include me", which is going to read source code of preamble files - we shouldn't do that, it's too slow and racy to do for every file in the preamble.

(It would be nice to handle those cases at some point, if we want to do that we need to do it at preamble build time and record the results in the IncludeStructure)

kbobyrev updated this revision to Diff 389183.Nov 23 2021, 6:19 AM
kbobyrev marked 2 inline comments as done.

Resolve most comments.

clang-tools-extra/clangd/IncludeCleaner.cpp
262

I am slightly confused: we don't really have the IncludeStructure here and it is logically detached from the IncludeStructure processing. We'd still have to unroll the chain of FIDs in here, so the only difference would be querying IncludeStructure data for the cached isSelfContainedHeader value, is that right? Why is obtaining that info at preamble build time better?

kadircet added inline comments.Nov 23 2021, 6:34 AM
clang-tools-extra/clangd/IncludeCleaner.cpp
262

the call-site has access to ParsedAST, hence the IncludeStructure. I believe the main reasoning behind Sam's suggestion is performing these lookups once while building the preamble and sharing it afterwards (we can even do the IO there).

clang-tools-extra/clangd/SourceCode.cpp
56

no need for static here (and other places below)

clang-tools-extra/clangd/SourceCode.h
333

i'd suggest AllowIO rather than ExpensiveCheck

clang-tools-extra/clangd/index/SymbolCollector.cpp
281–282

can you also delete this and the other helpers?

clang-tools-extra/clangd/unittests/IncludeCleanerTests.cpp
347

looks like a debug artifact

kbobyrev updated this revision to Diff 389454.Nov 24 2021, 3:48 AM
kbobyrev marked 6 inline comments as done.

Resolve review comments. This turned into a rather large refactoring change.

Herald added a project: Restricted Project. · View Herald TranscriptNov 24 2021, 3:48 AM
kbobyrev updated this revision to Diff 389457.Nov 24 2021, 4:11 AM

Untangle BeforeExecute call ordering change.

Thanks, this mostly LG now.
I'd consider splitting out the new infra change (IncludeStructure) from the feature change (include-cleaner treatment).
In case the latter causes some problems...

clang-tools-extra/clangd/Headers.cpp
92

nit: at the file exit time -> at file exit time

clang-tools-extra/clangd/Headers.h
66

Why redundantly track this on Inclusion? It's already available via IncludeStructure.SelfContained.contains(*HeaderID)

128

Maybe it's neater just to pass the CompilerInstance& - there's no other practical way to use this.

161

I think i'd prefer to see this private with an isSelfContained accessor, because the representation isn't the only sensible one.

Case in point: almost all headers in practice are self-contained, so in fact storing the set of non-self-contained headers seems preferable.

clang-tools-extra/clangd/SourceCode.cpp
53

these helpers are 1000 lines away from the only place they're used, please move them closer

clang-tools-extra/clangd/SourceCode.h
330

This still doesn't really describe the situation:

  • if you're not getting the header from a preamble, it won't have to read anything as it'll be in-memory already
  • if you are, then the fact that the content may have changed is at least as big a problem
  • the string scanning itself is not ideal to do redundantly from a performance POV

I'd rather say:
"This scans source code, and should not be called when using a preamble.
Prefer to access the cache in IncludeStructure::isSelfContained if you can."

kbobyrev updated this revision to Diff 389946.Nov 26 2021, 1:53 AM
kbobyrev marked 6 inline comments as done.

Address review comments. Next step: split this patch into two, as suggested.

kbobyrev planned changes to this revision.Nov 26 2021, 1:54 AM
kbobyrev added a reviewer: sammccall.
kbobyrev updated this revision to Diff 389947.Nov 26 2021, 1:56 AM

Leave the feature changes (IncludeCleaner bits) out of this patch

kbobyrev retitled this revision from [clangd] IncludeCleaner: Attribute symbols from non self-contained headers to their parents to [clangd] Record information about non self-contained headers in IncludeStructure.Nov 26 2021, 1:57 AM
kbobyrev edited the summary of this revision. (Show Details)
kbobyrev updated this revision to Diff 389950.Nov 26 2021, 2:03 AM

[clangd] IncludeCleaner: Attribute symbols from non self-contained headers to their parents

When a symbol comes from the non self-contained header, we recursively uplift
the file we consider used to the first includer that has a header guard. We
need to do this while we still have FileIDs because every time a non
self-contained header is included, it gets a new FileID but is later
deduplicated by HeaderID and it's not possible to understand where it was
included from.

kbobyrev updated this revision to Diff 389952.Nov 26 2021, 2:05 AM

Change base to D114370.

kbobyrev updated this revision to Diff 389953.Nov 26 2021, 2:07 AM

Get the changes back to this patch.

sammccall accepted this revision.Nov 26 2021, 3:33 AM

Thanks!

clang-tools-extra/clangd/Headers.h
167

could consider friend class RecordHeaders, either is ugly, up to you

This revision is now accepted and ready to land.Nov 26 2021, 3:33 AM
kbobyrev added inline comments.Nov 26 2021, 4:23 AM
clang-tools-extra/clangd/Headers.h
167

That was the original problem I had the public field in the previous iteration: RecordHeaders is defined in anonymous namespace, so IIUC I can not friend it from here unless I make it a subclass or somehow visible from here, otherwise it wouldn't work, would it? Is there any clean way to do the friend here?

sammccall added inline comments.Nov 26 2021, 4:54 AM
clang-tools-extra/clangd/Headers.h
167

No, you need to move it out of the anonymous namespace. Which is part of the ugliness, though not harmful in practice.

(If you're worried about namespace pollution you can make it a nested class of IncludeStructure. You have to forward declare it in the header. In that case you don't have to friend it)

kbobyrev updated this revision to Diff 390024.Nov 26 2021, 5:12 AM
kbobyrev marked 2 inline comments as done.

Get rid of the setter in IncludeStructure.

clang-tools-extra/clangd/Headers.h
167

I see, thanks!

This revision was landed with ongoing or failed builds.Nov 26 2021, 5:13 AM
This revision was automatically updated to reflect the committed changes.