This is an archive of the discontinued LLVM Phabricator instance.

[include-cleaner] Add PragmaIncludes which handles include-mapping pragmas.
ClosedPublic

Authored by hokein on Oct 17 2022, 5:44 AM.

Details

Summary

PragmaIncludes captures the progma-based header-mapping information, it is used
in the "Location => Header" step to determine the final spelling header for a
symbol (rather than the header directive).

Diff Detail

Event Timeline

hokein created this revision.Oct 17 2022, 5:44 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 17 2022, 5:44 AM
hokein requested review of this revision.Oct 17 2022, 5:44 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 17 2022, 5:44 AM

Post it early in order to get initial feedback on the APIs.It is not finished, I plan to implement the IWYU keep pragma and add unittest.

(It's a bit hard to evaluate this without the other pieces, but I suppose we need to start somewhere)

clang-tools-extra/include-cleaner/include/clang-include-cleaner/Hooks.h
38 ↗(On Diff #468176)

IWYU pragmas requires a PP listener, we'll also need PP events for other purposes (determining which files are self-contained, finding #includes in the main file). Should these be the same listener and write into the same class?

The argument for fewer components is usability: the embedder (standalone tool, clangd, clang-tidy) is responsible for connecting these hooks to the compiler and feeding the results back into the analysis.

So the library is somewhat simpler to use if the number of parts is small.
We need (I think) at least one for the AST and one for the preprocessor, because they need to be connected at different parts of the lifecycle.

How much do we gain by splitting up further than that?

44 ↗(On Diff #468176)

At first glance, this signature doesn't look right: keep applies to directives, not to files.
e.g.

#include "foo.h" // keep
#include "foo.h"

the second should be removed

(it's a silly example, but in the past we've confused ourselves into doing IO or complicated string matching code for things that are about spelling).

I'd suggest some representation like the line number (where the # appears, not where the comment is), though the details probably matter less than the concept.

47 ↗(On Diff #468176)

FileEntry* rather than FileEntryRef - we don't have different mappings when we open the same file using different names.

(This seems like a nit here but Header will be something like variant<stdlib::Header, FileEntry*, ...> and inconsistency would be confusing)

53 ↗(On Diff #468176)

Why are these UniqueID rather than FileEntry*? Is the idea to try to use this in clangd?

We hadn't planned on doing this, because pushing clangd's weird requirements around not using FileEntry's to the library, ability to clone the data structure etc seemed pretty intrusive. Since we switched from name-based to UniqueID it might be doable...

clang-tools-extra/include-cleaner/lib/Hooks.cpp
24 ↗(On Diff #468176)

while we're adding this, we should probably support /* as well as //, it's allowed per spec (I didn't add it to clangd just to keep NFC)

42 ↗(On Diff #468176)

nit: there's no need to check Invalid, getCharacterData always returns a valid pointer and it will never have a spurious IWYU pragma

sammccall added inline comments.Oct 18 2022, 7:51 AM
clang-tools-extra/include-cleaner/include/clang-include-cleaner/Hooks.h
38 ↗(On Diff #468176)

After discussing offline: I think we expect this to be reused in clangd so separating it from the bits that won't is useful.
I don't think the scope is exactly clear though: which of these should we bundle together?

  • IWYU pragmas related to headers (private, export)
  • IWYU pragmas related to the main file (keep/no_include)
  • other equivalent things like #pragma include_instead
  • tracking whether headers are self-contained
  • recording #include graph (not sure IWYU even needs this)
  • tracking macro occurrences in the main file
  • details of #include directives from the main file

I'm tempted to say 1/2/3/4 are in-scope (since they're all gathered from roughly the same source and needed for the same thing) and so PragmaIncludes seems a little too narrow. Suggest HeaderQuirks maybe?

53 ↗(On Diff #468176)

After discussing offline, using UniqueID to ensure the results are usable across FileManagers seems reasonable, but I think it needs to be documented.

hokein updated this revision to Diff 468688.Oct 18 2022, 1:55 PM
hokein marked 4 inline comments as done.
  • rebase
  • address comments
  • implement shouldKeep
  • add unittests
clang-tools-extra/include-cleaner/include/clang-include-cleaner/Hooks.h
38 ↗(On Diff #468176)

My original plan was 1/2/3 (4 probably yes, as they mostly share the same thing).

As a personal preference, I like the name PragmaIncludes as it perfectly matches 1/2/3, but not 4; while the HeaderQuirks is probably a better umbrella, but it seems not quite self-contained. for now, I'd keep it as-is, but happy to change if you have a strong preference.

44 ↗(On Diff #468176)

right, good point!

53 ↗(On Diff #468176)

Added comments.

clang-tools-extra/include-cleaner/lib/Hooks.cpp
24 ↗(On Diff #468176)

added handling of /*

kadircet added inline comments.Oct 27 2022, 3:33 AM
clang-tools-extra/include-cleaner/include/clang-include-cleaner/Record.h
35

triple slashes (all over the header)

41

i think this interface is OK, if we're planning to implement another adapter on top of this class and use this only as a repository. Then have the adapter provide a more convenient interface/data structures for types of queries we'll have during Location => Header mapping inside include cleaner.

As those will most likely look like getHeaders(SourceLocation)/getHeaders(string) -> vector<pair<Header, Signals>>, without really caring about the details an implementation detail being mapped to some other file, or getting exports of a certain header (as discussed this is the piece that would require a little bit more detailed design).

49

it feels a little bit unfortunate from layering perspective that this struct also needs to somehow have specialized knowledge about the main file, but also having the same comment handler logic both in here and inside a main file specific handler seems like too much duplication. So let's go with it in this way for now, but keep this quirk in mind so that we can decide on a better place in the future.

54

why do we need to expose this?

58

we should also keep headers that're marked as export.

clang-tools-extra/include-cleaner/lib/Record.cpp
64

nit: we can keep track of the current FileID using FileChanged/LexedFileChanged events (also gets rid of the isWrittenInMainFile call above/below).

LG from my side

clang-tools-extra/include-cleaner/include/clang-include-cleaner/Record.h
58

export isn't in this patch.
I don't think the details of where this data comes belongs in the header, it's not important to either the data structure or the interface.

67

nit: naming a member after the keys rather than the values is confusing, consider IWYUPublic or just PublicMapping?

clang-tools-extra/include-cleaner/lib/Record.cpp
112

nit: we can inline these trivial implementations into the header

hokein updated this revision to Diff 471936.Oct 31 2022, 2:25 AM
hokein marked 6 inline comments as done.

address comments

clang-tools-extra/include-cleaner/include/clang-include-cleaner/Record.h
41

yeah, the plan is that this class is a low-level piece, only providing the data extracted from pragmas, and we will build another layer on top of this class (this is my next step).

54

no needed, move it to private.

67

rename to IWYUPublic.

hokein retitled this revision from [include-cleaner] WIP: Add PragmaIncludes which handles include-mapping pragmas. to [include-cleaner] Add PragmaIncludes which handles include-mapping pragmas..Oct 31 2022, 2:26 AM
sammccall added inline comments.Oct 31 2022, 3:09 AM
clang-tools-extra/include-cleaner/include/clang-include-cleaner/Record.h
41

My suggestion would be to see what it looks like coding directly against this first - it might be fine, or it might make it obvious that making this class a little higher level is a better choice.
Trying to place an abstraction around this kind of "bits and pieces" information is easy to get wrong ime

kadircet accepted this revision.Oct 31 2022, 4:05 AM

thanks, LG from my side as well, just some small tweaks for tests.

clang-tools-extra/include-cleaner/unittests/RecordTest.cpp
124

it'd be better to test with an include-directive that doesn't have a keep pragma attached

135

it'd be better to map this to a header that isn't part of the TU, to make sure we don't rely on a file entry being around for a particular spelling.

This revision is now accepted and ready to land.Oct 31 2022, 4:05 AM
hokein updated this revision to Diff 471995.Oct 31 2022, 7:15 AM
hokein marked 2 inline comments as done.

address comments around tests.

This revision was landed with ongoing or failed builds.Oct 31 2022, 7:21 AM
This revision was automatically updated to reflect the committed changes.