This is an archive of the discontinued LLVM Phabricator instance.

[include-cleaner] Record main-file macro occurences and includes
ClosedPublic

Authored by sammccall on Oct 25 2022, 5:25 PM.

Details

Summary

The occurrences are roots for finding used headers, like walkAST.
Includes are the targets we're matching used headers against.

Diff Detail

Event Timeline

sammccall created this revision.Oct 25 2022, 5:25 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 25 2022, 5:25 PM
sammccall requested review of this revision.Oct 25 2022, 5:25 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 25 2022, 5:25 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
sammccall added inline comments.Oct 25 2022, 5:29 PM
clang-tools-extra/include-cleaner/include/clang-include-cleaner/Record.h
123

in the prototype I reimplemented this function in clangd, but I expect we can just reuse the RecordedIncludes class instead, copying from clangd's includes list is cheap.

(That's one argument for putting this in a different header, which I can do already if you prefer)

clang-tools-extra/include-cleaner/include/clang-include-cleaner/Types.h
22–23

(this is already fixed at head)

Left a few initial comments, it looks roughly good to me (for the macro-usage case, I might miss some historical context there, I think we come to an agreement that what this patch proposes is the designed behavior).

clang-tools-extra/include-cleaner/include/clang-include-cleaner/Record.h
123

that's an interesting idea, but we need to do it carefully because of FileEntry*, the Include structure has a filed of FileEntry*, it is not feasible for clangd to propagate it (clangd's Inclusion doesn't have it, instead it uses a HeaderID which is based on the fs::UniqueID to identify a physical file).

But I think this is not something we should worry about at the moment, and the current interfaces look quite good.

127

nit: worth a comment mentioning the unsigned is the index of All.

clang-tools-extra/include-cleaner/include/clang-include-cleaner/Types.h
122

nit: HashLoc seems clearer than Location.

And it looks like Location and Line feels somewhat redundant -- we can get the Line loc from the Location with SourceManager. But I think it is fair to hold both, Line is an important property of the include, which will probably widely used.

clang-tools-extra/include-cleaner/unittests/RecordTest.cpp
300

why using a lower case x here?

308

this is a redef error with the one defined in the header, I think it is not intended, rename it?

321

IIUC, ErrorOK is irrelevant, and should be removed.

sammccall updated this revision to Diff 470818.Oct 26 2022, 7:54 AM
sammccall marked 5 inline comments as done.

address comments

hokein accepted this revision.Oct 26 2022, 1:19 PM

looks good from my side.

clang-tools-extra/include-cleaner/include/clang-include-cleaner/Types.h
52

probably worth adding a comment: the order of the enumerators must match the the order of template arguments in Storage.

82

thinking more about it (no action required) -- this structure seems a natural good fit to be used in UsedSymbolCB in Analysis.h rather than using SourceLocation RefLoc, Symbol Target separately.

clang-tools-extra/include-cleaner/lib/Record.cpp
101

this should be hidden in anonymous namespace, right?

106

nit: virtual can be removed.

This revision is now accepted and ready to land.Oct 26 2022, 1:19 PM
kadircet added inline comments.Oct 27 2022, 2:14 AM
clang-tools-extra/include-cleaner/include/clang-include-cleaner/Record.h
102

s/this/the main file/

123

right, let's keep it to the minimum now. we can extend/move-around later on

129

it feels like having a set of includes corresponding to the same fileentry, rather than a single one, feels subtle enough to deserve a comment.

the only case i can think of is, two logically different files resolving to same physical file (i.e. symlinks). these will actually have completely different spellings, and probably it'll make things hard when deciding which include to "keep" or "insert". are there other cases where we can have multiple includes corresponding to the same fileentry?

clang-tools-extra/include-cleaner/include/clang-include-cleaner/Types.h
41

what about just a stringref to name?

42

triple slashes

46

OOC: can we really have multiple macro identifiers defined at the same source location? I guess it's cheap to compare another pointer, but might be easier if we actually didn't make the identifierinfo part of the identity.

82

agreed. i'll adjust that with the refkind patch

121

i think it's worth leaving a comment here mentioning that this can be null (couldn't resolve, or maybe we shouldn't be caring, e.g. logical stl include).

even better, maybe we should actually have a Header here? that way this would be conceptual equivalent of SymbolReference, but for includes?

clang-tools-extra/include-cleaner/lib/Record.cpp
109

we can keep a include depth based on the Reason and bail out on other operations when depth is not 1 (getFileID on Loc here probably hits the SourceManager cache anyways, so might not be important, but i feel like it's better to not rely on that).

242

we might have different handles (fileentry*) to the same file (if we end up having multiple filemanagers in action, which is unlikely as things stand today, but still). so what about using uniqueids for comparison here?

AFAICT, filemanager won't have multiple fileenrtys with the same uniqueid anyway. so we wouldn't really lose any generality.

clang-tools-extra/include-cleaner/unittests/RecordTest.cpp
263

i know these look more like functions, but convention is to use PascalCase for matchers, so Spelled instead?

327

nit: ASSERT_FALSE(Recorded.MacroReferences.empty())

354

same as above, Line?

364

can you also add an include with duplicated spelling to test spelling to set of includes mapping?

sammccall updated this revision to Diff 471150.Oct 27 2022, 6:29 AM
sammccall marked 10 inline comments as done.

address comments

This revision was landed with ongoing or failed builds.Nov 8 2022, 6:20 AM
This revision was automatically updated to reflect the committed changes.

This commit causes build failure on https://lab.llvm.org/buildbot/#/builders/121/builds/24947 :

[43/635] Building CXX object tools/clang/tools/extra/include-cleaner/lib/CMakeFiles/obj.clangIncludeCleaner.dir/Types.cpp.o
FAILED: tools/clang/tools/extra/include-cleaner/lib/CMakeFiles/obj.clangIncludeCleaner.dir/Types.cpp.o 
CCACHE_CPP2=yes CCACHE_HASHDIR=yes /usr/bin/ccache /usr/lib64/ccache/c++  -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Itools/clang/tools/extra/include-cleaner/lib -I/home/buildbots/ppc64le-clang-multistage-test/clang-ppc64le-multistage/llvm/clang-tools-extra/include-cleaner/lib -I/home/buildbots/ppc64le-clang-multistage-test/clang-ppc64le-multistage/llvm/clang/include -Itools/clang/include -Iinclude -I/home/buildbots/ppc64le-clang-multistage-test/clang-ppc64le-multistage/llvm/llvm/include -I/home/buildbots/ppc64le-clang-multistage-test/clang-ppc64le-multistage/llvm/clang-tools-extra/include-cleaner/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wno-missing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wno-maybe-uninitialized -Wno-class-memaccess -Wno-noexcept-type -Wdelete-non-virtual-dtor -Wno-comment -Wno-misleading-indentation -fdiagnostics-color -ffunction-sections -fdata-sections -fno-common -Woverloaded-virtual -fno-strict-aliasing -O3 -DNDEBUG    -fno-exceptions -fno-rtti -UNDEBUG -std=c++17 -MD -MT tools/clang/tools/extra/include-cleaner/lib/CMakeFiles/obj.clangIncludeCleaner.dir/Types.cpp.o -MF tools/clang/tools/extra/include-cleaner/lib/CMakeFiles/obj.clangIncludeCleaner.dir/Types.cpp.o.d -o tools/clang/tools/extra/include-cleaner/lib/CMakeFiles/obj.clangIncludeCleaner.dir/Types.cpp.o -c /home/buildbots/ppc64le-clang-multistage-test/clang-ppc64le-multistage/llvm/clang-tools-extra/include-cleaner/lib/Types.cpp
In file included from /home/buildbots/ppc64le-clang-multistage-test/clang-ppc64le-multistage/llvm/clang-tools-extra/include-cleaner/lib/Types.cpp:9:
/home/buildbots/ppc64le-clang-multistage-test/clang-ppc64le-multistage/llvm/clang-tools-extra/include-cleaner/include/clang-include-cleaner/Types.h:84:10: error: declaration of ‘clang::include_cleaner::Symbol clang::include_cleaner::SymbolReference::Symbol’ [-fpermissive]
   Symbol Symbol;
          ^~~~~~
/home/buildbots/ppc64le-clang-multistage-test/clang-ppc64le-multistage/llvm/clang-tools-extra/include-cleaner/include/clang-include-cleaner/Types.h:51:8: error: changes meaning of ‘Symbol’ from ‘struct clang::include_cleaner::Symbol’ [-fpermissive]
 struct Symbol {
        ^~~~~~
/home/buildbots/ppc64le-clang-multistage-test/clang-ppc64le-multistage/llvm/clang-tools-extra/include-cleaner/lib/Types.cpp: In function ‘llvm::raw_ostream& clang::include_cleaner::operator<<(llvm::raw_ostream&, const clang::include_cleaner::SymbolReference&)’:
/home/buildbots/ppc64le-clang-multistage-test/clang-ppc64le-multistage/llvm/clang-tools-extra/include-cleaner/lib/Types.cpp:51:51: warning: ‘*’ in boolean context, suggest ‘&&’ instead [-Wint-in-bool-context]
                                /*Width=*/CHAR_BIT *

I was able to reproduce the failure and by reverting this commit locally it passed, can you please take a look? @sammccall

This commit causes build failure on https://lab.llvm.org/buildbot/#/builders/121/builds/24947 :
I was able to reproduce the failure and by reverting this commit locally it passed, can you please take a look? @sammccall

It looks like this was already fixed in
https://github.com/llvm/llvm-project/commit/d19ba74dee0b9ab553bd8a6ef5b67ff349f4bf13

clang-tools-extra/include-cleaner/include/clang-include-cleaner/Record.h
102

"this" as in the current object, as the data flow here isn't obvious. Changed to *this, clearer?

123

Good point. We could give Include a UniqueID instead I think. Let's defer this though.

123

that's an interesting idea, but we need to do it carefully because of FileEntry*, the Include structure has a filed of FileEntry*, it is not feasible for clangd to propagate it (clangd's Inclusion doesn't have it, instead it uses a HeaderID which is based on the fs::UniqueID to identify a physical file).

But I think this is not something we should worry about at the moment, and the current interfaces look quite good.

129

Sure:

#include "foo.h"
#include "bar.h"
#include "foo.h"

it isn't useful, but people write it (by mistake).
Maybe we want to clean it up, maybe we don't, but I don't see much reason to model it wrong on purpose - it doesn't really simplify anything, and may further confuse a confusing situation.

I can add a comment - what do you want it to say?
"There may be more than one #include of the same file" just seems to restate the obvious interpretation of the type here. (And this is the header, so it's not an ideal place to dump a bunch of implementation comments)

clang-tools-extra/include-cleaner/include/clang-include-cleaner/Types.h
41

That's twice as big, and this is already the critical path for sizeof(SymbolReference) which me can accumulate a lot of.

I don't think it's critical, but I also don't see why StringRef is better. (Less indirection vs carries some semantics + lifetime information)

46

Probably not? Removed from comparison.
(I think it's *simpler* to include it in the comparison, but it doesn't matter much)

52

Done in the base patch. D136710

121

Done (comment.)

maybe we should actually have a Header here

A Header is conceptually something like a Matcher<Include>. If we put Header here then we're communicating what the reference is going to look like: #include "foo.h" is going to satisfy a symbol usage in /path/to/foo.h or one in a file with // IWYU pragma private: include "foo.h", which we don't know.
(And it could be both, which would no longer be expressible).

Technically you could store an Header that *always* has a FileEntry, but that just seems like obfuscation.

122

Agree, we could one of these but it's a bit sad: we need SourceLocation to diagnose things, but Line is much more convenient the rest of the time.

Renamed, leaving this open to see what Kadir thinks too.

clang-tools-extra/include-cleaner/lib/Record.cpp
101

Right! This used to be friended.

109

That sounds complicated, wait until it shows up in a profile?

The conceptual stack definitely isn't properly maintained in all situations. e.g. a normal parse leaks 1 entry for... builtins or something, and clangd's parse with preamble leaks 3. I'm not specifically sure there's a problem in this scenario, but it's *not* a trivial thing that obviously can't go wrong.

(& I think if we're afraid to rely on the cache for the file we're *currently lexing* then we should seriously consider fixing the cache instead)

242

This is approximately the same issue as above (> that's an interesting idea, but we need to do it carefully because of FileEntry*`).
I agree and it doesn't really add code complexity but it does add subtlety, especially if we use FileEntry in the data structure and UniqueID for matching.
So I think deferring *all* the multi-file-manager concerns makes sense (or handling them all now if you prefer)

clang-tools-extra/include-cleaner/unittests/RecordTest.cpp
263

These are functions - I don't think there's a convention here, just confusion and bugs.
OK to fix named intead?

321

Hmm, it's not irrelevant, it means I get to make lots of mistakes in my examples :-)

Removed.

327

Why? This means we get "expected empty() is false, got true" instead of "expected size() is 0, got 7", which seems strictly worse...

Added a somewhat-useful printer for SymbolReference and switched to IsEmpty() instead.

(and sorry, it seems I was sitting on a pile of comments I thought I'd sent, LMK if I should follow up on them)

(and sorry, it seems I was sitting on a pile of comments I thought I'd sent, LMK if I should follow up on them)

no looks good. the only meaty discussion was fileentry to multiple includes/files related ones, and i guess deferring them is fine.