This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Record IWYU pragma keep in the IncludeStructure
ClosedPublic

Authored by kbobyrev on Nov 17 2021, 2:06 AM.

Details

Summary

This will allow the IncludeCleaner to suppress warnings on the lines with "IWYU
pragma: keep".

Clang APIs are not very convinient, so the code has to navigate around it.

Diff Detail

Event Timeline

kbobyrev created this revision.Nov 17 2021, 2:06 AM
kbobyrev requested review of this revision.Nov 17 2021, 2:06 AM
kbobyrev edited reviewers, added: kadircet; removed: sammccall.Nov 26 2021, 3:01 AM
kbobyrev added a subscriber: sammccall.

It's annoying that we see comments and inclusion directives out-of-order, we can try fixing it on the parser side (I think it is incidental that these are issued in that order currently, they are eagerly trying to generate a fix/diagnostic for tokens after a pp-directive. hence we can first issue the PP callback, and then diagnose for rest of the line instead).
But I don't think it matters in the long run especially when we want to handle different types of pragmas in the future (which is the plan IIRC), they might not even necessarily be in the main file, let alone being seen with right sequencing.

So I'd suggest just building a side table information about these interesting comments/pragmas we see as we go, and then merge them with rest of the information we've built inside RecordHeaders::EndOfMainFile.
That'll enable us to separate concerns here. Instead of piggy backing CommentHandler on RecordHeaders, we can have a separate comment handler exposed via IncludeStructure::commentHandler() (similar to collect) and let it update this side table of pragma/comment information.

WDYT?

Sorry for forgetting about this one. Hopefully I can still help now by disagreeing with Kadir and creating an awkward stalemate instead.

I agree that the ownership situation is ugly, but I think a single object here is still simpler. Partly because the ordering isn't some weird coincidence (see below)

clang-tools-extra/clangd/Headers.cpp
111

I don't think this has anything to do with ownership or registration order, rather events get recorded in order that constructs *finish* rather than start.

The PP doesn't call InclusionDirective until it sees the end of the line, and it has to skip over the comment to get there.

(I like the example and the sequencing, I'd just replace the first paragraph with "Given:" or so)

clang-tools-extra/clangd/Headers.h
132

Alternatively we could just give up on having a pure-ish API and say void collect(Preprocessor&).
Again there's not really any other sensible way to use it.

Didn't really address this part

But I don't think it matters in the long run especially when we want to handle different types of pragmas in the future
So I'd suggest just building a side table information

That's a more general approach for sure. But it does seem there's some complexity: what is the data structure and how do we merge it back into the graph. (e.g. are we going to record line numbers of each #include in the graph for this merging?)

So let's think about the concrete cases:

  • export
  • begin_exports...end_exports
  • private maybe?

The TL;DR is that each of these either stand alone, or we see the directive before the inclusion it describes in a way that's easy to keep track of incrementally. So I'm not really seeing much implementation difficulty to justify trying to move the directives and inclusion processing apart.

Export seems basically similar. We're going to see the pragma immediately before the directive.
The same stateful solution seems just as clear, I don't see any problem with it not being in the main file.
(There's the "same line" issue of having to look up line tables for everything, maybe we can dodge it by comparing SourceLocations in raw form instead, or use the statefulness to avoid checks in most cases)

Begin/end is a bit different. The stateful solution looks like: keep a Set<FileID> where exports are turned on, mutate it when you see a directive, and check it when you seen an inclusion.
The side-table solution is a bit more complicated: a map of fileIDs to a list of ranges or something. Manageable for sure.

Private I think we have to treat as a standalone directive that mutates the current file, and the reference to another file is treated as a string. It's not a file the PP has necessarily seen. So the result is a map<file, string> and it seems adequate for this to be written straight into IncludeStructure.

clang-tools-extra/clangd/Headers.cpp
127

This code will run for every comment in the preamble, so we should probably be careful about its performance.

I'd guess the cheapest thing is:

  • check that Range.begin() is not a macro
  • SM.getCharacterData(Range.begin()) yields a char* that's guaranteed null-terminated
  • try to consume the IWYU pragma: prefix

This doesn't deal with makeFileCharRange, touch End at all, etc.

Sorry for forgetting about this one. Hopefully I can still help now by disagreeing with Kadir and creating an awkward stalemate instead.

Haha, always welcome!

Partly because the ordering isn't some weird coincidence (see below)

Right, I suppose it makes sense when you look at the "construct being finalized" perspective.

That's a more general approach for sure. But it does seem there's some complexity: what is the data structure and how do we merge it back into the graph. (e.g. are we going to record line numbers of each #include in the graph for this merging?)

Definitely, but we are not going to get rid of that extra complexity no matter which solution we go with. We still need to store some extra state about these comments, the only difference is when we process them & whether we store just the latest or the whole set.
Surely processing them in a different callback then InclusionDirective would add extra complexity, but I think it might also be simpler in general since we don't need to think about all the sequencing of these operations.
But after looking at the particular cases, I think I agree with your suggestion overall. Relying on the current sequencing to keep the state incremental rather than whole history sounds like the better trade off here. (details below).

So let's think about the concrete cases:

  • export
  • begin_exports...end_exports
  • private maybe?

The TL;DR is that each of these either stand alone, or we see the directive before the inclusion it describes in a way that's easy to keep track of incrementally. So I'm not really seeing much implementation difficulty to justify trying to move the directives and inclusion processing apart.

Yeah these are the cases I had in mind, but didn't really think about them thoroughly.
I guess in all of the export cases we just need to operate with the information we've seen before a particular inclusion directive, and the file including it. So there doesn't seem to be any need for postponing the processing. (Plus as you pointed out, keeping just the current state is definitely more simpler than keeping a set of those).

Private I think we have to treat as a standalone directive that mutates the current file, and the reference to another file is treated as a string. It's not a file the PP has necessarily seen. So the result is a map<file, string> and it seems adequate for this to be written straight into IncludeStructure.

Yup, I guess this one looks pretty much the same for both approaches.

kbobyrev updated this revision to Diff 390631.Nov 30 2021, 2:08 AM
kbobyrev marked an inline comment as done.

Change the API, update the patch. It's crashing ATM, investigating the
problems.

kbobyrev updated this revision to Diff 390639.Nov 30 2021, 2:37 AM
kbobyrev marked an inline comment as done.

Fix failures, address review comments.

clang-tools-extra/clangd/Headers.h
132

PrecompiledPreamble.cpp requires PPCallbacks being returned directly, so I think we can't easily have `void IncludeStructure::collect(PP &).

gentle ping

kadircet added inline comments.Dec 8 2021, 2:09 AM
clang-tools-extra/clangd/Headers.cpp
126

we're already keeping track of being inside builtinfile today, through FileChanged callbacks, can you also keep track of being inside main file rather than doing the check for each comment we hit?

149

maybe LastPragmaKeepInMainFileLine also mention in the comments that this is 0-indexed?

clang-tools-extra/clangd/Headers.h
132

as discussed offline let's do that inside BeforeExecute callback.

kbobyrev updated this revision to Diff 392688.Dec 8 2021, 2:18 AM
kbobyrev marked 3 inline comments as done.

Resolve almost all comments.

kbobyrev updated this revision to Diff 392696.Dec 8 2021, 3:02 AM
kbobyrev marked an inline comment as done.

Address the last comment.

kadircet accepted this revision.Dec 8 2021, 6:45 AM

thanks, LGTM! sorry for taking so long.

clang-tools-extra/clangd/Headers.cpp
139

maybe just Keeps track of include depth for the current file. It's 1 for main file. ?

202

s/move/std::move/g

clang-tools-extra/clangd/Headers.h
129–130

I suppose the comment needs updating too. (it no longer returns nor requires extra comment handler insertion).

Inserts a PPCallback and CommentHandler that'll populate this structure. ?

This revision is now accepted and ready to land.Dec 8 2021, 6:45 AM
kbobyrev updated this revision to Diff 392753.Dec 8 2021, 6:55 AM
kbobyrev marked 3 inline comments as done.

Resolve review comments.

This revision was landed with ongoing or failed builds.Dec 8 2021, 6:56 AM
This revision was automatically updated to reflect the committed changes.