Page MenuHomePhabricator

[clang][lex] Remark for used header search paths
Needs RevisionPublic

Authored by jansvoboda11 on May 21 2021, 8:42 AM.

Details

Summary

For dependency scanning, it would be useful to collect header search paths (provided on command-line via -I and friends) that were actually used during preprocessing. This patch adds that feature to HeaderSearch along with a new remark that reports such paths as they get used.

Previous version of this patch tried to use the existing LookupFileCache to report used paths via HitIdx. That doesn't work for ComputeUserEntryUsage (which is intended to be called *after* preprocessing), because it indexes used search paths by the file name. This means the values get overwritten when the code contains #include_next.

Note that HeaderSearch doesn't use HeaderSearchOptions::UserEntries directly. Instead, InitHeaderSearch pre-processes them (adds platform-specific paths, removes duplicates, removes paths that don't exist) and creates DirectoryLookup instances. This means we need a mechanism for translating between those two. It's not possible to go from DirectoryLookup back to the original HeaderSearch, so InitHeaderSearch now tracks the relationships explicitly.

Depends on D102924.

Diff Detail

Event Timeline

jansvoboda11 requested review of this revision.May 21 2021, 8:42 AM
jansvoboda11 created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptMay 21 2021, 8:42 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

I think this is looking good, but would like either @dexonsmith, @akyrtzi, or someone else familiar with this area to take a look. Only other comment I have is that the new functions you add should use the current LLVM naming convention.

I also wonder how we should handle other things that are found via include paths such as module maps.

Fix naming of new functions/variables

I also wonder how we should handle other things that are found via include paths such as module maps.

That's a good point. The logic for module map lookup is more complex than header lookup, so I'd be keen to tackle that in a follow-up patch.

dexonsmith requested changes to this revision.May 28 2021, 2:10 PM

Thanks for your patience! I thought I already looked at this last week.

This is looking close. Comments mostly inline, but a high level point: I think this should report system search paths.

I also wonder how we should handle other things that are found via include paths such as module maps.

That's a good point. The logic for module map lookup is more complex than header lookup, so I'd be keen to tackle that in a follow-up patch.

Yup, seems like something that could be added as a follow up (although probably using the same remark). Is there a good place to leave behind a FIXME?

clang/include/clang/Basic/DiagnosticLexKinds.td
427

I suggest, more simply, "search path used:" (drop the "user-provided"). If you think it's useful for information purposes to know whether it was a user or system search path, I'd suggest using a select (in that case, maybe you want to add an optional "framework" descriptor, and/or "header map" when applicable, etc.).

428

I think it'd be better to define a DiagGroup in clang/include/clang/Basic/DiagnosticGroups.td and use it here.

It might be nice for it to be more general, and not refer to headers, so we can reuse it for module maps, and/or potentially other remarks. Maybe, -Rsearch-paths?

clang/include/clang/Lex/HeaderSearch.h
164–165

I think this can just be a vector of unsigned, since the key is densely packed and counting from 0. You can use ~0U for a sentinel for the non-entries. Maybe it's not too important though.

clang/lib/Frontend/InitHeaderSearch.cpp
581–583

I don't see why we'd want to filter out system includes.

  • Users sometimes manually configure "system" search paths, and this remark is fairly special-purpose, so I'm not sure the distinction is interesting to preserve.
  • This remark is already going to be "noisy" and hit a few times on essentially every compilation. Adding the system paths that get hit won't change that much.
  • The motivation for the patch is to test the logic for detecting which search paths are used in isolation from the canonicalize-explicit-module-build-commands logic in clang-scan-deps. We need to know that the logic works for system search paths as well.
clang/test/Preprocessor/header-search-user-entries.c
1–4

Can we use -verify here? Or does that not work for remarks?

9–11

If -verify doesn't work (hopefully it does), I think you'll need to add some CHECK-NOT / CHECK-NEXT / etc. or else you're not testing for the absence of other diagnostics.

Please also test:

  • Framework search path (used vs. not used)
  • System search path (used vs. not used)
  • One search path described relative to -isysroot (used vs. not used)
  • Header map (matched and used vs. matched but not used vs. not matched -- for the middle case, I'm not sure what result we actually want)
  • A path only used via #if __has_include() (when it's not followed by an #include or #import)
  • A path only used via ObjC's #import

If one of them doesn't work and it's complex enough to separate into a follow up, I think that'd be fine, but please find somewhere to leave a FIXME.

This revision now requires changes to proceed.May 28 2021, 2:10 PM

Yup, seems like something that could be added as a follow up (although probably using the same remark). Is there a good place to leave behind a FIXME?

I'll put a FIXME at an appropriate place in the next revision.

clang/include/clang/Basic/DiagnosticLexKinds.td
427

I chose this spelling to distinguish user-provided vs default include paths. The default ones are not important for header search pruning in dependency scanner, as they always get generated in InitHeaderSearch::AddDefaultIncludePaths.

clang/include/clang/Lex/HeaderSearch.h
164–165

I'm not sure what's our approach on early micro-optimizations like this. I don't think it will have measurable impact on memory or runtime.

clang/lib/Frontend/InitHeaderSearch.cpp
581–583

I'm not sure what you mean by system includes. HeaderSearchOptions::UserEntries should contain all search paths provided through the command-line including -isystem and -isysroot: https://github.com/llvm/llvm-project/blob/97d234935f1514af128277943f30efc469525371/clang/lib/Frontend/CompilerInvocation.cpp#L2985-L3056.

"System header prefixes" only control whether a header found through DirectoryLookup should be marked as system.

clang/test/Preprocessor/header-search-user-entries.c
9–11

Good point, I'll work on improving the coverage.

dexonsmith added inline comments.Jun 1 2021, 12:05 PM
clang/include/clang/Basic/DiagnosticLexKinds.td
427

Ah, I thought that logic was all moved to the driver. Looking at AddDefaultIncludePaths, you're right that only some things have moved, and some haven't (yet)...

For example, it looks like the only logic remaining in -cc1 on Darwin is:

// All header search logic is handled in the Driver for Darwin.
if (triple.isOSDarwin()) {
  if (HSOpts.UseStandardSystemIncludes) {
    // Add the default framework include paths on Darwin.
    AddPath("/System/Library/Frameworks", System, true);
    AddPath("/Library/Frameworks", System, true);
  }
  return;
}

(We should probably clean that up and move it to the driver as well...)

I think -cc1 can be agnostic about the source of the search path (user, IDE, driver, etc.).

clang/include/clang/Lex/HeaderSearch.h
164–165

Heh, maybe with the principle that the small things add up (or maybe just out of habit), I probably tend too much toward making small things efficient when it's easy. I also personally find Vector<Optional> just as natural a map data structure as DenseMap, but with more obvious allocation / etc. But yeah, probably not important.

clang/lib/Frontend/InitHeaderSearch.cpp
581–583

Thanks for explaining; you're right, I was confused. (Maybe this should be renamed from UserEntries to CommandLineEntries if it has everything on the command-line?)

As I mentioned above, the distinction between driver-generated and -cc1-generated options is a bit murky (and moving in the direction of removing the latter), so I'm not sure it's meaningful to filter out -cc1-generated options.