This is an archive of the discontinued LLVM Phabricator instance.

[clang][lex] Remark on search path usage
ClosedPublic

Authored by jansvoboda11 on May 21 2021, 8:42 AM.

Details

Summary

For dependency scanning, it would be useful to collect header search paths (provided on command-line via -I and friends) that were actually used during preprocessing. This patch adds that feature to HeaderSearch along with a new remark that reports such paths as they get used.

Previous version of this patch tried to use the existing LookupFileCache to report used paths via HitIdx. That doesn't work for ComputeUserEntryUsage (which is intended to be called *after* preprocessing), because it indexes used search paths by the file name. This means the values get overwritten when the code contains #include_next.

Note that HeaderSearch doesn't use HeaderSearchOptions::UserEntries directly. Instead, InitHeaderSearch pre-processes them (adds platform-specific paths, removes duplicates, removes paths that don't exist) and creates DirectoryLookup instances. This means we need a mechanism for translating between those two. It's not possible to go from DirectoryLookup back to the original HeaderSearch, so InitHeaderSearch now tracks the relationships explicitly.

Depends on D111557.

Diff Detail

Event Timeline

jansvoboda11 requested review of this revision.May 21 2021, 8:42 AM
jansvoboda11 created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptMay 21 2021, 8:42 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

I think this is looking good, but would like either @dexonsmith, @akyrtzi, or someone else familiar with this area to take a look. Only other comment I have is that the new functions you add should use the current LLVM naming convention.

I also wonder how we should handle other things that are found via include paths such as module maps.

Fix naming of new functions/variables

I also wonder how we should handle other things that are found via include paths such as module maps.

That's a good point. The logic for module map lookup is more complex than header lookup, so I'd be keen to tackle that in a follow-up patch.

dexonsmith requested changes to this revision.May 28 2021, 2:10 PM

Thanks for your patience! I thought I already looked at this last week.

This is looking close. Comments mostly inline, but a high level point: I think this should report system search paths.

I also wonder how we should handle other things that are found via include paths such as module maps.

That's a good point. The logic for module map lookup is more complex than header lookup, so I'd be keen to tackle that in a follow-up patch.

Yup, seems like something that could be added as a follow up (although probably using the same remark). Is there a good place to leave behind a FIXME?

clang/include/clang/Basic/DiagnosticLexKinds.td
450

I suggest, more simply, "search path used:" (drop the "user-provided"). If you think it's useful for information purposes to know whether it was a user or system search path, I'd suggest using a select (in that case, maybe you want to add an optional "framework" descriptor, and/or "header map" when applicable, etc.).

451

I think it'd be better to define a DiagGroup in clang/include/clang/Basic/DiagnosticGroups.td and use it here.

It might be nice for it to be more general, and not refer to headers, so we can reuse it for module maps, and/or potentially other remarks. Maybe, -Rsearch-paths?

clang/include/clang/Lex/HeaderSearch.h
175–176

I think this can just be a vector of unsigned, since the key is densely packed and counting from 0. You can use ~0U for a sentinel for the non-entries. Maybe it's not too important though.

clang/lib/Frontend/InitHeaderSearch.cpp
581–583

I don't see why we'd want to filter out system includes.

  • Users sometimes manually configure "system" search paths, and this remark is fairly special-purpose, so I'm not sure the distinction is interesting to preserve.
  • This remark is already going to be "noisy" and hit a few times on essentially every compilation. Adding the system paths that get hit won't change that much.
  • The motivation for the patch is to test the logic for detecting which search paths are used in isolation from the canonicalize-explicit-module-build-commands logic in clang-scan-deps. We need to know that the logic works for system search paths as well.
clang/test/Preprocessor/header-search-user-entries.c
1–4 ↗(On Diff #348518)

Can we use -verify here? Or does that not work for remarks?

9–11 ↗(On Diff #348518)

If -verify doesn't work (hopefully it does), I think you'll need to add some CHECK-NOT / CHECK-NEXT / etc. or else you're not testing for the absence of other diagnostics.

Please also test:

  • Framework search path (used vs. not used)
  • System search path (used vs. not used)
  • One search path described relative to -isysroot (used vs. not used)
  • Header map (matched and used vs. matched but not used vs. not matched -- for the middle case, I'm not sure what result we actually want)
  • A path only used via #if __has_include() (when it's not followed by an #include or #import)
  • A path only used via ObjC's #import

If one of them doesn't work and it's complex enough to separate into a follow up, I think that'd be fine, but please find somewhere to leave a FIXME.

This revision now requires changes to proceed.May 28 2021, 2:10 PM

Yup, seems like something that could be added as a follow up (although probably using the same remark). Is there a good place to leave behind a FIXME?

I'll put a FIXME at an appropriate place in the next revision.

clang/include/clang/Basic/DiagnosticLexKinds.td
450

I chose this spelling to distinguish user-provided vs default include paths. The default ones are not important for header search pruning in dependency scanner, as they always get generated in InitHeaderSearch::AddDefaultIncludePaths.

clang/include/clang/Lex/HeaderSearch.h
175–176

I'm not sure what's our approach on early micro-optimizations like this. I don't think it will have measurable impact on memory or runtime.

clang/lib/Frontend/InitHeaderSearch.cpp
581–583

I'm not sure what you mean by system includes. HeaderSearchOptions::UserEntries should contain all search paths provided through the command-line including -isystem and -isysroot: https://github.com/llvm/llvm-project/blob/97d234935f1514af128277943f30efc469525371/clang/lib/Frontend/CompilerInvocation.cpp#L2985-L3056.

"System header prefixes" only control whether a header found through DirectoryLookup should be marked as system.

clang/test/Preprocessor/header-search-user-entries.c
9–11 ↗(On Diff #348518)

Good point, I'll work on improving the coverage.

dexonsmith added inline comments.Jun 1 2021, 12:05 PM
clang/include/clang/Basic/DiagnosticLexKinds.td
450

Ah, I thought that logic was all moved to the driver. Looking at AddDefaultIncludePaths, you're right that only some things have moved, and some haven't (yet)...

For example, it looks like the only logic remaining in -cc1 on Darwin is:

// All header search logic is handled in the Driver for Darwin.
if (triple.isOSDarwin()) {
  if (HSOpts.UseStandardSystemIncludes) {
    // Add the default framework include paths on Darwin.
    AddPath("/System/Library/Frameworks", System, true);
    AddPath("/Library/Frameworks", System, true);
  }
  return;
}

(We should probably clean that up and move it to the driver as well...)

I think -cc1 can be agnostic about the source of the search path (user, IDE, driver, etc.).

clang/include/clang/Lex/HeaderSearch.h
175–176

Heh, maybe with the principle that the small things add up (or maybe just out of habit), I probably tend too much toward making small things efficient when it's easy. I also personally find Vector<Optional> just as natural a map data structure as DenseMap, but with more obvious allocation / etc. But yeah, probably not important.

clang/lib/Frontend/InitHeaderSearch.cpp
581–583

Thanks for explaining; you're right, I was confused. (Maybe this should be renamed from UserEntries to CommandLineEntries if it has everything on the command-line?)

As I mentioned above, the distinction between driver-generated and -cc1-generated options is a bit murky (and moving in the direction of removing the latter), so I'm not sure it's meaningful to filter out -cc1-generated options.

Rebase, handle module maps, add more tests

I believe I've addressed all suggestions provided so far. @dexonsmith & @Bigcheese, can you check again?

clang/test/Preprocessor/header-search-user-entries.c
9–11 ↗(On Diff #348518)

I tried using -verify together with expected-remark-re (to match relative paths) and @* (to match any/no line number), but those don't seem to work together.

This expectation:

// expected-remark-re @* {{search path used: '{{.*}}/search-path-usage/FwA'}}

produces:

error: 'error' diagnostics seen but not expected: 
  File /Users/Jan/Code/upstream-llvm/clang/test/Preprocessor/search-path-usage-headers.m Line 30: cannot find start ('{{') of expected regex
error: 'remark' diagnostics seen but not expected: 
  (frontend): search path used: '/Users/Jan/Code/upstream-llvm/clang/test/Preprocessor/Inputs/search-path-usage/FwA'
2 errors generated.

If I drop @*, stuff doesn't work (presumably because of line numbers):

error: 'remark' diagnostics expected but not seen: 
  File /Users/Jan/Code/upstream-llvm/clang/test/Preprocessor/search-path-usage-headers.m Line 30: search path used: '{{.*}}/search-path-usage/FwA'
error: 'remark' diagnostics seen but not expected: 
  (frontend): search path used: '/Users/Jan/Code/upstream-llvm/clang/test/Preprocessor/Inputs/search-path-usage/FwA'
2 errors generated.

Let me know if I'm doing something wrong. In the meantime, I made the FileCheck checks more thorough as you suggested.

jansvoboda11 retitled this revision from [clang][lex] Remark for used header search paths to [clang][lex] Remark on search path usage.Oct 8 2021, 8:36 AM
jansvoboda11 added inline comments.Oct 8 2021, 8:39 AM
clang/test/Preprocessor/header-search-user-entries.c
9–11 ↗(On Diff #348518)
  • Header map (matched and used vs. matched but not used vs. not matched -- for the middle case, I'm not sure what result we actually want)

Can you clarify what you mean by "matched" and "used" exactly?

dexonsmith added inline comments.Oct 8 2021, 1:08 PM
clang/test/Preprocessor/header-search-user-entries.c
9–11 ↗(On Diff #348518)

Oh, I hadn't realized the diagnostic doesn't have a source location. Is there a reason it has no location right now?

Seems like pointing at the first #include that triggers it would be ideal, showing the include stack if there is one.

9–11 ↗(On Diff #348518)
  • Header map (matched and used vs. matched but not used vs. not matched

Can you clarify what you mean by "matched" and "used" exactly?

Assume "hmap" is a header map with the edge a.h -> b.h.

  • "Matched" means the header map is queried with "a.h" (or "A.H", or whatever).
  • "Used" means "b.h" is subsequently found on disk.

I think the way header maps work, a "matched but not used" would typically result in a compile error... since once the path is converted to "b.h" there's no going back. But here's a scenario would it wouldn't be an error:

#if __has_include("a.h")
#include "a.h"
#endif

The __has_include("a.h") would match in the header map, converting to "b.h", but there's no error when "b.h" isn't found on disk.

for the middle case, I'm not sure what result we actually want

Thinking about it more, I think we need it to count as a use of the header map whenever there's a match (regardless of what happens downstream after the path replacement), because it can affect the result.

With this setup:

% find . -type f
include/a.h
include/c.h
t.c
hmap
% clang -Ihmap -Iinclude -E t.c

(same header map, a.h -> b.h)

Here's an example of the "middle" case, matched but not used, where by my logic above the header map needs a remark:

% cat t.c
#include "a.h"

... because removing the -Ihmap changes the result of the compilation.

Here's the third case, not matched, where the remark shouldn't trigger since -Ihmap does not affect the compilation:

% cat t.c
#include "c.h"
dexonsmith added inline comments.Oct 8 2021, 1:18 PM
clang/test/Preprocessor/header-search-user-entries.c
9–11 ↗(On Diff #348518)

... my point being that it should be easy to specify the location, and then you're not having to combine @* with -re:

#include "some-header" // \
  // expected-remark-re {{source path used '{{.*}}/search-path-used/FwA'}} \
  // expected-remark {{some other diagnostic on the same line}}
#include "some-other-header"
  // expected-remark-re @-1 {{source path used '{{.*}}/search-path-used/FwA'}}

For the diagnostics in the headers, which might be included in other tests, probably simplest to add a marker:

// In the x.h
#include "y.h" // #x-include-y

// In the test
// expected-remark-re @#x-include-y {{...}
#include "x.h" // expected-note {{...}}

but you could also use -verify=name-of-test and use name-of-test-remark in x.h to avoid affecting other tests.

Add SourceLocation to remarks and test with -verify

dexonsmith accepted this revision.Oct 11 2021, 12:39 PM

Thanks; I think this was worth it. I like how clean it looks with -verify in place.

LGTM if you add a couple more header map tests, explained inline.

clang/test/Preprocessor/search-path-usage-headers.m
78–91 ↗(On Diff #378702)

This is a good positive test, but I don't see a negative test or the "matched but not ultimately found" case I mentioned before.

You can add a RUN like this for the negative test:

// RUN: %clang_cc1 -Eonly %s -Rsearch-path-usage \
//      -I%t/b.hmap                              \
//      -I%S/Inputs/search-path-usage/d          \
//      -DHMAP_NO_MATCH -verify
#ifdef HMAP_NO_MATCH
#include "d.h" // expected-remark {{search path used: ../d'}}
#endif

(same hmap, but not found in the headermap)

and like this for the "matched in header map but not ultimately found" case (different hmap, pointing at a file that won't be found):

// RUN: sed "s|DIR|%/S/Inputs/search-path-usage/missing-subdir|g" \
// RUN:             %S/Inputs/search-path-usage/b.hmap.json.template > %t/b-missing.hmap.json
// RUN: %hmaptool write %t/b-missing.hmap.json %t/b-missing.hmap
// RUN: %clang_cc1 -Eonly %s -Rsearch-path-usage \
// RUN:   -I %t/b-missing.hmap                   \
// RUN:   -I b                                   \
// RUN:   -DHMAP_MATCHED_BUT_MISSING -verify
#ifdef HMAP_MATCHED_BUT_MISSING
#include "b.h" // \
  // expected-remark {{search path used: ../b-missing.hmap}} \
  // expected-error {{file not found: "b.h" [or whatever it is in the end]}}
#endif

(Maybe would be good to have a __has_include-matched-but-missing test as well? I'll leave it up to you.)

This revision is now accepted and ready to land.Oct 11 2021, 12:39 PM

Report matched header maps even when target file does not exist, add tests.

Thanks for the review, I handled the header map case and added tests. I'll commit once the CI passes.

clang/lib/Lex/HeaderSearch.cpp
473

Note: Here's the header map special case for when a file matched but does not exist.

This revision was automatically updated to reflect the committed changes.
teemperor reopened this revision.Oct 12 2021, 1:21 PM
teemperor added a subscriber: teemperor.

This seems to have broken compiling some versions of Foundation/Dispatch with -fmodules enabled (which breaks the LLDB test suite when it tries to compile any Objective-C tests). Can you take a look and/or revert? I mailed you the version/compiler errors.

This revision is now accepted and ready to land.Oct 12 2021, 1:21 PM

This seems to have broken compiling some versions of Foundation/Dispatch with -fmodules enabled (which breaks the LLDB test suite when it tries to compile any Objective-C tests). Can you take a look and/or revert? I mailed you the version/compiler errors.

FYI, Jan is in Europe; if this is blocking LLDB I suggest reverting and he can evaluate tomorrow. (I'm a bit surprised this commit triggers a failure anywhere since it should be NFC aside from remarks, but I guess there must be a logic issue.)