This is an archive of the discontinued LLVM Phabricator instance.

[clang][modules] Build inferred modules
AbandonedPublic

Authored by jansvoboda11 on Apr 21 2021, 1:14 AM.

Details

Summary

This patch enables explicitly building inferred modules.

Effectively a cherry-pick of https://github.com/apple/llvm-project/pull/699 authored by @Bigcheese with libclang changes omitted and adapted to use the new full canonical command-line clang-scan-deps produces since D100534.

Contains the following changes:

  1. [Clang][ScanDeps] Ignore __inferred_module.map dependency.
    • This shows up with inferred modules, but it doesn't exist on disk, so don't report it as a dependency.
  1. [Clang] Fix the header paths in clang::Module for inferred modules.
    • The UmbrellaAsWritten and NameAsWritten fields in clang::Module are a lie for framework modules. For those they actually are the path to the header or umbrella relative to the clang::Module::Directory.
    • The exception to this case is for inferred modules. Here it actually is the name as written, because we print out the module and read it back in when implicitly building modules. This causes a problem when explicitly building an inferred module, as we skip the printing out step.
    • In order to fix this issue this patch adds a new field for the path we want to use in getInputBufferForModule. It also makes NameAsWritten actually be the name written in the module map file (or that would be, in the case of an inferred module).
  1. [Clang] Allow explicitly building an inferred module.
    • Building the actual module still fails, but make sure it fails for the right reason.
  1. [Clang][ScanDeps] Use the module map a module was inferred from for inferred modules.

Diff Detail

Event Timeline

jansvoboda11 requested review of this revision.Apr 21 2021, 1:14 AM
jansvoboda11 created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptApr 21 2021, 1:14 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

Apply clang-format suggestions

jansvoboda11 edited the summary of this revision. (Show Details)Apr 21 2021, 1:26 AM

Rebase on top of D100942

jansvoboda11 planned changes to this revision.Apr 21 2021, 5:00 AM

Rebase on top of D101051

Fix clang-tidy warnings, add fixme

jansvoboda11 added inline comments.Apr 26 2021, 4:14 AM
clang/lib/Serialization/ASTReader.cpp
5623

Not sure what's the best way to write/read the two strings (NameAsWritten, PathRelativeToRootModuleDirectory). Can we put them into a single blob and separate them with \0?

clang/lib/Serialization/ASTReader.cpp
5623

I think we can store them in whatever format we want and separating them by '\0' should be fine.

However, I'm not convinced we need to do so in this patch. Deserialized NameAsWritten isn't being used anyway, meaning we have no interesting test case for serialization that would ensure it works as expected.

Given that there are four different things being done in this commit, it sounds naively like it should be four separate commits. If the logic is too intertwined to land each of the four pieces separately (certainly possible), can you quickly explain why, to motivate landing them together? (Or maybe there's an NFC refactoring that could be separated as a prep, to make the functional changes more isolated / easy to reason about?)

Either way, please split the testing between clang's ability to build inferred modules (clang/test/Modules) and clang-scan-deps ability to find them (clang/test/ClangScanDeps).

clang/test/ClangScanDeps/modules-inferred-explicit-build.m
11–13

I took me a while to figure out that .rsp and -to-rsp means "response file". Can we just use response (or even response-file?) in both cases, or is .rsp a well-known extension I'm just not aware of?

I'm also not sure we want to use this in a test (see my next comment). If it's a useful utility regardless perhaps it could have a home in clang/utils.

14–18

I feel like the clang invocations that build/use modules should be in clang/test/Modules. Two independent things:

  • Can clang build inferred modules explicitly? (tested in clang/test/Modules)
  • Can clang-scan-deps generate deps for inferred modules? (tested in clang/test/ClangScanDeps)
dexonsmith requested changes to this revision.May 3 2021, 10:37 AM
This revision now requires changes to proceed.May 3 2021, 10:37 AM

Given that there are four different things being done in this commit, it sounds naively like it should be four separate commits. If the logic is too intertwined to land each of the four pieces separately (certainly possible), can you quickly explain why, to motivate landing them together? (Or maybe there's an NFC refactoring that could be separated as a prep, to make the functional changes more isolated / easy to reason about?)

I think that landing everything together provides more context for people going through git log in the future.

If you really think splitting this up would be better, how about this?

  1. Fix AsWritten for umbrella headers/directories.
  2. Add tests to clang/test/Modules.
  3. Add tests to clang/test/ClangScanDeps, make the necessary changes to Tooling/DependencyScanning, add response file Python script.
clang/test/ClangScanDeps/modules-inferred-explicit-build.m
11–13

The Clang driver already has an option --rsp-quoting={posix,windows}, so I think the shortcut is somewhat known. I can rename the script to module-deps-to-response-file, if you think that's clearer. It could be useful in general, co moving this to clang/utils sounds fine.

14–18

I agree that we should test explicit build of inferred modules in clang/test/Modules (without clang-scan-deps). I'll look into it.

I'm not sure I'd be happy with only checking the dependencies produced by clang-scan-deps here. Testing that the generated command-line actually works is as important IMO, and it will be even more so when we start optimizing the command line (stripping out unused header search paths, definitions etc.).

Given that there are four different things being done in this commit, it sounds naively like it should be four separate commits. If the logic is too intertwined to land each of the four pieces separately (certainly possible), can you quickly explain why, to motivate landing them together? (Or maybe there's an NFC refactoring that could be separated as a prep, to make the functional changes more isolated / easy to reason about?)

I think that landing everything together provides more context for people going through git log in the future.

If you really think splitting this up would be better, how about this?

  1. Fix AsWritten for umbrella headers/directories.
  2. Add tests to clang/test/Modules.

It sounds like these two pieces could be done together (since the refactor for AsWritten seems like it deserves a test), as a commit that adds support for building inferred modules explicitly.

  1. Add tests to clang/test/ClangScanDeps, make the necessary changes to Tooling/DependencyScanning, add response file Python script.

Followed by this, for changing clang-scan-deps to support scanning for inferred modules.

WDYT?

clang/test/ClangScanDeps/modules-inferred-explicit-build.m
11–13

It's probably fine as rsp then, just a name I personally didn't know.

14–18

I'm not sure precisely what you mean by "works". I'm not sure just checking if it still compiles tells us much. What's important is that the modified options have the same semantics, and I think a subtle change that still compiles is more likely than preventing compilation entirely.

I don't think it would scale to check for semantic problems here -- that needs a body of testcases that covers all the things modules support.

One option would be to use the testcases (or a selection of them) in clang/test/Modules, by adding an extra RUN line that builds clang-scan-deps-discovered modules both with and without command-line stripping. For most changes, we can arrange the AST block such that skipped options won't affect it, and we could compare the hash of just that block. If and when we start stripping ignored -D options we'll need something smarter, but we can solve that problem later. (Ideally, this would just be a mode in clang, clang -Xclang -fmodules-depscan, which does an initial depscan and builds the modules in order. This might actually be an improvement on the existing implicit modules.)

@Bigcheese, maybe you can weigh in? If you both think clang -cc1 should be tested here, I'm open to it. (In that case, though, this should not be invoking %clang, but %clang_cc1, I think... or does the response file create a driver command-line?)

  1. Fix AsWritten for umbrella headers/directories.
  2. Add tests to clang/test/Modules.

It sounds like these two pieces could be done together (since the refactor for AsWritten seems like it deserves a test), as a commit that adds support for building inferred modules explicitly.

  1. Add tests to clang/test/ClangScanDeps, make the necessary changes to Tooling/DependencyScanning, add response file Python script.

Followed by this, for changing clang-scan-deps to support scanning for inferred modules.

WDYT?

That sounds good to me.

clang/test/ClangScanDeps/modules-inferred-explicit-build.m
14–18

I'm not sure precisely what you mean by "works". I'm not sure just checking if it still compiles tells us much. What's important is that the modified options have the same semantics, and I think a subtle change that still compiles is more likely than preventing compilation entirely.

I see, that's a good point.

One option would be to use the testcases (or a selection of them) in clang/test/Modules, by adding an extra RUN line that builds clang-scan-deps-discovered modules both with and without command-line stripping. For most changes, we can arrange the AST block such that skipped options won't affect it, and we could compare the hash of just that block. If and when we start stripping ignored -D options we'll need something smarter, but we can solve that problem later.

Using tests from clang/test/Modules sounds nice. What are your concerns regarding stripping -D options?

In that case, though, this should not be invoking %clang, but %clang_cc1, I think... or does the response file create a driver command-line?

The Tooling/DependencyScanning library already prepends the "-cc1" argument so that build tools using the API don't have to do that on their own.

dexonsmith added inline comments.May 5 2021, 6:51 AM
clang/test/ClangScanDeps/modules-inferred-explicit-build.m
14–18

Using tests from clang/test/Modules sounds nice. What are your concerns regarding stripping -D options?

Oh, it’s a bit mundane, but that’ll affect the identifier info, which will in turn change the AST block and its hash. I think?

For a truly explicit module, removing “unused” -D options from the identifier table might not be correct, since for semantics you might want importers to pick up those definitions (I think? Are they considered exported?). For implicitly-discovered modules, we know all transitive importers have a superset of -Ds on their command-lines so it’s safe.

Maybe what we can do at the time is turn on a flag to avoid writing unused -Ds to the serialized identifier table so the hashes will match? Anyway, I’m sure there is a solution.

dexonsmith added inline comments.May 5 2021, 7:14 AM
clang/test/ClangScanDeps/modules-inferred-explicit-build.m
14–18

In that case, though, this should not be invoking %clang, but %clang_cc1, I think... or does the response file create a driver command-line?

The Tooling/DependencyScanning library already prepends the "-cc1" argument so that build tools using the API don't have to do that on their own.

I suggest adding a .cc1 extension to the response files to make that clear to anyone reading the test. Maybe t.inferred.cc1.rsp.

The third clang invocation doesn't have the response file as the first argument, though, so I imagine that one doesn't contain a -cc1? (Or if it did, I'm not sure how it would work... I'm guessing it only adds arguments that happen to match between -cc1 and driver?)

jansvoboda11 added inline comments.May 5 2021, 9:02 AM
clang/test/ClangScanDeps/modules-inferred-explicit-build.m
14–18

I suggest adding a .cc1 extension to the response files to make that clear to anyone reading the test. Maybe t.inferred.cc1.rsp.

Fair enough.

The third clang invocation doesn't have the response file as the first argument, though, so I imagine that one doesn't contain a -cc1? (Or if it did, I'm not sure how it would work... I'm guessing it only adds arguments that happen to match between -cc1 and driver?)

That's correct, it only contains additional arguments. The input to clang-scan-deps is a compilation database, so the assumption there is that it will contain driver command lines. Therefore, we're generating driver arguments.

@Bigcheese, tests in your original PR invoke -cc1, what was the reasoning behind that?

jansvoboda11 abandoned this revision.May 14 2021, 6:02 AM

I split out the support for building inferred modules explicitly: D102491.

Patch with changes to the dependency scanner will follow.