This is an archive of the discontinued LLVM Phabricator instance.

[clang][deps] Override dependency and serialized diag files for modules
ClosedPublic

Authored by benlangmuir on Jul 8 2022, 12:20 PM.

Details

Summary

When building modules, override secondary outputs (dependency file, dependency targets, serialized diagnostic file) in addition to the pcm file path. This avoids inheriting per-TU command-line options that cause non-determinism in the results (non-deterministic command-line for the module build, non-determinism in which TU's .diag and .d files will contain the module outputs). In clang-scan-deps we infer whether to generate dependency or serialized diagnostic files based on an original command-line. In a real build system this should be modeled explicitly.

Diff Detail

Event Timeline

benlangmuir created this revision.Jul 8 2022, 12:20 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 8 2022, 12:20 PM
benlangmuir requested review of this revision.Jul 8 2022, 12:20 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 8 2022, 12:20 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript

Updates:

  • Made lookup of module outputs fallible. Not currently used by clang-scan-deps, but since the expectation is for a build system to provide these settings account for possibility of errors.
  • Attempt to fix windows path issue in test

This looks pretty nice. My only concern is the ad-hoc command line parsing.

clang/include/clang/Tooling/DependencyScanning/DependencyScanningTool.h
55

I'm curious whether you have encountered situations where being able to return an error from LookupModuleOutputs is useful.

clang/test/ClangScanDeps/preserved-args.c
1

I think we can delete this test at this point, since we check both interesting cases in generate-modules-path-args.c and removed-args.c.

If I remember correctly, this test was originally checking that the scanner does not change -MT and friends before the scan and doesn't leak that into the resulting command lines.

clang/tools/clang-scan-deps/ClangScanDeps.cpp
297

Is my understanding correct that this lambda of type const ModuleOutputOptions &(ModuleID) gets implicitly converted to llvm::function_ref<Expected<const ModuleOutputOptions &>(const ModuleID &)>?

If so, I think it would be clearer to take the ModuleID by const ref here also and wrap the return type with Expected, to match the... expected function_ref type. WDYT?

397

I think we should be avoiding ad-hoc command line parsing, it can be incorrect in many ways. Could we move this check somewhere where we have a properly constructed CompilerInvocation? I think we could do something like this in ModuleDeps::getCanonicalCommandLine:

if (!CI.getDiagnosticOpts().DiagnosticSerializationFile.empty())
  CI.getDiagnosticOpts().DiagnosticSerializationFile
        = LookupModuleOutput(ID, ModuleOutputKind::DiagnosticSerializationFile);
benlangmuir marked an inline comment as done.Jul 11 2022, 2:03 PM
benlangmuir added inline comments.
clang/include/clang/Tooling/DependencyScanning/DependencyScanningTool.h
55

I ended up backing this change out: it was motivated by a downstream libclang API change that I have now re-evaluated based on your other feedback to use a per-output callback instead of a single callback.

clang/tools/clang-scan-deps/ClangScanDeps.cpp
297

This was unintentional, I just missed these couple of places when I changed the API from ModuleID to const ModuleID &. Will fix, thanks!

397

Yeah, we can do that. I originally avoided this due to it "leaking" whether the TU used these outputs into the module command-lines (since the set of callbacks would differ), but I suspect in practice that doesn't matter since you're unlikely to mix compilations that have and don't have serialized diagnostics. To be 100% sound, it will require adding the existence of the outputs to the module context hash (not the actual path, just whether there was a diag and/or d file at all). I will do the context hash change later if you're okay with it - there's nowhere to feed the extra info into getModuleHash right now, but I was already planning to change the hashing which will make it easier to do. If you think it's critical we could add a parameter to getModuleHash temporarily to handle it.

I liked your idea to make the callback per-output as well.

Updates per review

  • Switched to a per-output callback
  • Removed preserved-args.c test
  • Removed error handling that I no longer have a real use for
  • Only request .d and .diag paths if they were enabled in the original TU
jansvoboda11 accepted this revision.Jul 12 2022, 7:03 AM

LGTM! Thanks.

clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp
330–331

This is a bit unfortunate but I don't see a better alternative.

Ideally, we would compute the hash with the .d and .dia paths already filled in. That would ensure the command line we end up reporting to the build system really does have the context hash associated with the module. (We'd need to include every field set in getCanonicalCommandLine() too.) But for the path lookup, we already need some kind of (currently partial) context hash.

This revision is now accepted and ready to land.Jul 12 2022, 7:03 AM
benlangmuir added inline comments.Jul 12 2022, 8:34 AM
clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp
330–331

The other things added in getCanonicalCommandLine are currently:

  • module map path - I'm planning to include this in my changes to the context hash since it is significant exactly how the path is spelled. This is already part of the implicit module build's notion of identity.
  • pcm paths - these are sound as long as we always get the same paths returned in the callback during a build (across separate builds it would be fine to change them as long as you're going to rebuild anything whose path changed, and anything downstream of that).