When building modules, override secondary outputs (dependency file, dependency targets, serialized diagnostic file) in addition to the pcm file path. This avoids inheriting per-TU command-line options that cause non-determinism in the results (non-deterministic command-line for the module build, non-determinism in which TU's .diag and .d files will contain the module outputs). In clang-scan-deps we infer whether to generate dependency or serialized diagnostic files based on an original command-line. In a real build system this should be modeled explicitly.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Updates:
- Made lookup of module outputs fallible. Not currently used by clang-scan-deps, but since the expectation is for a build system to provide these settings account for possibility of errors.
- Attempt to fix windows path issue in test
This looks pretty nice. My only concern is the ad-hoc command line parsing.
clang/include/clang/Tooling/DependencyScanning/DependencyScanningTool.h | ||
---|---|---|
55 | I'm curious whether you have encountered situations where being able to return an error from LookupModuleOutputs is useful. | |
clang/test/ClangScanDeps/preserved-args.c | ||
1 | I think we can delete this test at this point, since we check both interesting cases in generate-modules-path-args.c and removed-args.c. If I remember correctly, this test was originally checking that the scanner does not change -MT and friends before the scan and doesn't leak that into the resulting command lines. | |
clang/tools/clang-scan-deps/ClangScanDeps.cpp | ||
297 | Is my understanding correct that this lambda of type const ModuleOutputOptions &(ModuleID) gets implicitly converted to llvm::function_ref<Expected<const ModuleOutputOptions &>(const ModuleID &)>? If so, I think it would be clearer to take the ModuleID by const ref here also and wrap the return type with Expected, to match the... expected function_ref type. WDYT? | |
397 | I think we should be avoiding ad-hoc command line parsing, it can be incorrect in many ways. Could we move this check somewhere where we have a properly constructed CompilerInvocation? I think we could do something like this in ModuleDeps::getCanonicalCommandLine: if (!CI.getDiagnosticOpts().DiagnosticSerializationFile.empty()) CI.getDiagnosticOpts().DiagnosticSerializationFile = LookupModuleOutput(ID, ModuleOutputKind::DiagnosticSerializationFile); |
clang/include/clang/Tooling/DependencyScanning/DependencyScanningTool.h | ||
---|---|---|
55 | I ended up backing this change out: it was motivated by a downstream libclang API change that I have now re-evaluated based on your other feedback to use a per-output callback instead of a single callback. | |
clang/tools/clang-scan-deps/ClangScanDeps.cpp | ||
297 | This was unintentional, I just missed these couple of places when I changed the API from ModuleID to const ModuleID &. Will fix, thanks! | |
397 | Yeah, we can do that. I originally avoided this due to it "leaking" whether the TU used these outputs into the module command-lines (since the set of callbacks would differ), but I suspect in practice that doesn't matter since you're unlikely to mix compilations that have and don't have serialized diagnostics. To be 100% sound, it will require adding the existence of the outputs to the module context hash (not the actual path, just whether there was a diag and/or d file at all). I will do the context hash change later if you're okay with it - there's nowhere to feed the extra info into getModuleHash right now, but I was already planning to change the hashing which will make it easier to do. If you think it's critical we could add a parameter to getModuleHash temporarily to handle it. I liked your idea to make the callback per-output as well. |
Updates per review
- Switched to a per-output callback
- Removed preserved-args.c test
- Removed error handling that I no longer have a real use for
- Only request .d and .diag paths if they were enabled in the original TU
LGTM! Thanks.
clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp | ||
---|---|---|
334–335 | This is a bit unfortunate but I don't see a better alternative. Ideally, we would compute the hash with the .d and .dia paths already filled in. That would ensure the command line we end up reporting to the build system really does have the context hash associated with the module. (We'd need to include every field set in getCanonicalCommandLine() too.) But for the path lookup, we already need some kind of (currently partial) context hash. |
clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp | ||
---|---|---|
334–335 | The other things added in getCanonicalCommandLine are currently:
|
I'm curious whether you have encountered situations where being able to return an error from LookupModuleOutputs is useful.