This is an archive of the discontinued LLVM Phabricator instance.

[clang][deps] Include canonical invocation in ContextHash
ClosedPublic

Authored by benlangmuir on Jul 15 2022, 10:49 AM.

Details

Summary

The "strict context hash" is insufficient to identify module
dependencies during scanning, leading to different module build commands
being produced for a single module, and non-deterministically choosing
between them. This commit switches to hashing the canonicalized
CompilerInvocation of the module. By hashing the invocation we are
converting these from correctness issues to performance issues, and we
can then incrementally improve our ability to canonicalize
command-lines.

This change can cause a regression in the number of modules needed. Of
the 4 projects I tested, 3 had no regression, but 1, which was
clang+llvm itself, had a 66% regression in number of modules (4%
regression in total invocations). This is almost entirely due to
differences between -W options across targets. Of this, 25% of the
additional modules are system modules, which we could avoid if we
canonicalized -W options when -Wsystem-headers is not present --
unfortunately this is non-trivial due to some warnings being enabled in
system headers by default. The rest of the additional modules are mostly
real differences in potential warnings, reflecting incorrect behaviour
in the current scanner.

There were also a couple of differences due to -DFOO
-fmodule-ignore-macro=FOO, which I fixed here.

Since the output paths for the module depend on its context hash, we
hash the invocation before filling in outputs, and rely on the build
system to always return the same output paths for a given module.

Note: since the scanner itself uses an implicit modules build, there can
still be non-determinism, but it will now present as different
module+hashes rather than different command-lines for the same
module+hash.

Diff Detail

Event Timeline

benlangmuir created this revision.Jul 15 2022, 10:49 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 15 2022, 10:49 AM
benlangmuir requested review of this revision.Jul 15 2022, 10:49 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 15 2022, 10:49 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

Would it make sense for this to replace the existing strict context hash implementation?

Would it make sense for this to replace the existing strict context hash implementation?

It's not clear to me whether this would be a good tradeoff or not: the explicit build canonicalizes its invocation, but the implicit build does not do so to the same extent. Some of that could be improved with some effort, but for example the optimizations we use for search path pruning in the explicit build cannot be done up-front in the implicit build, since we don't yet know which paths are relevant. I think we should consider this separately.

One necessary difference is that after scanning dependencies we want to hash the module dependencies (name + context hash), but we cannot do that in the implicit build since the dependencies are not discovered yet. The rest of the invocation hashing implementation could be shared though, if we decided it is the right approach.

Makes sense, thanks. LGTM provided you add brief explanation to each of the new tests.

jansvoboda11 accepted this revision.Jul 27 2022, 11:00 AM
This revision is now accepted and ready to land.Jul 27 2022, 11:00 AM
  • Added comments to tests
  • Added a missing test that I accidentally deleted
  • Rebased on latest main