This is an archive of the discontinued LLVM Phabricator instance.


Differential D111720

[clang][deps] Ensure reported context hash is strict
ClosedPublic

Authored by jansvoboda11 on Oct 13 2021, 8:09 AM.


Details

Reviewers

Bigcheese
dexonsmith

Commits

rG954d77b98dd6: [clang][deps] Ensure reported context hash is strict

Summary

One of the main goals of the dependency scanner is to be strict about module compatibility. This is achieved through the strict context hash. This patch ensures that the strict context hash is enabled not only during the scan itself (and its minimized implicit build), but also when actually reporting dependencies.
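To make the strict/non-strict distinction concrete, here is a minimal C++ sketch. `ModuleContext`, `fnv1a` and `contextHash` are hypothetical stand-ins: Clang's real context hash covers many more options and uses its own hashing. The sketch only illustrates the property this patch relies on: a non-strict hash ignores header search paths, so two module builds that see different headers can collide, while a strict hash keeps them apart.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, heavily simplified stand-in for the options that feed a
// module context hash. Not Clang's actual data structure.
struct ModuleContext {
  std::string LangOptions;              // e.g. "-std=c11"
  std::vector<std::string> SearchPaths; // e.g. {"-Ia"}
};

// 64-bit FNV-1a over one string, used here only because it is short and
// deterministic.
inline std::uint64_t fnv1a(std::uint64_t H, const std::string &S) {
  for (unsigned char C : S) {
    H ^= C;
    H *= 1099511628211ULL;
  }
  // Feed a 0xFF separator byte after each string so that, e.g., {"ab", "c"}
  // and {"a", "bc"} produce different byte sequences.
  H ^= 0xFF;
  H *= 1099511628211ULL;
  return H;
}

// Hypothetical context hash: the non-strict variant ignores search paths,
// the strict variant includes them.
inline std::uint64_t contextHash(const ModuleContext &Ctx, bool Strict) {
  std::uint64_t H = 14695981039346656037ULL; // FNV-1a offset basis
  H = fnv1a(H, Ctx.LangOptions);
  if (Strict)
    for (const std::string &P : Ctx.SearchPaths)
      H = fnv1a(H, P);
  return H;
}
```

With `-Ia` versus `-Ib` (the same split the `modules-context-hash.c` test uses), the non-strict hashes are equal and the strict hashes differ.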

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

jansvoboda11 requested review of this revision. · Oct 13 2021, 8:09 AM

jansvoboda11 created this revision.

Herald added a project: Restricted Project. · Oct 13 2021, 8:09 AM

Herald added a subscriber: cfe-commits.

Harbormaster completed remote builds in B128627: Diff 379397. · Oct 13 2021, 8:47 AM

dexonsmith added inline comments. · Oct 13 2021, 6:58 PM

clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp
56–58	IIUC, explicit modules don't really have/need a context hash. Can related options be stripped out when serializing to `-cc1` when `ImplicitModules` is false? Basically, I'm asking if `ModulesStrictContextHash` is a no-op when `ImplicitModules` is false. If not, can we make it a no-op? (If we can, then maybe rename the field to `ImplicitModulesStrictContextHash` and audit that no one reads it when `ImplicitModules` is off...)

jansvoboda11 added inline comments. · Oct 14 2021, 12:08 PM

clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp
56–58	Let me clarify this a bit. You're right that when building explicit modules, we don't care about context hash. We do care about using strict context hash during the scan though - it's an implementation detail through which we prevent mixing incompatible modules/TUs. (This strict context hash is enabled elsewhere in the dependency scanner.) At the end of the scan, we take discovered modules and modify/prune their `CompilerInvocation` (in this function). This can essentially "merge" multiple versions of the same module into one, which is very desirable. But we still want to do it according to the strict context hash. We don't want to merge versions with different search paths for example (non-strict context hash). That's what this change ensures. Note that we don't need to report context hashes to scanner clients. Any other identifier derived from a strict context hash would work.
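The merging step described here can be pictured as grouping discovered modules by module name plus strict context hash. The following C++ sketch uses hypothetical types (`DiscoveredModule`, `mergeVariants`) that are not the scanner's API; it only shows that keying on the strict hash collapses identical variants while keeping variants with different search paths apart.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record of one module discovered while scanning one TU.
struct DiscoveredModule {
  std::string Name;        // module name, e.g. "mod"
  std::string ContextHash; // strict context hash of the pruned invocation
  std::string FromTU;      // TU whose scan discovered it
};

using VariantMap =
    std::map<std::pair<std::string, std::string>, std::vector<std::string>>;

// Groups discovered modules by (name, strict context hash). Each key is one
// module variant a client would build once; the value lists the TUs that
// depend on that variant.
inline VariantMap mergeVariants(const std::vector<DiscoveredModule> &Mods) {
  VariantMap Out;
  for (const DiscoveredModule &M : Mods)
    Out[std::make_pair(M.Name, M.ContextHash)].push_back(M.FromTU);
  return Out;
}
```

Two TUs built with `-Ia` and one with `-Ib` yield two variants of `mod`, the first shared by both `-Ia` TUs.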
56–58	I think the rename you're suggesting is valid. We could strip the `ModulesStrictContextHash` in the scanner: after we generate the strict context hash and before we generate the command-line. I think that can be done in a NFC follow-up.
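The ordering suggested here (compute the strict hash first, then drop the option before generating the command line) could look roughly like this. `Invocation`, `strictHash`, `toCC1Args` and `finalize` are hypothetical simplifications; only `-fmodules-strict-context-hash` is a real `-cc1` flag. The sketch shows that removing the option afterwards does not change the reported hash, because the hash has already been captured.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical, simplified invocation.
struct Invocation {
  bool StrictContextHash = false;
  std::vector<std::string> IncludeDirs;
};

// Hypothetical hash. std::hash is fine within one process but is not stable
// across platforms or standard library versions.
inline std::size_t strictHash(const Invocation &CI) {
  assert(CI.StrictContextHash && "hash must be computed in strict mode");
  std::string Key;
  for (const std::string &Dir : CI.IncludeDirs)
    Key += Dir + '\0';
  return std::hash<std::string>{}(Key);
}

// Hypothetical command-line serializer.
inline std::vector<std::string> toCC1Args(const Invocation &CI) {
  std::vector<std::string> Args = {"-cc1"};
  if (CI.StrictContextHash)
    Args.push_back("-fmodules-strict-context-hash");
  for (const std::string &Dir : CI.IncludeDirs) {
    Args.push_back("-I");
    Args.push_back(Dir);
  }
  return Args;
}

// Computes the hash in strict mode, then strips the option so it does not
// leak into the explicit-module command line reported to clients.
inline std::size_t finalize(Invocation &CI, std::vector<std::string> &Args) {
  CI.StrictContextHash = true;
  const std::size_t Hash = strictHash(CI);
  CI.StrictContextHash = false;
  Args = toCC1Args(CI);
  return Hash;
}
```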

LGTM, if you expand the comment (see inline).

clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp
56–58	I think I understand the problem now. I hadn't really put together that the implicit modules context hash machinery was being used to decide the artifact location for the explicit module.

I'm concerned this is too subtle and fragile. I'm wondering if the following more naive solution would work:
- Prune/canonicalize the CompilerInvocation (as now).
- Write/modify any fields to use placeholders for fields the client has control over. (Besides `OutputFile`, what else is there?)
- Generate the `-cc1` and hash it. That's now the context hash.
- Return it to the client.

But maybe that's the whole point of the strict context hash, and if there are bugs where this would behave differently, we should fix the strict context hash?

...

Stepping back, please go ahead and commit this incremental improvement, after expanding the comment to:
- More fully explain the context: the context hash will be generated from the CompilerInvocation and sent to the client, which then uses it to decide where to store the artifact. We need to make sure it's strict.
- Explain why `assert(ModulesStrictContextHash)` would fail, even though the scan used a strict context hash. (Maybe it'd make sense to make that change in a follow-up...?)
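On the client side, "uses it to decide where to store the artifact" might look like the following. The `<output-dir>/<context-hash>/<module-name>.pcm` layout and the `pcmPathFor` helper are hypothetical, chosen only for illustration; clients may use any scheme. The sketch makes the risk visible: if the reported hash were not strict, two incompatible variants would be assigned the same path and one build would overwrite the other.

```cpp
#include <cassert>
#include <string>

// Hypothetical client-side helper: derives a PCM location from the module
// name and the context hash reported by the scanner.
inline std::string pcmPathFor(const std::string &OutputDir,
                              const std::string &ModuleName,
                              const std::string &ContextHash) {
  return OutputDir + "/" + ContextHash + "/" + ModuleName + ".pcm";
}
```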

This revision is now accepted and ready to land. · Oct 19 2021, 1:18 PM

jansvoboda11 added inline comments. · Oct 21 2021, 4:49 AM

clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp
56–58	> I'm concerned this is too subtle and fragile. I'm wondering if the following more naive solution would work:
> - Prune/canonicalize the CompilerInvocation (as now).
> - Write/modify any fields to use placeholders for fields the client has control over. (Besides `OutputFile`, what else is there?)

In general, we don't know much about the intended filesystem paths. Off the top of my head, we don't know:
- the PCM output file (and values for `-fmodule-file=`),
- the input file (and values for `-fmodule-map-file=`: the module map used for building might have a different location during the scan),
- the serialized diagnostics file (we can't use the file for a TU that discovered the module first).

> - Generate the `-cc1` and hash it. That's now the context hash.
> - Return it to the client.
>
> But maybe that's the whole point of the strict context hash, and if there are bugs where this would behave differently, we should fix the strict context hash?

I think that's a valid approach. Just wondering why you see it as less subtle and fragile than the strict context hash?
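A rough sketch of the alternative under discussion: hash the pruned `-cc1` after replacing client-controlled paths with a placeholder. The rewritten options follow the paths listed in the reply (`-o`, `-fmodule-file=`, `-fmodule-map-file=`, and the serialized diagnostics file, which is `-serialize-diagnostic-file` at the `-cc1` level); the placeholder spelling, the `canonicalHash` name and the use of FNV-1a are assumptions, and the positional input file is not handled. Two command lines that differ only in output locations hash the same, while a different `-I` changes the hash.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

namespace {
// One FNV-1a step over a single byte.
void fnvStep(std::uint64_t &H, unsigned char C) {
  H ^= C;
  H *= 1099511628211ULL;
}
} // namespace

// Hypothetical helper: replaces client-controlled paths in a -cc1 argument
// list with a fixed placeholder and hashes the result with 64-bit FNV-1a.
// Simplification: every module path becomes the same placeholder, so two
// invocations depending on different prebuilt modules could hash equal; a
// real scheme would need to keep each dependency's identity.
inline std::uint64_t canonicalHash(const std::vector<std::string> &Args) {
  const std::string Placeholder = "<client-path>";
  // Options whose value is the following argument.
  const std::vector<std::string> Separate = {"-o", "-serialize-diagnostic-file"};
  // Options whose value is joined after '='.
  const std::vector<std::string> Joined = {"-fmodule-file=", "-fmodule-map-file="};

  std::vector<std::string> Canon;
  for (std::size_t I = 0; I < Args.size(); ++I) {
    const std::string &A = Args[I];
    bool Rewritten = false;
    for (const std::string &S : Separate) {
      if (A == S && I + 1 < Args.size()) {
        Canon.push_back(A);
        Canon.push_back(Placeholder);
        ++I; // skip the original path argument
        Rewritten = true;
        break;
      }
    }
    for (const std::string &J : Joined) {
      if (!Rewritten && A.compare(0, J.size(), J) == 0) {
        Canon.push_back(J + Placeholder);
        Rewritten = true;
      }
    }
    if (!Rewritten)
      Canon.push_back(A);
  }

  std::uint64_t H = 14695981039346656037ULL; // FNV-1a offset basis
  for (const std::string &A : Canon) {
    for (unsigned char C : A)
      fnvStep(H, C);
    fnvStep(H, 0); // separator between arguments
  }
  return H;
}
```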

Closed by commit rG954d77b98dd6: [clang][deps] Ensure reported context hash is strict (authored by jansvoboda11). · Oct 21 2021, 4:49 AM

This revision was automatically updated to reflect the committed changes.

jansvoboda11 added a commit: rG954d77b98dd6: [clang][deps] Ensure reported context hash is strict.

dexonsmith added inline comments. · Oct 21 2021, 4:25 PM

clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp
56–58	Clients like build systems don't want two different `-cc1`s for the same object. This would structurally guarantee that, whereas the strict context hash seems best-effort. Maybe the strict context hash should be generated this way? (Maybe it already is?)
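To make the "structural guarantee" argument concrete, here is a hypothetical build-system-side check (`ModuleRegistry` is not part of any LLVM API). If the reported key were derived from the canonical `-cc1` itself, a conflict like the one detected here could only arise from a hash collision; with a best-effort hash, a client would need a check of this kind to notice that two different command lines were reported under the same key.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical build-system-side check: one (module name, context hash) key
// must always map to the same -cc1 command line.
class ModuleRegistry {
public:
  // Returns false if the key was already registered with a different
  // command line; otherwise records (or re-confirms) it and returns true.
  bool add(const std::string &Name, const std::string &Hash,
           const std::vector<std::string> &CC1) {
    const std::pair<std::string, std::string> Key(Name, Hash);
    auto [It, Inserted] = Commands.try_emplace(Key, CC1);
    return Inserted || It->second == CC1;
  }

private:
  std::map<std::pair<std::string, std::string>, std::vector<std::string>>
      Commands;
};
```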

Revision Contents

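Path | Size
clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp | 8 lines
clang/test/ClangScanDeps/Inputs/modules-context-hash/a/dep.h | empty
clang/test/ClangScanDeps/Inputs/modules-context-hash/b/dep.h | empty
clang/test/ClangScanDeps/Inputs/modules-context-hash/cdb.json.template | 12 lines
clang/test/ClangScanDeps/Inputs/modules-context-hash/mod.h | 1 line
clang/test/ClangScanDeps/Inputs/modules-context-hash/module.modulemap | 1 line
clang/test/ClangScanDeps/Inputs/modules-context-hash/tu.c | 1 line
clang/test/ClangScanDeps/modules-context-hash.c | 89 lines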

Diff 381213

clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp

[47 unchanged lines not shown]
Enclosing function: CompilerInvocation ModuleDepCollector::makeInvocationForModuleBuildWithoutPaths(…)

   CI.getLangOpts()->ImplicitModules = false;
 
   // Report the prebuilt modules this module uses.
   for (const auto &PrebuiltModule : Deps.PrebuiltModuleDeps) {
     CI.getFrontendOpts().ModuleFiles.push_back(PrebuiltModule.PCMFile);
     CI.getFrontendOpts().ModuleMapFiles.push_back(PrebuiltModule.ModuleMapFile);
   }
 
   Optimize(CI);
 
+  // The original invocation probably didn't have strict context hash enabled.
+  // We will use the context hash of this invocation to distinguish between
+  // multiple incompatible versions of the same module and will use it when
+  // reporting dependencies to the clients. Let's make sure we're using
+  // strict context hash in order to prevent accidental sharing of
+  // incompatible modules (e.g. with differences in search paths).
+  CI.getHeaderSearchOpts().ModulesStrictContextHash = true;
+
   return CI;
 }
 
 static std::vector<std::string>
 serializeCompilerInvocation(const CompilerInvocation &CI) {
   // Set up string allocator.
   llvm::BumpPtrAllocator Alloc;
   llvm::StringSaver Strings(Alloc);

[264 unchanged lines not shown]

clang/test/ClangScanDeps/Inputs/modules-context-hash/a/dep.h

This file was added.

This is an empty file.

clang/test/ClangScanDeps/Inputs/modules-context-hash/b/dep.h

This file was added.

This is an empty file.

clang/test/ClangScanDeps/Inputs/modules-context-hash/cdb.json.template

This file was added.

[
  {
    "directory": "DIR",
    "command": "clang -c DIR/tu.c -fmodules -fmodules-cache-path=DIR/cache -IDIR/a -o DIR/tu_a.o",
    "file": "DIR/tu.c"
  },
  {
    "directory": "DIR",
    "command": "clang -c DIR/tu.c -fmodules -fmodules-cache-path=DIR/cache -IDIR/b -o DIR/tu_b.o",
    "file": "DIR/tu.c"
  }
]

clang/test/ClangScanDeps/Inputs/modules-context-hash/mod.h

This file was added.

#include "dep.h"

clang/test/ClangScanDeps/Inputs/modules-context-hash/module.modulemap

This file was added.

module mod { header "mod.h" }

clang/test/ClangScanDeps/Inputs/modules-context-hash/tu.c

This file was added.

#include "mod.h"

clang/test/ClangScanDeps/modules-context-hash.c

This file was added.

				// RUN: rm -rf %t && mkdir %t
				// RUN: cp -r %S/Inputs/modules-context-hash/* %t

				// Check that the scanner reports the same module as distinct dependencies when
				// a single translation unit gets compiled with multiple command-lines that
				// produce different strict context hashes.

				// RUN: sed "s|DIR|%/t|g" %S/Inputs/modules-context-hash/cdb.json.template > %t/cdb.json
				// RUN: echo -%t > %t/result.json
				// RUN: clang-scan-deps -compilation-database %t/cdb.json -format experimental-full -j 1 >> %t/result.json
				// RUN: cat %t/result.json | sed 's:\\\\\?:/:g' | FileCheck %s -check-prefix=CHECK

				// CHECK: -[[PREFIX:.*]]
				// CHECK-NEXT: {
				// CHECK-NEXT: "modules": [
				// CHECK-NEXT: {
				// CHECK-NEXT: "clang-module-deps": [],
				// CHECK-NEXT: "clang-modulemap-file": "[[PREFIX]]/module.modulemap",
				// CHECK-NEXT: "command-line": [
				// CHECK-NEXT: "-cc1"
				// CHECK: "-emit-module"
				// CHECK: "-I"
				// CHECK: "[[PREFIX]]/a"
				// CHECK: "-fmodule-name=mod"
				// CHECK: ],
				// CHECK-NEXT: "context-hash": "[[HASH_MOD_A:.*]]",
				// CHECK-NEXT: "file-deps": [
				// CHECK-NEXT: "[[PREFIX]]/a/dep.h",
				// CHECK-NEXT: "[[PREFIX]]/mod.h",
				// CHECK-NEXT: "[[PREFIX]]/module.modulemap"
				// CHECK-NEXT: ],
				// CHECK-NEXT: "name": "mod"
				// CHECK-NEXT: },
				// CHECK-NEXT: {
				// CHECK-NEXT: "clang-module-deps": [],
				// CHECK-NEXT: "clang-modulemap-file": "[[PREFIX]]/module.modulemap",
				// CHECK-NEXT: "command-line": [
				// CHECK-NEXT: "-cc1"
				// CHECK: "-emit-module"
				// CHECK: "-I"
				// CHECK: "[[PREFIX]]/b"
				// CHECK: "-fmodule-name=mod"
				// CHECK: ],
				// CHECK-NEXT: "context-hash": "[[HASH_MOD_B:.*]]",
				// CHECK-NEXT: "file-deps": [
				// CHECK-NEXT: "[[PREFIX]]/b/dep.h",
				// CHECK-NEXT: "[[PREFIX]]/mod.h",
				// CHECK-NEXT: "[[PREFIX]]/module.modulemap"
				// CHECK-NEXT: ],
				// CHECK-NEXT: "name": "mod"
				// CHECK-NEXT: }
				// CHECK-NEXT: ],
				// CHECK-NEXT: "translation-units": [
				// CHECK-NEXT: {
				// CHECK-NEXT: "clang-context-hash": "{{.*}}",
				// CHECK-NEXT: "clang-module-deps": [
				// CHECK-NEXT: {
				// CHECK-NEXT: "context-hash": "[[HASH_MOD_A]]",
				// CHECK-NEXT: "module-name": "mod"
				// CHECK-NEXT: }
				// CHECK-NEXT: ],
				// CHECK-NEXT: "command-line": [
				// CHECK-NEXT: "-fno-implicit-modules",
				// CHECK-NEXT: "-fno-implicit-module-maps"
				// CHECK-NEXT: ],
				// CHECK-NEXT: "file-deps": [
				// CHECK-NEXT: "[[PREFIX]]/tu.c"
				// CHECK-NEXT: ],
				// CHECK-NEXT: "input-file": "[[PREFIX]]/tu.c"
				// CHECK-NEXT: },
				// CHECK-NEXT: {
				// CHECK-NEXT: "clang-context-hash": "{{.*}}",
				// CHECK-NEXT: "clang-module-deps": [
				// CHECK-NEXT: {
				// CHECK-NEXT: "context-hash": "[[HASH_MOD_B]]",
				// CHECK-NEXT: "module-name": "mod"
				// CHECK-NEXT: }
				// CHECK-NEXT: ],
				// CHECK-NEXT: "command-line": [
				// CHECK-NEXT: "-fno-implicit-modules",
				// CHECK-NEXT: "-fno-implicit-module-maps"
				// CHECK-NEXT: ],
				// CHECK-NEXT: "file-deps": [
				// CHECK-NEXT: "[[PREFIX]]/tu.c"
				// CHECK-NEXT: ],
				// CHECK-NEXT: "input-file": "[[PREFIX]]/tu.c"
				// CHECK-NEXT: }
				// CHECK-NEXT: ]
				// CHECK-NEXT: }
