This is an archive of the discontinued LLVM Phabricator instance.

Differential D129389

[clang][deps] Override dependency and serialized diag files for modules
ClosedPublic

Authored by benlangmuir on Jul 8 2022, 12:20 PM.

Details

Reviewers

jansvoboda11
Bigcheese

Commits

rG6626f6fec3d3: [clang][deps] Override dependency and serialized diag files for modules

Summary

When building modules, override secondary outputs (dependency file, dependency targets, serialized diagnostic file) in addition to the pcm file path. This avoids inheriting per-TU command-line options that cause non-determinism in the results (non-deterministic command-line for the module build, non-determinism in which TU's .diag and .d files will contain the module outputs). In clang-scan-deps we infer whether to generate dependency or serialized diagnostic files based on an original command-line. In a real build system this should be modeled explicitly.
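As a rough illustration of the override described in the summary, the following sketch uses simplified stand-in types rather than the real `clang::CompilerInvocation`. `OutputFields`, `ModuleOutputs`, and `outputsForModule` are hypothetical names introduced only for this example, and `std::optional` stands in for `llvm::Optional`. It shows the idea of discarding every output setting inherited from the TU and then applying only what the build system supplies for the module.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for the output-related fields of a
// compiler invocation. The real patch edits clang::CompilerInvocation.
struct OutputFields {
  std::string OutputFile;                      // primary output (.o or .pcm)
  std::string DependencyFile;                  // dependency file (.d)
  std::vector<std::string> DependencyTargets;  // dependency targets (-MT)
  std::string DiagnosticSerializationFile;     // serialized diagnostics

  bool operator==(const OutputFields &Other) const {
    return OutputFile == Other.OutputFile &&
           DependencyFile == Other.DependencyFile &&
           DependencyTargets == Other.DependencyTargets &&
           DiagnosticSerializationFile == Other.DiagnosticSerializationFile;
  }
};

// Hypothetical per-module outputs chosen by the build system.
struct ModuleOutputs {
  std::string ModuleFile;  // required
  std::optional<std::string> DependencyFile;
  std::optional<std::vector<std::string>> DependencyTargets;
  std::optional<std::string> DiagnosticSerializationFile;
};

// Derives the output settings for a module build from those of the TU that
// discovered it: everything inherited from the TU is dropped, then only the
// outputs provided for the module are filled in.
OutputFields outputsForModule(const OutputFields &FromTU,
                              const ModuleOutputs &MO) {
  assert(!MO.ModuleFile.empty() && "the module file path is required");

  OutputFields Result = FromTU;
  Result.OutputFile.clear();
  Result.DependencyFile.clear();
  Result.DependencyTargets.clear();
  Result.DiagnosticSerializationFile.clear();

  Result.OutputFile = MO.ModuleFile;
  if (MO.DependencyFile)
    Result.DependencyFile = *MO.DependencyFile;
  if (MO.DependencyTargets)
    Result.DependencyTargets = *MO.DependencyTargets;
  if (MO.DiagnosticSerializationFile)
    Result.DiagnosticSerializationFile = *MO.DiagnosticSerializationFile;
  return Result;
}
```

With this shape, two TUs that differ only in their own `.d` or `.diag` paths yield the same output settings for a shared module, which is the determinism property the summary is after.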

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

benlangmuir created this revision. Jul 8 2022, 12:20 PM

Herald added a project: Restricted Project. Jul 8 2022, 12:20 PM

benlangmuir requested review of this revision. Jul 8 2022, 12:20 PM

Herald added a subscriber: cfe-commits.

Harbormaster completed remote builds in B174430: Diff 443319. Jul 8 2022, 1:51 PM

Updates:

Made lookup of module outputs fallible. clang-scan-deps does not currently use this, but since a build system is expected to provide these settings, the API accounts for the possibility of errors.
Attempt to fix Windows path issue in test.

Harbormaster completed remote builds in B174473: Diff 443379. Jul 8 2022, 4:50 PM

This looks pretty nice. My only concern is the ad-hoc command line parsing.

clang/include/clang/Tooling/DependencyScanning/DependencyScanningTool.h
55	I'm curious whether you have encountered situations where being able to return an error from `LookupModuleOutputs` is useful.
clang/test/ClangScanDeps/preserved-args.c
1	I think we can delete this test at this point, since we check both interesting cases in `generate-modules-path-args.c` and `removed-args.c`. If I remember correctly, this test was originally checking that the scanner does not change `-MT` and friends before the scan and doesn't leak that into the resulting command lines.
clang/tools/clang-scan-deps/ClangScanDeps.cpp
297	Is my understanding correct that this lambda of type `const ModuleOutputOptions &(ModuleID)` gets implicitly converted to `llvm::function_ref<Expected<const ModuleOutputOptions &>(const ModuleID &)>`? If so, I think it would be clearer to take the `ModuleID` by const ref here also and wrap the return type with `Expected`, to match the... expected `function_ref` type. WDYT?
397	I think we should avoid ad-hoc command line parsing, since it can be incorrect in many ways. Could we move this check somewhere where we have a properly constructed `CompilerInvocation`? I think we could do something like this in `ModuleDeps::getCanonicalCommandLine`:
	if (!CI.getDiagnosticOpts().DiagnosticSerializationFile.empty())
	  CI.getDiagnosticOpts().DiagnosticSerializationFile =
	      LookupModuleOutput(ID, ModuleOutputKind::DiagnosticSerializationFile);
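A minimal sketch of the per-output callback idea from the comment above, using simplified stand-in types so that it compiles without clang. `LookupModuleOutput`, `ModuleOutputKind::DiagnosticSerializationFile`, and the `ModuleID` fields come from this review; the other enumerators, `ModuleInvocation`, `fillModuleOutputs`, and `exampleLookup` are hypothetical. Treating a non-empty field as the signal that the TU enabled that output is also an assumption made for illustration.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical stand-ins for the types discussed in this review.
enum class ModuleOutputKind {
  ModuleFile,
  DependencyFile,
  DiagnosticSerializationFile,
};

struct ModuleID {
  std::string ModuleName;
  std::string ContextHash;
};

// Simplified stand-in for the output fields of a CompilerInvocation.
struct ModuleInvocation {
  std::string OutputFile;
  std::string DependencyFile;
  std::string DiagnosticSerializationFile;
};

using LookupModuleOutputFn =
    std::function<std::string(const ModuleID &, ModuleOutputKind)>;

// Always asks for the module file, but asks for a secondary output only if
// the invocation already indicates that this kind of output was enabled.
void fillModuleOutputs(ModuleInvocation &CI, const ModuleID &ID,
                       const LookupModuleOutputFn &LookupModuleOutput) {
  assert(LookupModuleOutput && "a lookup callback is required");
  CI.OutputFile = LookupModuleOutput(ID, ModuleOutputKind::ModuleFile);
  if (!CI.DependencyFile.empty())
    CI.DependencyFile =
        LookupModuleOutput(ID, ModuleOutputKind::DependencyFile);
  if (!CI.DiagnosticSerializationFile.empty())
    CI.DiagnosticSerializationFile = LookupModuleOutput(
        ID, ModuleOutputKind::DiagnosticSerializationFile);
}

// Example callback: names every output after the module and its context hash.
std::string exampleLookup(const ModuleID &ID, ModuleOutputKind Kind) {
  const std::string Base = ID.ModuleName + "-" + ID.ContextHash;
  switch (Kind) {
  case ModuleOutputKind::ModuleFile:
    return Base + ".pcm";
  case ModuleOutputKind::DependencyFile:
    return Base + ".d";
  case ModuleOutputKind::DiagnosticSerializationFile:
    return Base + ".diag";
  }
  return Base;  // not reached for valid enumerators
}
```

Because the callback is consulted per output kind, a build system only has to decide on paths for the outputs that are actually requested, and no command line string has to be parsed by hand.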

benlangmuir marked an inline comment as done. Jul 11 2022, 2:03 PM

benlangmuir added inline comments.

clang/include/clang/Tooling/DependencyScanning/DependencyScanningTool.h
55	I ended up backing this change out: it was motivated by a downstream libclang API change that I have now re-evaluated based on your other feedback to use a per-output callback instead of a single callback.
clang/tools/clang-scan-deps/ClangScanDeps.cpp
297	This was unintentional, I just missed these couple of places when I changed the API from `ModuleID` to `const ModuleID &`. Will fix, thanks!
397	Yeah, we can do that. I originally avoided this due to it "leaking" whether the TU used these outputs into the module command-lines (since the set of callbacks would differ), but I suspect in practice that doesn't matter since you're unlikely to mix compilations that have and don't have serialized diagnostics. To be 100% sound, it will require adding the existence of the outputs to the module context hash (not the actual path, just whether there was a diag and/or d file at all). I will do the context hash change later if you're okay with it - there's nowhere to feed the extra info into `getModuleHash` right now, but I was already planning to change the hashing which will make it easier to do. If you think it's critical we could add a parameter to `getModuleHash` temporarily to handle it. I liked your idea to make the callback per-output as well.
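The context hash change mentioned above is deferred to later work and is not part of this patch. Purely as an illustration of "existence, not the actual path", here is a sketch in which FNV-1a stands in for clang's real hashing, and `ModuleBuildInputs`, `fnv1a`, and `contextHash` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical summary of what feeds a module's context hash.
struct ModuleBuildInputs {
  std::string ModuleName;
  std::string OtherOptions;  // stand-in for every other hashed option
  std::optional<std::string> DependencyFile;
  std::optional<std::string> DiagnosticSerializationFile;
};

// 64-bit FNV-1a, continuing from the running hash value H.
std::uint64_t fnv1a(std::uint64_t H, const std::string &Bytes) {
  for (unsigned char C : Bytes) {
    H ^= C;
    H *= 1099511628211ULL;  // FNV prime
  }
  return H;
}

// Hashes the module identity plus one byte recording which secondary outputs
// exist. The output paths themselves are deliberately left out.
std::uint64_t contextHash(const ModuleBuildInputs &In) {
  assert(!In.ModuleName.empty() && "a module always has a name");
  std::uint64_t H = 14695981039346656037ULL;  // FNV offset basis
  H = fnv1a(H, In.ModuleName);
  H = fnv1a(H, std::string(1, '\0'));  // separator between fields
  H = fnv1a(H, In.OtherOptions);
  const char Flags =
      static_cast<char>((In.DependencyFile ? 1 : 0) |
                        (In.DiagnosticSerializationFile ? 2 : 0));
  return fnv1a(H, std::string(1, Flags));
}
```

Two builds that write the module's `.d` file to different locations hash identically here, while a build with a `.d` file and one without get different hashes.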

Updates per review

Switched to a per-output callback
Removed preserved-args.c test
Removed error handling that I no longer have a real use for
Only request .d and .diag paths if they were enabled in the original TU

Harbormaster completed remote builds in B174744: Diff 443755. Jul 11 2022, 3:46 PM

LGTM! Thanks.

clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp
334–335	This is a bit unfortunate but I don't see a better alternative. Ideally, we would compute the hash with the `.d` and `.dia` paths already filled in. That would ensure the command line we end up reporting to the build system really does have the context hash associated with the module. (We'd need to include every field set in `getCanonicalCommandLine()` too.) But for the path lookup, we already need some kind of (currently partial) context hash.

This revision is now accepted and ready to land. Jul 12 2022, 7:03 AM

Closed by commit rG6626f6fec3d3: [clang][deps] Override dependency and serialized diag files for modules (authored by benlangmuir). Jul 12 2022, 8:20 AM

This revision was automatically updated to reflect the committed changes.

benlangmuir added a commit: rG6626f6fec3d3: [clang][deps] Override dependency and serialized diag files for modules.

benlangmuir added inline comments. Jul 12 2022, 8:34 AM

clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp
334–335	The other things added in getCanonicalCommandLine are currently:
	* module map path - I'm planning to include this in my changes to the context hash since it is significant exactly how the path is spelled. This is already part of the implicit module build's notion of identity.
	* pcm paths - these are sound as long as we always get the same paths returned in the callback during a build (across separate builds it would be fine to change them as long as you're going to rebuild anything whose path changed, and anything downstream of that).

jansvoboda11 mentioned this in D131420: [clang][deps] Always generate module paths. Aug 8 2022, 11:15 AM

jansvoboda11 mentioned this in rG71e32d5cf005: [clang][deps] Always generate module paths. Aug 10 2022, 11:59 AM

Revision Contents

Path	Size
clang/include/clang/Tooling/DependencyScanning/DependencyScanningTool.h	13 lines
clang/include/clang/Tooling/DependencyScanning/ModuleDepCollector.h	26 lines
clang/lib/Tooling/DependencyScanning/DependencyScanningTool.cpp	13 lines
clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp	30 lines
clang/test/ClangScanDeps/Inputs/removed-args/cdb.json.template	2 lines
clang/test/ClangScanDeps/generate-modules-path-args.c	52 lines
clang/test/ClangScanDeps/preserved-args.c	6 lines
clang/test/ClangScanDeps/removed-args.c	6 lines
clang/tools/clang-scan-deps/ClangScanDeps.cpp	55 lines

Diff 443379

clang/include/clang/Tooling/DependencyScanning/DependencyScanningTool.h

Show First 20 Lines • Show All 41 Lines • ▼ Show 20 Lines	struct FullDependencies {
/// determined that the differences are benign for this compilation.		/// determined that the differences are benign for this compilation.
std::vector<ModuleID> ClangModuleDeps;		std::vector<ModuleID> ClangModuleDeps;

/// The original command line of the TU (excluding the compiler executable).		/// The original command line of the TU (excluding the compiler executable).
std::vector<std::string> OriginalCommandLine;		std::vector<std::string> OriginalCommandLine;

/// Get the full command line.		/// Get the full command line.
///		///
/// \param LookupPCMPath This function is called to fill in "-fmodule-file="		/// \param LookupModuleOutputs This function is called to fill in
/// arguments and the "-o" argument. It needs to return		/// "-fmodule-file=", "-o" and other output
/// a path for where the PCM for the given module is to		/// arguments for dependencies.
/// be located.		Expected<std::vector<std::string>>
std::vector<std::string>		getCommandLine(llvm::function_ref<
getCommandLine(std::function<StringRef(ModuleID)> LookupPCMPath) const;		Expected<const ModuleOutputOptions &>(const ModuleID &)>
		jansvoboda11: I'm curious whether you have encountered situations where being able to return an error from `LookupModuleOutputs` is useful.
		benlangmuir (author): I ended up backing this change out: it was motivated by a downstream libclang API change that I have now re-evaluated based on your other feedback to use a per-output callback instead of a single callback.
		LookupModuleOutputs) const;

/// Get the full command line, excluding -fmodule-file=" arguments.		/// Get the full command line, excluding -fmodule-file=" arguments.
std::vector<std::string> getCommandLineWithoutModulePaths() const;		std::vector<std::string> getCommandLineWithoutModulePaths() const;
};		};

struct FullDependenciesResult {		struct FullDependenciesResult {
FullDependencies FullDeps;		FullDependencies FullDeps;
std::vector<ModuleDeps> DiscoveredModules;		std::vector<ModuleDeps> DiscoveredModules;

clang/include/clang/Tooling/DependencyScanning/ModuleDepCollector.h

Show First 20 Lines • Show All 59 Lines • ▼ Show 20 Lines
};		};

struct ModuleIDHasher {		struct ModuleIDHasher {
std::size_t operator()(const ModuleID &MID) const {		std::size_t operator()(const ModuleID &MID) const {
return llvm::hash_combine(MID.ModuleName, MID.ContextHash);		return llvm::hash_combine(MID.ModuleName, MID.ContextHash);
}		}
};		};

		/// Options to use to set or override output paths when building a module.
		struct ModuleOutputOptions {
		/// The path of the module file (.pcm). Required.
		std::string ModuleFile;
		/// The path of the dependency file (.d), if any.
		Optional<std::string> DependencyFile;
		/// A list of names to use as the targets in the dependency file; if provided,
		/// this list must contain at least one entry.
		Optional<std::vector<std::string>> DependencyTargets;
		/// The path of the serialized diagnostic file (.dia), if any.
		Optional<std::string> DiagnosticSerializationFile;
		};

struct ModuleDeps {		struct ModuleDeps {
/// The identifier of the module.		/// The identifier of the module.
ModuleID ID;		ModuleID ID;

/// Whether this is a "system" module.		/// Whether this is a "system" module.
bool IsSystem;		bool IsSystem;

/// The path to the modulemap file which defines this module.		/// The path to the modulemap file which defines this module.
Show All 28 Lines	struct ModuleDeps {
// the primary TU.		// the primary TU.
bool ImportedByMainFile = false;		bool ImportedByMainFile = false;

/// Compiler invocation that can be used to build this module (without paths).		/// Compiler invocation that can be used to build this module (without paths).
CompilerInvocation BuildInvocation;		CompilerInvocation BuildInvocation;

/// Gets the canonical command line suitable for passing to clang.		/// Gets the canonical command line suitable for passing to clang.
///		///
/// \param LookupPCMPath This function is called to fill in "-fmodule-file="		/// \param LookupModuleOutputs This function is called to fill in
/// arguments and the "-o" argument. It needs to return		/// "-fmodule-file=", "-o" and other output
/// a path for where the PCM for the given module is to		/// arguments.
/// be located.		Expected<std::vector<std::string>> getCanonicalCommandLine(
std::vector<std::string> getCanonicalCommandLine(		llvm::function_ref<
std::function<StringRef(ModuleID)> LookupPCMPath) const;		Expected<const ModuleOutputOptions &>(const ModuleID &)>
		LookupModuleOutputs) const;

/// Gets the canonical command line suitable for passing to clang, excluding		/// Gets the canonical command line suitable for passing to clang, excluding
/// "-fmodule-file=" and "-o" arguments.		/// "-fmodule-file=" and "-o" arguments.
std::vector<std::string> getCanonicalCommandLineWithoutModulePaths() const;		std::vector<std::string> getCanonicalCommandLineWithoutModulePaths() const;
};		};

class ModuleDepCollector;		class ModuleDepCollector;


clang/lib/Tooling/DependencyScanning/DependencyScanningTool.cpp

	//===- DependencyScanningTool.cpp - clang-scan-deps service ---------------===//			//===- DependencyScanningTool.cpp - clang-scan-deps service ---------------===//
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#include "clang/Tooling/DependencyScanning/DependencyScanningTool.h"			#include "clang/Tooling/DependencyScanning/DependencyScanningTool.h"
	#include "clang/Frontend/Utils.h"			#include "clang/Frontend/Utils.h"

	namespace clang {			namespace clang {
	namespace tooling {			namespace tooling {
	namespace dependencies {			namespace dependencies {

	std::vector<std::string> FullDependencies::getCommandLine(			Expected<std::vector<std::string>> FullDependencies::getCommandLine(
	std::function<StringRef(ModuleID)> LookupPCMPath) const {			llvm::function_ref<Expected<const ModuleOutputOptions &>(const ModuleID &)>
				LookupModuleOutputs) const {
	std::vector<std::string> Ret = getCommandLineWithoutModulePaths();			std::vector<std::string> Ret = getCommandLineWithoutModulePaths();

	for (ModuleID MID : ClangModuleDeps)			for (ModuleID MID : ClangModuleDeps) {
	Ret.push_back(("-fmodule-file=" + LookupPCMPath(MID)).str());			auto MO = LookupModuleOutputs(MID);
				if (!MO)
				return MO.takeError();
				Ret.push_back("-fmodule-file=" + MO->ModuleFile);
				}

	return Ret;			return Ret;
	}			}

	std::vector<std::string>			std::vector<std::string>
	FullDependencies::getCommandLineWithoutModulePaths() const {			FullDependencies::getCommandLineWithoutModulePaths() const {
	std::vector<std::string> Args = OriginalCommandLine;			std::vector<std::string> Args = OriginalCommandLine;


clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp

Show First 20 Lines • Show All 50 Lines • ▼ Show 20 Lines	CompilerInvocation ModuleDepCollector::makeInvocationForModuleBuildWithoutPaths(

// Remove options incompatible with explicit module build or are likely to		// Remove options incompatible with explicit module build or are likely to
// differ between identical modules discovered from different translation		// differ between identical modules discovered from different translation
// units.		// units.
CI.getFrontendOpts().Inputs.clear();		CI.getFrontendOpts().Inputs.clear();
CI.getFrontendOpts().OutputFile.clear();		CI.getFrontendOpts().OutputFile.clear();
CI.getCodeGenOpts().MainFileName.clear();		CI.getCodeGenOpts().MainFileName.clear();
CI.getCodeGenOpts().DwarfDebugFlags.clear();		CI.getCodeGenOpts().DwarfDebugFlags.clear();
		CI.getDiagnosticOpts().DiagnosticSerializationFile.clear();
		CI.getDependencyOutputOpts().OutputFile.clear();
		CI.getDependencyOutputOpts().Targets.clear();

CI.getFrontendOpts().ProgramAction = frontend::GenerateModule;		CI.getFrontendOpts().ProgramAction = frontend::GenerateModule;
CI.getLangOpts()->ModuleName = Deps.ID.ModuleName;		CI.getLangOpts()->ModuleName = Deps.ID.ModuleName;
CI.getFrontendOpts().IsSystemModule = Deps.IsSystem;		CI.getFrontendOpts().IsSystemModule = Deps.IsSystem;

// Disable implicit modules and canonicalize options that are only used by		// Disable implicit modules and canonicalize options that are only used by
// implicit modules.		// implicit modules.
CI.getLangOpts()->ImplicitModules = false;		CI.getLangOpts()->ImplicitModules = false;
Show All 35 Lines	serializeCompilerInvocation(const CompilerInvocation &CI) {
// Synthesize full command line from the CompilerInvocation, including "-cc1".		// Synthesize full command line from the CompilerInvocation, including "-cc1".
SmallVector<const char *, 32> Args{"-cc1"};		SmallVector<const char *, 32> Args{"-cc1"};
CI.generateCC1CommandLine(Args, SA);		CI.generateCC1CommandLine(Args, SA);

// Convert arguments to the return type.		// Convert arguments to the return type.
return std::vector<std::string>{Args.begin(), Args.end()};		return std::vector<std::string>{Args.begin(), Args.end()};
}		}

std::vector<std::string> ModuleDeps::getCanonicalCommandLine(		Expected<std::vector<std::string>> ModuleDeps::getCanonicalCommandLine(
std::function<StringRef(ModuleID)> LookupPCMPath) const {		llvm::function_ref<Expected<const ModuleOutputOptions &>(const ModuleID &)>
		LookupModuleOutputs) const {
CompilerInvocation CI(BuildInvocation);		CompilerInvocation CI(BuildInvocation);
FrontendOptions &FrontendOpts = CI.getFrontendOpts();		FrontendOptions &FrontendOpts = CI.getFrontendOpts();
		auto MO = LookupModuleOutputs(ID);
		if (!MO)
		return MO.takeError();

InputKind ModuleMapInputKind(FrontendOpts.DashX.getLanguage(),		InputKind ModuleMapInputKind(FrontendOpts.DashX.getLanguage(),
InputKind::Format::ModuleMap);		InputKind::Format::ModuleMap);
FrontendOpts.Inputs.emplace_back(ClangModuleMapFile, ModuleMapInputKind);		FrontendOpts.Inputs.emplace_back(ClangModuleMapFile, ModuleMapInputKind);
FrontendOpts.OutputFile = std::string(LookupPCMPath(ID));		FrontendOpts.OutputFile = MO->ModuleFile;
		if (MO->DiagnosticSerializationFile)
for (ModuleID MID : ClangModuleDeps)		CI.getDiagnosticOpts().DiagnosticSerializationFile =
FrontendOpts.ModuleFiles.emplace_back(LookupPCMPath(MID));		*MO->DiagnosticSerializationFile;
		if (MO->DependencyFile)
		CI.getDependencyOutputOpts().OutputFile = *MO->DependencyFile;
		if (MO->DependencyTargets)
		CI.getDependencyOutputOpts().Targets = *MO->DependencyTargets;

		for (ModuleID MID : ClangModuleDeps) {
		auto MO = LookupModuleOutputs(MID);
		if (!MO)
		return MO.takeError();
		FrontendOpts.ModuleFiles.emplace_back(MO->ModuleFile);
		}

return serializeCompilerInvocation(CI);		return serializeCompilerInvocation(CI);
}		}

std::vector<std::string>		std::vector<std::string>
ModuleDeps::getCanonicalCommandLineWithoutModulePaths() const {		ModuleDeps::getCanonicalCommandLineWithoutModulePaths() const {
return serializeCompilerInvocation(BuildInvocation);		return serializeCompilerInvocation(BuildInvocation);
}		}
▲ Show 20 Lines • Show All 178 Lines • ▼ Show 20 Lines	MD.BuildInvocation = MDC.makeInvocationForModuleBuildWithoutPaths(
if (MDC.OptimizeArgs)		if (MDC.OptimizeArgs)
optimizeHeaderSearchOpts(BuildInvocation.getHeaderSearchOpts(),		optimizeHeaderSearchOpts(BuildInvocation.getHeaderSearchOpts(),
MDC.ScanInstance.getASTReader(), MF);		MDC.ScanInstance.getASTReader(), MF);
});		});
MD.ID.ContextHash = MD.BuildInvocation.getModuleHash();		MD.ID.ContextHash = MD.BuildInvocation.getModuleHash();

llvm::DenseSet<const Module *> AddedModules;		llvm::DenseSet<const Module *> AddedModules;
addAllSubmoduleDeps(M, MD, AddedModules);		addAllSubmoduleDeps(M, MD, AddedModules);

return MD.ID;		return MD.ID;
		jansvoboda11: This is a bit unfortunate but I don't see a better alternative. Ideally, we would compute the hash with the `.d` and `.dia` paths already filled in. That would ensure the command line we end up reporting to the build system really does have the context hash associated with the module. (We'd need to include every field set in `getCanonicalCommandLine()` too.) But for the path lookup, we already need some kind of (currently partial) context hash.
		benlangmuir (author): The other things added in getCanonicalCommandLine are currently:
		* module map path - I'm planning to include this in my changes to the context hash since it is significant exactly how the path is spelled. This is already part of the implicit module build's notion of identity.
		* pcm paths - these are sound as long as we always get the same paths returned in the callback during a build (across separate builds it would be fine to change them as long as you're going to rebuild anything whose path changed, and anything downstream of that).
}		}

static void forEachSubmoduleSorted(const Module *M,		static void forEachSubmoduleSorted(const Module *M,
llvm::function_ref<void(const Module *)> F) {		llvm::function_ref<void(const Module *)> F) {
// Submodule order depends on order of header includes for inferred submodules		// Submodule order depends on order of header includes for inferred submodules
// we don't care about the exact order, so sort so that it's consistent across		// we don't care about the exact order, so sort so that it's consistent across
// TUs to improve sharing.		// TUs to improve sharing.
SmallVector<const Module *> Submodules(M->submodule_begin(),		SmallVector<const Module *> Submodules(M->submodule_begin(),

clang/test/ClangScanDeps/Inputs/removed-args/cdb.json.template

	[			[
	{			{
	"directory": "DIR",			"directory": "DIR",
	"command": "clang -fsyntax-only DIR/tu.c -fmodules -fimplicit-module-maps -fmodules-validate-once-per-build-session -fbuild-session-file=DIR/build-session -fmodules-prune-interval=123 -fmodules-prune-after=123 -fmodules-cache-path=DIR/cache -include DIR/header.h -grecord-command-line -o DIR/tu.o",			"command": "clang -fsyntax-only DIR/tu.c -fmodules -fimplicit-module-maps -fmodules-validate-once-per-build-session -fbuild-session-file=DIR/build-session -fmodules-prune-interval=123 -fmodules-prune-after=123 -fmodules-cache-path=DIR/cache -include DIR/header.h -grecord-command-line -o DIR/tu.o -serialize-diagnostics DIR/tu.diag -MT tu -MD -MF DIR/tu.d",
	"file": "DIR/tu.c"			"file": "DIR/tu.c"
	}			}
	]			]

clang/test/ClangScanDeps/generate-modules-path-args.c

This file was added.

				// RUN: rm -rf %t
				// RUN: split-file %s %t
				// RUN: sed "s\|DIR\|%/t\|g" %t/cdb.json.template > %t/cdb.json
				// RUN: sed "s\|DIR\|%/t\|g" %t/cdb_without.json.template > %t/cdb_without.json
				// RUN: clang-scan-deps -compilation-database %t/cdb.json \
				// RUN: -format experimental-full -generate-modules-path-args > %t/deps.json
				// RUN: cat %t/deps.json \| sed 's:\\\\\?:/:g' \| FileCheck -DPREFIX=%/t %s
				// RUN: clang-scan-deps -compilation-database %t/cdb_without.json \
				// RUN: -format experimental-full -generate-modules-path-args > %t/deps_without.json
				// RUN: cat %t/deps_without.json \| sed 's:\\\\\?:/:g' \| FileCheck -DPREFIX=%/t -check-prefix=WITHOUT %s

				// CHECK: {
				// CHECK-NEXT: "modules": [
				// CHECK-NEXT: {
				// CHECK: "command-line": [
				// CHECK-NEXT: "-cc1"
				// CHECK: "-serialize-diagnostic-file"
				// CHECK-NEXT: "[[PREFIX]]{{.}}Mod{{.}}.diag"
				// CHECK: "-dependency-file"
				// CHECK-NEXT: "[[PREFIX]]{{.}}Mod{{.}}.d"
				// CHECK: ],

				// WITHOUT: {
				// WITHOUT-NEXT: "modules": [
				// WITHOUT-NEXT: {
				// WITHOUT: "command-line": [
				// WITHOUT-NEXT: "-cc1"
				// WITHOUT-NOT: "-serialize-diagnostic-file"
				// WITHOUT-NOT: "-dependency-file"
				// WITHOUT: ],

				//--- cdb.json.template
				[{
				"directory": "DIR",
				"command": "clang -fsyntax-only DIR/tu.c -fmodules -fimplicit-module-maps -fmodules-cache-path=DIR/cache -serialize-diagnostics DIR/tu.diag -MD -MT tu -MF DIR/tu.d",
				"file": "DIR/tu.c"
				}]

				//--- cdb_without.json.template
				[{
				"directory": "DIR",
				"command": "clang -fsyntax-only DIR/tu.c -fmodules -fimplicit-module-maps -fmodules-cache-path=DIR/cache",
				"file": "DIR/tu.c"
				}]

				//--- module.modulemap
				module Mod { header "Mod.h" }

				//--- Mod.h

				//--- tu.c
				#include "Mod.h"

clang/test/ClangScanDeps/preserved-args.c

	// RUN: rm -rf %t && mkdir %t			// RUN: rm -rf %t && mkdir %t
				jansvoboda11: I think we can delete this test at this point, since we check both interesting cases in `generate-modules-path-args.c` and `removed-args.c`. If I remember correctly, this test was originally checking that the scanner does not change `-MT` and friends before the scan and doesn't leak that into the resulting command lines.
	// RUN: cp -r %S/Inputs/preserved-args/* %t			// RUN: cp -r %S/Inputs/preserved-args/* %t
	// RUN: sed -e "s\|DIR\|%/t\|g" %t/cdb.json.template > %t/cdb.json			// RUN: sed -e "s\|DIR\|%/t\|g" %t/cdb.json.template > %t/cdb.json

	// RUN: clang-scan-deps -compilation-database %t/cdb.json -format experimental-full > %t/result.json			// RUN: clang-scan-deps -compilation-database %t/cdb.json -format experimental-full > %t/result.json
	// RUN: cat %t/result.json \| sed 's:\\\\\?:/:g' \| FileCheck %s -DPREFIX=%/t			// RUN: cat %t/result.json \| sed 's:\\\\\?:/:g' \| FileCheck %s -DPREFIX=%/t

	// CHECK: {			// CHECK: {
	// CHECK-NEXT: "modules": [			// CHECK-NEXT: "modules": [
	// CHECK-NEXT: {			// CHECK-NEXT: {
	// CHECK: "command-line": [			// CHECK: "command-line": [
	// CHECK-NEXT: "-cc1"			// CHECK-NEXT: "-cc1"
	// CHECK: "-serialize-diagnostic-file"
	// CHECK-NEXT: "[[PREFIX]]/tu.dia"
	// CHECK: "-fmodule-file=Foo=[[PREFIX]]/foo.pcm"			// CHECK: "-fmodule-file=Foo=[[PREFIX]]/foo.pcm"
	// CHECK: "-MT"
	// CHECK-NEXT: "my_target"
	// CHECK: "-dependency-file"
	// CHECK-NEXT: "[[PREFIX]]/tu.d"
	// CHECK: ],			// CHECK: ],
	// CHECK: "name": "Mod"			// CHECK: "name": "Mod"
	// CHECK-NEXT: }			// CHECK-NEXT: }
	// CHECK-NEXT: ]			// CHECK-NEXT: ]
	// CHECK: }			// CHECK: }

clang/test/ClangScanDeps/removed-args.c

	Show All 23 Lines
	// CHECK-NOT: "-dwarf-debug-flags"			// CHECK-NOT: "-dwarf-debug-flags"
	// CHECK-NOT: "-main-file-name"			// CHECK-NOT: "-main-file-name"
	// CHECK-NOT: "-include"			// CHECK-NOT: "-include"
	// CHECK-NOT: "-fmodules-cache-path=			// CHECK-NOT: "-fmodules-cache-path=
	// CHECK-NOT: "-fmodules-validate-once-per-build-session"			// CHECK-NOT: "-fmodules-validate-once-per-build-session"
	// CHECK-NOT: "-fbuild-session-timestamp=			// CHECK-NOT: "-fbuild-session-timestamp=
	// CHECK-NOT: "-fmodules-prune-interval=			// CHECK-NOT: "-fmodules-prune-interval=
	// CHECK-NOT: "-fmodules-prune-after=			// CHECK-NOT: "-fmodules-prune-after=
				// CHECK-NOT: "-dependency-file"
				// CHECK-NOT: "-MT"
				// CHECK-NOT: "-serialize-diagnostic-file"
	// CHECK: ],			// CHECK: ],
	// CHECK-NEXT: "context-hash": "[[HASH_MOD_HEADER:.*]]",			// CHECK-NEXT: "context-hash": "[[HASH_MOD_HEADER:.*]]",
	// CHECK-NEXT: "file-deps": [			// CHECK-NEXT: "file-deps": [
	// CHECK-NEXT: "[[PREFIX]]/mod_header.h",			// CHECK-NEXT: "[[PREFIX]]/mod_header.h",
	// CHECK-NEXT: "[[PREFIX]]/module.modulemap"			// CHECK-NEXT: "[[PREFIX]]/module.modulemap"
	// CHECK-NEXT: ],			// CHECK-NEXT: ],
	// CHECK-NEXT: "name": "ModHeader"			// CHECK-NEXT: "name": "ModHeader"
	// CHECK-NEXT: },			// CHECK-NEXT: },
	// CHECK-NEXT: {			// CHECK-NEXT: {
	// CHECK-NEXT: "clang-module-deps": [],			// CHECK-NEXT: "clang-module-deps": [],
	// CHECK-NEXT: "clang-modulemap-file": "[[PREFIX]]/module.modulemap",			// CHECK-NEXT: "clang-modulemap-file": "[[PREFIX]]/module.modulemap",
	// CHECK-NEXT: "command-line": [			// CHECK-NEXT: "command-line": [
	// CHECK-NEXT: "-cc1"			// CHECK-NEXT: "-cc1"
	// CHECK-NOT: "-dwarf-debug-flags"			// CHECK-NOT: "-dwarf-debug-flags"
	// CHECK-NOT: "-main-file-name"			// CHECK-NOT: "-main-file-name"
	// CHECK-NOT: "-include"			// CHECK-NOT: "-include"
	// CHECK-NOT: "-fmodules-cache-path=			// CHECK-NOT: "-fmodules-cache-path=
	// CHECK-NOT: "-fmodules-validate-once-per-build-session"			// CHECK-NOT: "-fmodules-validate-once-per-build-session"
	// CHECK-NOT: "-fbuild-session-timestamp=			// CHECK-NOT: "-fbuild-session-timestamp=
	// CHECK-NOT: "-fmodules-prune-interval=			// CHECK-NOT: "-fmodules-prune-interval=
	// CHECK-NOT: "-fmodules-prune-after=			// CHECK-NOT: "-fmodules-prune-after=
				// CHECK-NOT: "-dependency-file"
				// CHECK-NOT: "-MT"
				// CHECK-NOT: "-serialize-diagnostic-file"
	// CHECK: ],			// CHECK: ],
	// CHECK-NEXT: "context-hash": "[[HASH_MOD_TU:.*]]",			// CHECK-NEXT: "context-hash": "[[HASH_MOD_TU:.*]]",
	// CHECK-NEXT: "file-deps": [			// CHECK-NEXT: "file-deps": [
	// CHECK-NEXT: "[[PREFIX]]/mod_tu.h",			// CHECK-NEXT: "[[PREFIX]]/mod_tu.h",
	// CHECK-NEXT: "[[PREFIX]]/module.modulemap"			// CHECK-NEXT: "[[PREFIX]]/module.modulemap"
	// CHECK-NEXT: ],			// CHECK-NEXT: ],
	// CHECK-NEXT: "name": "ModTU"			// CHECK-NEXT: "name": "ModTU"
	// CHECK-NEXT: }			// CHECK-NEXT: }
	Show All 25 Lines

clang/tools/clang-scan-deps/ClangScanDeps.cpp

Show First 20 Lines • Show All 282 Lines • ▼ Show 20 Lines	for (const ModuleDeps &MD : FDR.DiscoveredModules) {
auto I = Modules.find({MD.ID, 0});		auto I = Modules.find({MD.ID, 0});
if (I != Modules.end()) {		if (I != Modules.end()) {
I->first.InputIndex = std::min(I->first.InputIndex, InputIndex);		I->first.InputIndex = std::min(I->first.InputIndex, InputIndex);
continue;		continue;
}		}
Modules.insert(I, {{MD.ID, InputIndex}, std::move(MD)});		Modules.insert(I, {{MD.ID, InputIndex}, std::move(MD)});
}		}

ID.CommandLine = GenerateModulesPathArgs		if (Inputs.size() == 0)
? FD.getCommandLine(		inferOutputOptions(FD.OriginalCommandLine);
[&](ModuleID MID) { return lookupPCMPath(MID); })
: FD.getCommandLineWithoutModulePaths();

		ID.CommandLine =
		GenerateModulesPathArgs
		? llvm::cantFail(FD.getCommandLine(
		[&](ModuleID MID) -> const ModuleOutputOptions & {
		jansvoboda11: Is my understanding correct that this lambda of type `const ModuleOutputOptions &(ModuleID)` gets implicitly converted to `llvm::function_ref<Expected<const ModuleOutputOptions &>(const ModuleID &)>`? If so, I think it would be clearer to take the `ModuleID` by const ref here also and wrap the return type with `Expected`, to match the... expected `function_ref` type. WDYT?
		benlangmuir (author): This was unintentional, I just missed these couple of places when I changed the API from `ModuleID` to `const ModuleID &`. Will fix, thanks!
		return lookupModuleOutputs(MID);
		}))
		: FD.getCommandLineWithoutModulePaths();
Inputs.push_back(std::move(ID));		Inputs.push_back(std::move(ID));
}		}

void printFullOutput(raw_ostream &OS) {		void printFullOutput(raw_ostream &OS) {
// Sort the modules by name to get a deterministic order.		// Sort the modules by name to get a deterministic order.
std::vector<IndexedModuleID> ModuleIDs;		std::vector<IndexedModuleID> ModuleIDs;
for (auto &&M : Modules)		for (auto &&M : Modules)
ModuleIDs.push_back(M.first);		ModuleIDs.push_back(M.first);
Show All 15 Lines	for (auto &&ModID : ModuleIDs) {
Object O{		Object O{
{"name", MD.ID.ModuleName},		{"name", MD.ID.ModuleName},
{"context-hash", MD.ID.ContextHash},		{"context-hash", MD.ID.ContextHash},
{"file-deps", toJSONSorted(MD.FileDeps)},		{"file-deps", toJSONSorted(MD.FileDeps)},
{"clang-module-deps", toJSONSorted(MD.ClangModuleDeps)},		{"clang-module-deps", toJSONSorted(MD.ClangModuleDeps)},
{"clang-modulemap-file", MD.ClangModuleMapFile},		{"clang-modulemap-file", MD.ClangModuleMapFile},
{"command-line",		{"command-line",
GenerateModulesPathArgs		GenerateModulesPathArgs
? MD.getCanonicalCommandLine(		? llvm::cantFail(MD.getCanonicalCommandLine(
[&](ModuleID MID) { return lookupPCMPath(MID); })		[&](ModuleID MID) -> const ModuleOutputOptions & {
		return lookupModuleOutputs(MID);
		}))
: MD.getCanonicalCommandLineWithoutModulePaths()},		: MD.getCanonicalCommandLineWithoutModulePaths()},
};		};
OutModules.push_back(std::move(O));		OutModules.push_back(std::move(O));
}		}

Array TUs;		Array TUs;
for (auto &&I : Inputs) {		for (auto &&I : Inputs) {
Object O{		Object O{
Show All 10 Lines	Object Output{
{"modules", std::move(OutModules)},		{"modules", std::move(OutModules)},
{"translation-units", std::move(TUs)},		{"translation-units", std::move(TUs)},
};		};

OS << llvm::formatv("{0:2}\n", Value(std::move(Output)));		OS << llvm::formatv("{0:2}\n", Value(std::move(Output)));
}		}

private:		private:
StringRef lookupPCMPath(ModuleID MID) {		const ModuleOutputOptions &lookupModuleOutputs(const ModuleID &MID) {
auto PCMPath = PCMPaths.insert({MID, ""});		auto MO = ModuleOutputs.insert({MID, {}});
if (PCMPath.second)		if (MO.second) {
PCMPath.first->second = constructPCMPath(MID);		ModuleOutputOptions &Opts = MO.first->second;
return PCMPath.first->second;		Opts.ModuleFile = constructPCMPath(MID);
		if (DependencyOutputFile)
		Opts.DependencyFile = Opts.ModuleFile + ".d";
		if (SerializeDiags)
		Opts.DiagnosticSerializationFile = Opts.ModuleFile + ".diag";
		}
		return MO.first->second;
}		}

/// Construct a path for the explicitly built PCM.		/// Construct a path for the explicitly built PCM.
std::string constructPCMPath(ModuleID MID) const {		std::string constructPCMPath(ModuleID MID) const {
auto MDIt = Modules.find(IndexedModuleID{MID, 0});		auto MDIt = Modules.find(IndexedModuleID{MID, 0});
assert(MDIt != Modules.end());		assert(MDIt != Modules.end());
const ModuleDeps &MD = MDIt->second;		const ModuleDeps &MD = MDIt->second;

StringRef Filename = llvm::sys::path::filename(MD.ImplicitModulePCMPath);		StringRef Filename = llvm::sys::path::filename(MD.ImplicitModulePCMPath);
StringRef ModuleCachePath = llvm::sys::path::parent_path(		StringRef ModuleCachePath = llvm::sys::path::parent_path(
llvm::sys::path::parent_path(MD.ImplicitModulePCMPath));		llvm::sys::path::parent_path(MD.ImplicitModulePCMPath));

SmallString<256> ExplicitPCMPath(!ModuleFilesDir.empty() ? ModuleFilesDir		SmallString<256> ExplicitPCMPath(!ModuleFilesDir.empty() ? ModuleFilesDir
: ModuleCachePath);		: ModuleCachePath);
llvm::sys::path::append(ExplicitPCMPath, MD.ID.ContextHash, Filename);		llvm::sys::path::append(ExplicitPCMPath, MD.ID.ContextHash, Filename);
return std::string(ExplicitPCMPath);		return std::string(ExplicitPCMPath);
}		}

		/// Infer whether modules should write serialized diagnostic, .d, etc.
		///
		/// A build system should model this directly, but here we infer it from an
		/// original TU command.
		void inferOutputOptions(ArrayRef<std::string> Args) {
		for (StringRef Arg : Args) {
		if (Arg == "-serialize-diagnostics")
		jansvoboda11 (inline comment): I think we should be avoiding ad-hoc command line parsing, it can be incorrect in many ways. Could we move this check somewhere where we have a properly constructed `CompilerInvocation`? I think we could do something like this in `ModuleDeps::getCanonicalCommandLine`:
		    if (!CI.getDiagnosticOpts().DiagnosticSerializationFile.empty())
		      CI.getDiagnosticOpts().DiagnosticSerializationFile =
		          LookupModuleOutput(ID, ModuleOutputKind::DiagnosticSerializationFile);
		benlangmuir (author, inline reply): Yeah, we can do that. I originally avoided this due to it "leaking" whether the TU used these outputs into the module command-lines (since the set of callbacks would differ), but I suspect in practice that doesn't matter since you're unlikely to mix compilations that have and don't have serialized diagnostics. To be 100% sound, it will require adding the existence of the outputs to the module context hash (not the actual path, just whether there was a diag and/or d file at all). I will do the context hash change later if you're okay with it - there's nowhere to feed the extra info into `getModuleHash` right now, but I was already planning to change the hashing which will make it easier to do. If you think it's critical we could add a parameter to `getModuleHash` temporarily to handle it. I liked your idea to make the callback per-output as well.
		SerializeDiags = true;
		else if (Arg == "-M" || Arg == "-MM" || Arg == "-MMD" || Arg == "-MD")
		DependencyOutputFile = true;
		}
		}

struct IndexedModuleID {		struct IndexedModuleID {
ModuleID ID;		ModuleID ID;
mutable size_t InputIndex;		mutable size_t InputIndex;

bool operator==(const IndexedModuleID &Other) const {		bool operator==(const IndexedModuleID &Other) const {
return ID.ModuleName == Other.ID.ModuleName &&		return ID.ModuleName == Other.ID.ModuleName &&
ID.ContextHash == Other.ID.ContextHash;		ID.ContextHash == Other.ID.ContextHash;
}		}
Show All 13 Lines	struct InputDeps {
std::vector<std::string> FileDeps;		std::vector<std::string> FileDeps;
std::vector<ModuleID> ModuleDeps;		std::vector<ModuleID> ModuleDeps;
std::vector<std::string> CommandLine;		std::vector<std::string> CommandLine;
};		};

std::mutex Lock;		std::mutex Lock;
std::unordered_map<IndexedModuleID, ModuleDeps, IndexedModuleIDHasher>		std::unordered_map<IndexedModuleID, ModuleDeps, IndexedModuleIDHasher>
Modules;		Modules;
std::unordered_map<ModuleID, std::string, ModuleIDHasher> PCMPaths;		std::unordered_map<ModuleID, ModuleOutputOptions, ModuleIDHasher>
		ModuleOutputs;
std::vector<InputDeps> Inputs;		std::vector<InputDeps> Inputs;
		bool SerializeDiags = false;
		bool DependencyOutputFile = false;
};		};
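
Editorial sketch (not part of the diff): the new `lookupModuleOutputs` memoizes per-module outputs by inserting an empty entry first and filling it in only when the insertion actually happened. The standalone example below mirrors that pattern using only the standard library. `ModuleOutputs`, `OutputCache`, the string key, and the `/cache/` prefix are hypothetical stand-ins, not the real clang types or paths.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for ModuleOutputOptions; not the real clang type.
struct ModuleOutputs {
  std::string ModuleFile;
  std::string DependencyFile;              // empty when no .d file is wanted
  std::string DiagnosticSerializationFile; // empty when no .diag file is wanted
};

// Hypothetical cache keyed by a plain string instead of a ModuleID.
class OutputCache {
public:
  OutputCache(bool WantDeps, bool WantDiags)
      : WantDeps(WantDeps), WantDiags(WantDiags) {}

  // Insert-then-fill: the entry is computed once, on first lookup.
  // References to unordered_map elements stay valid across later
  // insertions, so returning a const reference is safe here.
  const ModuleOutputs &lookup(const std::string &ModuleKey) {
    auto [It, Inserted] = Cache.try_emplace(ModuleKey);
    if (Inserted) {
      ModuleOutputs &Opts = It->second;
      Opts.ModuleFile = "/cache/" + ModuleKey + ".pcm"; // assumed layout
      if (WantDeps)
        Opts.DependencyFile = Opts.ModuleFile + ".d";
      if (WantDiags)
        Opts.DiagnosticSerializationFile = Opts.ModuleFile + ".diag";
    }
    assert(!It->second.ModuleFile.empty());
    return It->second;
  }

  std::size_t size() const { return Cache.size(); }

private:
  bool WantDeps;
  bool WantDiags;
  std::unordered_map<std::string, ModuleOutputs> Cache;
};
```

Because the secondary paths are derived from the module file path rather than from any one TU's command line, every TU that asks for the same module key gets the same outputs, which is the determinism property the summary describes.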

static bool handleFullDependencyToolResult(		static bool handleFullDependencyToolResult(
const std::string &Input,		const std::string &Input,
llvm::Expected<FullDependenciesResult> &MaybeFullDeps, FullDeps &FD,		llvm::Expected<FullDependenciesResult> &MaybeFullDeps, FullDeps &FD,
size_t InputIndex, SharedStream &OS, SharedStream &Errs) {		size_t InputIndex, SharedStream &OS, SharedStream &Errs) {
if (!MaybeFullDeps) {		if (!MaybeFullDeps) {
llvm::handleAllErrors(		llvm::handleAllErrors(