This is an archive of the discontinued LLVM Phabricator instance.

Differential D129389

[clang][deps] Override dependency and serialized diag files for modules
Closed, Public

Authored by benlangmuir on Jul 8 2022, 12:20 PM.

Details

Reviewers

jansvoboda11
Bigcheese

Commits

rG6626f6fec3d3: [clang][deps] Override dependency and serialized diag files for modules

Summary

When building modules, override the secondary outputs (dependency file, dependency targets, serialized diagnostic file) in addition to the pcm file path. This avoids inheriting per-TU command-line options that make the results non-deterministic: both the command line for the module build and which TU's .diag and .d files end up containing the module outputs. In clang-scan-deps we infer whether to generate dependency or serialized diagnostic files from the original command line. In a real build system this should be modeled explicitly.
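
As a rough illustration of what "modeled explicitly" could look like, here is a minimal sketch of a build-system-side lookup that derives every module output from the module's name and context hash. The `ModuleID` and `ModuleOutputKind` definitions are simplified stand-ins for the types in this patch. `lookupModuleOutput`, its `OutputDir` parameter, and the `<name>-<hash>` file layout are assumptions made for the example, not something the patch prescribes.

```cpp
#include <string>

// Simplified stand-ins for the types touched by this patch (assumption: the
// real ones live in clang/Tooling/DependencyScanning/ModuleDepCollector.h).
struct ModuleID {
  std::string ModuleName;
  std::string ContextHash;
};

enum class ModuleOutputKind {
  ModuleFile,
  DependencyFile,
  DependencyTargets,
  DiagnosticSerializationFile,
};

// Hypothetical lookup: every output path is a pure function of the module's
// identity, so nothing from an individual TU's command line can leak in.
// Assumed layout: <OutputDir>/<ModuleName>-<ContextHash>.<extension>
std::string lookupModuleOutput(const std::string &OutputDir,
                               const ModuleID &MID, ModuleOutputKind Kind) {
  const std::string Base =
      OutputDir + "/" + MID.ModuleName + "-" + MID.ContextHash;
  switch (Kind) {
  case ModuleOutputKind::ModuleFile:
    return Base + ".pcm";
  case ModuleOutputKind::DependencyFile:
    return Base + ".d";
  case ModuleOutputKind::DependencyTargets:
    // Use the module file itself as the single target in the .d file.
    return Base + ".pcm";
  case ModuleOutputKind::DiagnosticSerializationFile:
    return Base + ".diag";
  }
  return std::string(); // Not reached for valid enumerators.
}
```

Because the paths depend only on the module's identity, two TUs that import the same module would receive the same module command line regardless of their own `-MF` or `-serialize-diagnostics` arguments.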

Diff Detail

Event Timeline

benlangmuir created this revision. Jul 8 2022, 12:20 PM

Herald added a project: Restricted Project. Jul 8 2022, 12:20 PM

benlangmuir requested review of this revision. Jul 8 2022, 12:20 PM

Herald added a project: Restricted Project. Jul 8 2022, 12:20 PM

Herald added a subscriber: cfe-commits.

Harbormaster completed remote builds in B174430: Diff 443319. Jul 8 2022, 1:51 PM

Updates:

Made lookup of module outputs fallible. clang-scan-deps does not currently use this, but since a build system is expected to provide these settings, the API should account for the possibility of errors.
Attempted to fix a Windows path issue in the test.

Harbormaster completed remote builds in B174473: Diff 443379. Jul 8 2022, 4:50 PM

This looks pretty nice. My only concern is the ad-hoc command line parsing.

clang/include/clang/Tooling/DependencyScanning/DependencyScanningTool.h
55	I'm curious whether you have encountered situations where being able to return an error from `LookupModuleOutputs` is useful.
clang/test/ClangScanDeps/preserved-args.c
0	I think we can delete this test at this point, since we check both interesting cases in `generate-modules-path-args.c` and `removed-args.c`. If I remember correctly, this test was originally checking that the scanner does not change `-MT` and friends before the scan and doesn't leak that into the resulting command lines.
clang/tools/clang-scan-deps/ClangScanDeps.cpp
297	Is my understanding correct that this lambda of type `const ModuleOutputOptions &(ModuleID)` gets implicitly converted to `llvm::function_ref<Expected<const ModuleOutputOptions &>(const ModuleID &)>`? If so, I think it would be clearer to take the `ModuleID` by const ref here also and wrap the return type with `Expected`, to match the... expected `function_ref` type. WDYT?
398	I think we should avoid ad-hoc command line parsing; it can be incorrect in many ways. Could we move this check somewhere where we have a properly constructed `CompilerInvocation`? I think we could do something like this in `ModuleDeps::getCanonicalCommandLine`:
	if (!CI.getDiagnosticOpts().DiagnosticSerializationFile.empty())
	  CI.getDiagnosticOpts().DiagnosticSerializationFile =
	      LookupModuleOutput(ID, ModuleOutputKind::DiagnosticSerializationFile);

benlangmuir marked an inline comment as done. Jul 11 2022, 2:03 PM

benlangmuir added inline comments.

clang/include/clang/Tooling/DependencyScanning/DependencyScanningTool.h
55	I ended up backing this change out: it was motivated by a downstream libclang API change that I have now re-evaluated based on your other feedback to use a per-output callback instead of a single callback.
clang/tools/clang-scan-deps/ClangScanDeps.cpp
297	This was unintentional, I just missed these couple of places when I changed the API from `ModuleID` to `const ModuleID &`. Will fix, thanks!
398	Yeah, we can do that. I originally avoided this due to it "leaking" whether the TU used these outputs into the module command-lines (since the set of callbacks would differ), but I suspect in practice that doesn't matter since you're unlikely to mix compilations that have and don't have serialized diagnostics. To be 100% sound, it will require adding the existence of the outputs to the module context hash (not the actual path, just whether there was a diag and/or d file at all). I will do the context hash change later if you're okay with it - there's nowhere to feed the extra info into `getModuleHash` right now, but I was already planning to change the hashing which will make it easier to do. If you think it's critical we could add a parameter to `getModuleHash` temporarily to handle it. I liked your idea to make the callback per-output as well.
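
The soundness point in the comment above (hash whether the outputs exist, not their paths) can be sketched roughly as follows. This is a hypothetical helper, not the planned `getModuleHash` change: the name `hashWithOutputPresence` and the mixing constant are assumptions for illustration, and the real context hash is computed differently.

```cpp
#include <cstdint>

// Hypothetical helper: fold two presence bits into an existing hash value.
// Only the existence of the .d and .diag outputs is mixed in, never their
// paths, so TU-specific path spellings cannot affect the result.
std::uint64_t hashWithOutputPresence(std::uint64_t BaseHash,
                                     bool HadDependencyFile,
                                     bool HadSerializedDiagnostics) {
  const std::uint64_t Bits = (HadDependencyFile ? 1u : 0u) |
                             (HadSerializedDiagnostics ? 2u : 0u);
  // A simple hash_combine-style mix; the constant is an arbitrary choice
  // for this example.
  return BaseHash ^ (Bits + 0x9e3779b97f4a7c15ULL + (BaseHash << 6) +
                     (BaseHash >> 2));
}
```

With a scheme along these lines, a module scanned from a TU that has serialized diagnostics and the same module scanned from a TU without them would get different context hashes, matching the fact that their command lines differ.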

Updates per review

Switched to a per-output callback
Removed preserved-args.c test
Removed error handling that I no longer have a real use for
Only request .d and .diag paths if they were enabled in the original TU
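
A minimal sketch of the last item, using simplified stand-in types: the module file is always requested, while the `.d` and `.diag` outputs are only looked up when the original TU had them. It also shows the dependency targets arriving as one null-separated string, as the `ModuleOutputKind::DependencyTargets` comment in the diff describes. `splitOn`, `ModuleOutputs`, and `collectOutputs` are hypothetical names, and `std::function` stands in for `llvm::function_ref` so the example needs only the standard library.

```cpp
#include <functional>
#include <string>
#include <vector>

// Stand-in for the enum added by this patch.
enum class ModuleOutputKind {
  ModuleFile,
  DependencyFile,
  DependencyTargets,
  DiagnosticSerializationFile,
};

// Split S on Separator, keeping empty segments.
std::vector<std::string> splitOn(const std::string &S, char Separator) {
  std::vector<std::string> Result;
  std::string::size_type Start = 0;
  while (true) {
    const std::string::size_type End = S.find(Separator, Start);
    if (End == std::string::npos) {
      Result.push_back(S.substr(Start));
      return Result;
    }
    Result.push_back(S.substr(Start, End - Start));
    Start = End + 1;
  }
}

// Hypothetical container for the outputs of one module build.
struct ModuleOutputs {
  std::string ModuleFile;
  std::string DependencyFile;
  std::vector<std::string> DependencyTargets;
  std::string DiagnosticSerializationFile;
};

// Request the optional outputs only if the original TU enabled them.
ModuleOutputs
collectOutputs(bool HadDependencyFile, bool HadSerializedDiagnostics,
               const std::function<std::string(ModuleOutputKind)> &Lookup) {
  ModuleOutputs Out;
  Out.ModuleFile = Lookup(ModuleOutputKind::ModuleFile);
  if (HadSerializedDiagnostics)
    Out.DiagnosticSerializationFile =
        Lookup(ModuleOutputKind::DiagnosticSerializationFile);
  if (HadDependencyFile) {
    Out.DependencyFile = Lookup(ModuleOutputKind::DependencyFile);
    Out.DependencyTargets =
        splitOn(Lookup(ModuleOutputKind::DependencyTargets), '\0');
  }
  return Out;
}
```

Note that a callback returning targets must build the string with an explicit length (for example `std::string("a\0b", 3)`), since constructing from a plain C string would stop at the first null byte.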

Harbormaster completed remote builds in B174744: Diff 443755. Jul 11 2022, 3:46 PM

LGTM! Thanks.

clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp
341–342	This is a bit unfortunate but I don't see a better alternative. Ideally, we would compute the hash with the `.d` and `.dia` paths already filled in. That would ensure the command line we end up reporting to the build system really does have the context hash associated with the module. (We'd need to include every field set in `getCanonicalCommandLine()` too.) But for the path lookup, we already need some kind of (currently partial) context hash.

This revision is now accepted and ready to land. Jul 12 2022, 7:03 AM

Closed by commit rG6626f6fec3d3: [clang][deps] Override dependency and serialized diag files for modules (authored by benlangmuir). Jul 12 2022, 8:20 AM

This revision was automatically updated to reflect the committed changes.

benlangmuir added a commit: rG6626f6fec3d3: [clang][deps] Override dependency and serialized diag files for modules.

benlangmuir added inline comments. Jul 12 2022, 8:34 AM

clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp
341–342	The other things added in `getCanonicalCommandLine` are currently:
	* module map path - I'm planning to include this in my changes to the context hash, since it is significant exactly how the path is spelled. This is already part of the implicit module build's notion of identity.
	* pcm paths - these are sound as long as we always get the same paths returned in the callback during a build (across separate builds it would be fine to change them as long as you're going to rebuild anything whose path changed, and anything downstream of that).

jansvoboda11 mentioned this in D131420: [clang][deps] Always generate module paths. Aug 8 2022, 11:15 AM

jansvoboda11 mentioned this in rG71e32d5cf005: [clang][deps] Always generate module paths. Aug 10 2022, 11:59 AM

Revision Contents

Path	Size
clang/include/clang/Tooling/DependencyScanning/DependencyScanningTool.h	12 lines
clang/include/clang/Tooling/DependencyScanning/ModuleDepCollector.h	31 lines
clang/lib/Tooling/DependencyScanning/DependencyScanningTool.cpp	9 lines
clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp	37 lines
clang/test/ClangScanDeps/Inputs/preserved-args/
clang/test/ClangScanDeps/Inputs/removed-args/	2 lines
clang/test/ClangScanDeps/generate-modules-path-args.c	52 lines
clang/test/ClangScanDeps/preserved-args.c
clang/test/ClangScanDeps/removed-args.c	6 lines
clang/tools/clang-scan-deps/ClangScanDeps.cpp	30 lines

Diff 443755

clang/include/clang/Tooling/DependencyScanning/DependencyScanningTool.h

[41 lines elided]	struct FullDependencies {
/// determined that the differences are benign for this compilation.		/// determined that the differences are benign for this compilation.
std::vector<ModuleID> ClangModuleDeps;		std::vector<ModuleID> ClangModuleDeps;

/// The original command line of the TU (excluding the compiler executable).		/// The original command line of the TU (excluding the compiler executable).
std::vector<std::string> OriginalCommandLine;		std::vector<std::string> OriginalCommandLine;

/// Get the full command line.		/// Get the full command line.
///		///
/// \param LookupPCMPath This function is called to fill in "-fmodule-file="		/// \param LookupModuleOutput This function is called to fill in
/// arguments and the "-o" argument. It needs to return		/// "-fmodule-file=", "-o" and other output
/// a path for where the PCM for the given module is to		/// arguments for dependencies.
/// be located.		std::vector<std::string> getCommandLine(
std::vector<std::string>		llvm::function_ref<std::string(const ModuleID &, ModuleOutputKind)>
getCommandLine(std::function<StringRef(ModuleID)> LookupPCMPath) const;		LookupOutput) const;
		jansvoboda11: I'm curious whether you have encountered situations where being able to return an error from `LookupModuleOutputs` is useful.
		benlangmuir (author): I ended up backing this change out: it was motivated by a downstream libclang API change that I have now re-evaluated based on your other feedback to use a per-output callback instead of a single callback.

/// Get the full command line, excluding -fmodule-file=" arguments.		/// Get the full command line, excluding -fmodule-file=" arguments.
std::vector<std::string> getCommandLineWithoutModulePaths() const;		std::vector<std::string> getCommandLineWithoutModulePaths() const;
};		};

struct FullDependenciesResult {		struct FullDependenciesResult {
FullDependencies FullDeps;		FullDependencies FullDeps;
std::vector<ModuleDeps> DiscoveredModules;		std::vector<ModuleDeps> DiscoveredModules;
[46 lines elided]

clang/include/clang/Tooling/DependencyScanning/ModuleDepCollector.h

[59 lines elided]
};		};

struct ModuleIDHasher {		struct ModuleIDHasher {
std::size_t operator()(const ModuleID &MID) const {		std::size_t operator()(const ModuleID &MID) const {
return llvm::hash_combine(MID.ModuleName, MID.ContextHash);		return llvm::hash_combine(MID.ModuleName, MID.ContextHash);
}		}
};		};

		/// An output from a module compilation, such as the path of the module file.
		enum class ModuleOutputKind {
		/// The module file (.pcm). Required.
		ModuleFile,
		/// The path of the dependency file (.d), if any.
		DependencyFile,
		/// The null-separated list of names to use as the targets in the dependency
		/// file, if any.
		DependencyTargets,
		/// The path of the serialized diagnostic file (.dia), if any.
		DiagnosticSerializationFile,
		};

struct ModuleDeps {		struct ModuleDeps {
/// The identifier of the module.		/// The identifier of the module.
ModuleID ID;		ModuleID ID;

/// Whether this is a "system" module.		/// Whether this is a "system" module.
bool IsSystem;		bool IsSystem;

/// The path to the modulemap file which defines this module.		/// The path to the modulemap file which defines this module.
[23 lines elided]	struct ModuleDeps {
/// This may include modules with a different context hash when it can be		/// This may include modules with a different context hash when it can be
/// determined that the differences are benign for this compilation.		/// determined that the differences are benign for this compilation.
std::vector<ModuleID> ClangModuleDeps;		std::vector<ModuleID> ClangModuleDeps;

// Used to track which modules that were discovered were directly imported by		// Used to track which modules that were discovered were directly imported by
// the primary TU.		// the primary TU.
bool ImportedByMainFile = false;		bool ImportedByMainFile = false;

		/// Whether the TU had a dependency file. The path in \c BuildInvocation is
		/// cleared to avoid leaking the specific path from the TU into the module.
		bool HadDependencyFile = false;

		/// Whether the TU had serialized diagnostics. The path in \c BuildInvocation
		/// is cleared to avoid leaking the specific path from the TU into the module.
		bool HadSerializedDiagnostics = false;

/// Compiler invocation that can be used to build this module (without paths).		/// Compiler invocation that can be used to build this module (without paths).
CompilerInvocation BuildInvocation;		CompilerInvocation BuildInvocation;

/// Gets the canonical command line suitable for passing to clang.		/// Gets the canonical command line suitable for passing to clang.
///		///
/// \param LookupPCMPath This function is called to fill in "-fmodule-file="		/// \param LookupModuleOutput This function is called to fill in
/// arguments and the "-o" argument. It needs to return		/// "-fmodule-file=", "-o" and other output
/// a path for where the PCM for the given module is to		/// arguments.
/// be located.
std::vector<std::string> getCanonicalCommandLine(		std::vector<std::string> getCanonicalCommandLine(
std::function<StringRef(ModuleID)> LookupPCMPath) const;		llvm::function_ref<std::string(const ModuleID &, ModuleOutputKind)>
		LookupModuleOutput) const;

/// Gets the canonical command line suitable for passing to clang, excluding		/// Gets the canonical command line suitable for passing to clang, excluding
/// "-fmodule-file=" and "-o" arguments.		/// "-fmodule-file=" and "-o" arguments.
std::vector<std::string> getCanonicalCommandLineWithoutModulePaths() const;		std::vector<std::string> getCanonicalCommandLineWithoutModulePaths() const;
};		};

class ModuleDepCollector;		class ModuleDepCollector;

[100 lines elided]

clang/lib/Tooling/DependencyScanning/DependencyScanningTool.cpp

	//===- DependencyScanningTool.cpp - clang-scan-deps service ---------------===//			//===- DependencyScanningTool.cpp - clang-scan-deps service ---------------===//
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#include "clang/Tooling/DependencyScanning/DependencyScanningTool.h"			#include "clang/Tooling/DependencyScanning/DependencyScanningTool.h"
	#include "clang/Frontend/Utils.h"			#include "clang/Frontend/Utils.h"

	namespace clang {			namespace clang {
	namespace tooling {			namespace tooling {
	namespace dependencies {			namespace dependencies {

	std::vector<std::string> FullDependencies::getCommandLine(			std::vector<std::string> FullDependencies::getCommandLine(
	std::function<StringRef(ModuleID)> LookupPCMPath) const {			llvm::function_ref<std::string(const ModuleID &, ModuleOutputKind)>
				LookupModuleOutput) const {
	std::vector<std::string> Ret = getCommandLineWithoutModulePaths();			std::vector<std::string> Ret = getCommandLineWithoutModulePaths();

	for (ModuleID MID : ClangModuleDeps)			for (ModuleID MID : ClangModuleDeps) {
	Ret.push_back(("-fmodule-file=" + LookupPCMPath(MID)).str());			auto PCM = LookupModuleOutput(MID, ModuleOutputKind::ModuleFile);
				Ret.push_back("-fmodule-file=" + PCM);
				}

	return Ret;			return Ret;
	}			}

	std::vector<std::string>			std::vector<std::string>
	FullDependencies::getCommandLineWithoutModulePaths() const {			FullDependencies::getCommandLineWithoutModulePaths() const {
	std::vector<std::string> Args = OriginalCommandLine;			std::vector<std::string> Args = OriginalCommandLine;

	[169 lines elided]

clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp

[50 lines elided]	CompilerInvocation ModuleDepCollector::makeInvocationForModuleBuildWithoutPaths(

// Remove options incompatible with explicit module build or are likely to		// Remove options incompatible with explicit module build or are likely to
// differ between identical modules discovered from different translation		// differ between identical modules discovered from different translation
// units.		// units.
CI.getFrontendOpts().Inputs.clear();		CI.getFrontendOpts().Inputs.clear();
CI.getFrontendOpts().OutputFile.clear();		CI.getFrontendOpts().OutputFile.clear();
CI.getCodeGenOpts().MainFileName.clear();		CI.getCodeGenOpts().MainFileName.clear();
CI.getCodeGenOpts().DwarfDebugFlags.clear();		CI.getCodeGenOpts().DwarfDebugFlags.clear();
		CI.getDiagnosticOpts().DiagnosticSerializationFile.clear();
		CI.getDependencyOutputOpts().OutputFile.clear();
		CI.getDependencyOutputOpts().Targets.clear();

CI.getFrontendOpts().ProgramAction = frontend::GenerateModule;		CI.getFrontendOpts().ProgramAction = frontend::GenerateModule;
CI.getLangOpts()->ModuleName = Deps.ID.ModuleName;		CI.getLangOpts()->ModuleName = Deps.ID.ModuleName;
CI.getFrontendOpts().IsSystemModule = Deps.IsSystem;		CI.getFrontendOpts().IsSystemModule = Deps.IsSystem;

// Disable implicit modules and canonicalize options that are only used by		// Disable implicit modules and canonicalize options that are only used by
// implicit modules.		// implicit modules.
CI.getLangOpts()->ImplicitModules = false;		CI.getLangOpts()->ImplicitModules = false;
[35 lines elided]	serializeCompilerInvocation(const CompilerInvocation &CI) {
// Synthesize full command line from the CompilerInvocation, including "-cc1".		// Synthesize full command line from the CompilerInvocation, including "-cc1".
SmallVector<const char *, 32> Args{"-cc1"};		SmallVector<const char *, 32> Args{"-cc1"};
CI.generateCC1CommandLine(Args, SA);		CI.generateCC1CommandLine(Args, SA);

// Convert arguments to the return type.		// Convert arguments to the return type.
return std::vector<std::string>{Args.begin(), Args.end()};		return std::vector<std::string>{Args.begin(), Args.end()};
}		}

		static std::vector<std::string> splitString(std::string S, char Separator) {
		SmallVector<StringRef> Segments;
		StringRef(S).split(Segments, Separator);
		std::vector<std::string> Result;
		Result.reserve(Segments.size());
		for (StringRef Segment : Segments)
		Result.push_back(Segment.str());
		return Result;
		}

std::vector<std::string> ModuleDeps::getCanonicalCommandLine(		std::vector<std::string> ModuleDeps::getCanonicalCommandLine(
std::function<StringRef(ModuleID)> LookupPCMPath) const {		llvm::function_ref<std::string(const ModuleID &, ModuleOutputKind)>
		LookupModuleOutput) const {
CompilerInvocation CI(BuildInvocation);		CompilerInvocation CI(BuildInvocation);
FrontendOptions &FrontendOpts = CI.getFrontendOpts();		FrontendOptions &FrontendOpts = CI.getFrontendOpts();

InputKind ModuleMapInputKind(FrontendOpts.DashX.getLanguage(),		InputKind ModuleMapInputKind(FrontendOpts.DashX.getLanguage(),
InputKind::Format::ModuleMap);		InputKind::Format::ModuleMap);
FrontendOpts.Inputs.emplace_back(ClangModuleMapFile, ModuleMapInputKind);		FrontendOpts.Inputs.emplace_back(ClangModuleMapFile, ModuleMapInputKind);
FrontendOpts.OutputFile = std::string(LookupPCMPath(ID));		FrontendOpts.OutputFile =
		LookupModuleOutput(ID, ModuleOutputKind::ModuleFile);
		if (HadSerializedDiagnostics)
		CI.getDiagnosticOpts().DiagnosticSerializationFile =
		LookupModuleOutput(ID, ModuleOutputKind::DiagnosticSerializationFile);
		if (HadDependencyFile) {
		CI.getDependencyOutputOpts().OutputFile =
		LookupModuleOutput(ID, ModuleOutputKind::DependencyFile);
		CI.getDependencyOutputOpts().Targets = splitString(
		LookupModuleOutput(ID, ModuleOutputKind::DependencyTargets), '\0');
		}

for (ModuleID MID : ClangModuleDeps)		for (ModuleID MID : ClangModuleDeps)
FrontendOpts.ModuleFiles.emplace_back(LookupPCMPath(MID));		FrontendOpts.ModuleFiles.push_back(
		LookupModuleOutput(MID, ModuleOutputKind::ModuleFile));

return serializeCompilerInvocation(CI);		return serializeCompilerInvocation(CI);
}		}

std::vector<std::string>		std::vector<std::string>
ModuleDeps::getCanonicalCommandLineWithoutModulePaths() const {		ModuleDeps::getCanonicalCommandLineWithoutModulePaths() const {
return serializeCompilerInvocation(BuildInvocation);		return serializeCompilerInvocation(BuildInvocation);
}		}
[174 lines elided]	ModuleID ModuleDepCollectorPP::handleTopLevelModule(const Module *M) {
addAllSubmodulePrebuiltDeps(M, MD, SeenModules);		addAllSubmodulePrebuiltDeps(M, MD, SeenModules);

MD.BuildInvocation = MDC.makeInvocationForModuleBuildWithoutPaths(		MD.BuildInvocation = MDC.makeInvocationForModuleBuildWithoutPaths(
MD, [&](CompilerInvocation &BuildInvocation) {		MD, [&](CompilerInvocation &BuildInvocation) {
if (MDC.OptimizeArgs)		if (MDC.OptimizeArgs)
optimizeHeaderSearchOpts(BuildInvocation.getHeaderSearchOpts(),		optimizeHeaderSearchOpts(BuildInvocation.getHeaderSearchOpts(),
MDC.ScanInstance.getASTReader(), MF);		MDC.ScanInstance.getASTReader(), MF);
});		});
		MD.HadSerializedDiagnostics = !MDC.OriginalInvocation.getDiagnosticOpts()
		.DiagnosticSerializationFile.empty();
		MD.HadDependencyFile =
		!MDC.OriginalInvocation.getDependencyOutputOpts().OutputFile.empty();
		// FIXME: HadSerializedDiagnostics and HadDependencyFile should be included in
		// the context hash since it can affect the command-line.
		jansvoboda11: This is a bit unfortunate but I don't see a better alternative. Ideally, we would compute the hash with the `.d` and `.dia` paths already filled in. That would ensure the command line we end up reporting to the build system really does have the context hash associated with the module. (We'd need to include every field set in `getCanonicalCommandLine()` too.) But for the path lookup, we already need some kind of (currently partial) context hash.
		benlangmuir (author): The other things added in getCanonicalCommandLine are currently: * module map path - I'm planning to include this in my changes to the context hash since it is significant exactly how the path is spelled. This is already part of the implicit module build's notion of identity. * pcm paths - these are sound as long as we always get the same paths returned in the callback during a build (across separate builds it would be fine to change them as long as you're going to rebuild anything whose path changed, and anything downstream of that).
MD.ID.ContextHash = MD.BuildInvocation.getModuleHash();		MD.ID.ContextHash = MD.BuildInvocation.getModuleHash();

llvm::DenseSet<const Module *> AddedModules;		llvm::DenseSet<const Module *> AddedModules;
addAllSubmoduleDeps(M, MD, AddedModules);		addAllSubmoduleDeps(M, MD, AddedModules);

return MD.ID;		return MD.ID;
}		}

[81 lines elided]

clang/test/ClangScanDeps/Inputs/preserved-args/cdb.json.template

This file was deleted.

	[
	{
	"directory": "DIR",
	"command": "clang -MD -MT my_target -serialize-diagnostics DIR/tu.dia -fsyntax-only DIR/tu.c -fmodules -fimplicit-module-maps -fmodules-cache-path=DIR/cache -fmodule-file=Foo=DIR/foo.pcm -o DIR/tu.o",
	"file": "DIR/tu.c"
	}
	]

clang/test/ClangScanDeps/Inputs/preserved-args/mod.h

This file was deleted.

// mod.h

clang/test/ClangScanDeps/Inputs/preserved-args/module.modulemap

This file was deleted.

	module Mod {
	header "mod.h"
	}

clang/test/ClangScanDeps/Inputs/preserved-args/tu.c

This file was deleted.

#include "mod.h"

clang/test/ClangScanDeps/Inputs/removed-args/cdb.json.template

	[			[
	{			{
	"directory": "DIR",			"directory": "DIR",
	"command": "clang -fsyntax-only DIR/tu.c -fmodules -fimplicit-module-maps -fmodules-validate-once-per-build-session -fbuild-session-file=DIR/build-session -fmodules-prune-interval=123 -fmodules-prune-after=123 -fmodules-cache-path=DIR/cache -include DIR/header.h -grecord-command-line -o DIR/tu.o",			"command": "clang -fsyntax-only DIR/tu.c -fmodules -fimplicit-module-maps -fmodules-validate-once-per-build-session -fbuild-session-file=DIR/build-session -fmodules-prune-interval=123 -fmodules-prune-after=123 -fmodules-cache-path=DIR/cache -include DIR/header.h -grecord-command-line -o DIR/tu.o -serialize-diagnostics DIR/tu.diag -MT tu -MD -MF DIR/tu.d",
	"file": "DIR/tu.c"			"file": "DIR/tu.c"
	}			}
	]			]

clang/test/ClangScanDeps/generate-modules-path-args.c

This file was added.

				// RUN: rm -rf %t
				// RUN: split-file %s %t
				// RUN: sed "s|DIR|%/t|g" %t/cdb.json.template > %t/cdb.json
				// RUN: sed "s|DIR|%/t|g" %t/cdb_without.json.template > %t/cdb_without.json
				// RUN: clang-scan-deps -compilation-database %t/cdb.json \
				// RUN: -format experimental-full -generate-modules-path-args > %t/deps.json
				// RUN: cat %t/deps.json | sed 's:\\\\\?:/:g' | FileCheck -DPREFIX=%/t %s
				// RUN: clang-scan-deps -compilation-database %t/cdb_without.json \
				// RUN: -format experimental-full -generate-modules-path-args > %t/deps_without.json
				// RUN: cat %t/deps_without.json | sed 's:\\\\\?:/:g' | FileCheck -DPREFIX=%/t -check-prefix=WITHOUT %s

				// CHECK: {
				// CHECK-NEXT: "modules": [
				// CHECK-NEXT: {
				// CHECK: "command-line": [
				// CHECK-NEXT: "-cc1"
				// CHECK: "-serialize-diagnostic-file"
				// CHECK-NEXT: "[[PREFIX]]{{.}}Mod{{.}}.diag"
				// CHECK: "-dependency-file"
				// CHECK-NEXT: "[[PREFIX]]{{.}}Mod{{.}}.d"
				// CHECK: ],

				// WITHOUT: {
				// WITHOUT-NEXT: "modules": [
				// WITHOUT-NEXT: {
				// WITHOUT: "command-line": [
				// WITHOUT-NEXT: "-cc1"
				// WITHOUT-NOT: "-serialize-diagnostic-file"
				// WITHOUT-NOT: "-dependency-file"
				// WITHOUT: ],

				//--- cdb.json.template
				[{
				"directory": "DIR",
				"command": "clang -fsyntax-only DIR/tu.c -fmodules -fimplicit-module-maps -fmodules-cache-path=DIR/cache -serialize-diagnostics DIR/tu.diag -MD -MT tu -MF DIR/tu.d",
				"file": "DIR/tu.c"
				}]

				//--- cdb_without.json.template
				[{
				"directory": "DIR",
				"command": "clang -fsyntax-only DIR/tu.c -fmodules -fimplicit-module-maps -fmodules-cache-path=DIR/cache",
				"file": "DIR/tu.c"
				}]

				//--- module.modulemap
				module Mod { header "Mod.h" }

				//--- Mod.h

				//--- tu.c
				#include "Mod.h"

clang/test/ClangScanDeps/preserved-args.c

This file was deleted.

	// RUN: rm -rf %t && mkdir %t
	// RUN: cp -r %S/Inputs/preserved-args/* %t
	// RUN: sed -e "s|DIR|%/t|g" %t/cdb.json.template > %t/cdb.json

	// RUN: clang-scan-deps -compilation-database %t/cdb.json -format experimental-full > %t/result.json
	// RUN: cat %t/result.json | sed 's:\\\\\?:/:g' | FileCheck %s -DPREFIX=%/t

	// CHECK: {
	// CHECK-NEXT: "modules": [
	// CHECK-NEXT: {
	// CHECK: "command-line": [
	// CHECK-NEXT: "-cc1"
	// CHECK: "-serialize-diagnostic-file"
	// CHECK-NEXT: "[[PREFIX]]/tu.dia"
	// CHECK: "-fmodule-file=Foo=[[PREFIX]]/foo.pcm"
	// CHECK: "-MT"
	// CHECK-NEXT: "my_target"
	// CHECK: "-dependency-file"
	// CHECK-NEXT: "[[PREFIX]]/tu.d"
	// CHECK: ],
	// CHECK: "name": "Mod"
	// CHECK-NEXT: }
	// CHECK-NEXT: ]
	// CHECK: }

clang/test/ClangScanDeps/removed-args.c

	[23 lines elided]
	// CHECK-NOT: "-dwarf-debug-flags"			// CHECK-NOT: "-dwarf-debug-flags"
	// CHECK-NOT: "-main-file-name"			// CHECK-NOT: "-main-file-name"
	// CHECK-NOT: "-include"			// CHECK-NOT: "-include"
	// CHECK-NOT: "-fmodules-cache-path=			// CHECK-NOT: "-fmodules-cache-path=
	// CHECK-NOT: "-fmodules-validate-once-per-build-session"			// CHECK-NOT: "-fmodules-validate-once-per-build-session"
	// CHECK-NOT: "-fbuild-session-timestamp=			// CHECK-NOT: "-fbuild-session-timestamp=
	// CHECK-NOT: "-fmodules-prune-interval=			// CHECK-NOT: "-fmodules-prune-interval=
	// CHECK-NOT: "-fmodules-prune-after=			// CHECK-NOT: "-fmodules-prune-after=
				// CHECK-NOT: "-dependency-file"
				// CHECK-NOT: "-MT"
				// CHECK-NOT: "-serialize-diagnostic-file"
	// CHECK: ],			// CHECK: ],
	// CHECK-NEXT: "context-hash": "[[HASH_MOD_HEADER:.*]]",			// CHECK-NEXT: "context-hash": "[[HASH_MOD_HEADER:.*]]",
	// CHECK-NEXT: "file-deps": [			// CHECK-NEXT: "file-deps": [
	// CHECK-NEXT: "[[PREFIX]]/mod_header.h",			// CHECK-NEXT: "[[PREFIX]]/mod_header.h",
	// CHECK-NEXT: "[[PREFIX]]/module.modulemap"			// CHECK-NEXT: "[[PREFIX]]/module.modulemap"
	// CHECK-NEXT: ],			// CHECK-NEXT: ],
	// CHECK-NEXT: "name": "ModHeader"			// CHECK-NEXT: "name": "ModHeader"
	// CHECK-NEXT: },			// CHECK-NEXT: },
	// CHECK-NEXT: {			// CHECK-NEXT: {
	// CHECK-NEXT: "clang-module-deps": [],			// CHECK-NEXT: "clang-module-deps": [],
	// CHECK-NEXT: "clang-modulemap-file": "[[PREFIX]]/module.modulemap",			// CHECK-NEXT: "clang-modulemap-file": "[[PREFIX]]/module.modulemap",
	// CHECK-NEXT: "command-line": [			// CHECK-NEXT: "command-line": [
	// CHECK-NEXT: "-cc1"			// CHECK-NEXT: "-cc1"
	// CHECK-NOT: "-dwarf-debug-flags"			// CHECK-NOT: "-dwarf-debug-flags"
	// CHECK-NOT: "-main-file-name"			// CHECK-NOT: "-main-file-name"
	// CHECK-NOT: "-include"			// CHECK-NOT: "-include"
	// CHECK-NOT: "-fmodules-cache-path=			// CHECK-NOT: "-fmodules-cache-path=
	// CHECK-NOT: "-fmodules-validate-once-per-build-session"			// CHECK-NOT: "-fmodules-validate-once-per-build-session"
	// CHECK-NOT: "-fbuild-session-timestamp=			// CHECK-NOT: "-fbuild-session-timestamp=
	// CHECK-NOT: "-fmodules-prune-interval=			// CHECK-NOT: "-fmodules-prune-interval=
	// CHECK-NOT: "-fmodules-prune-after=			// CHECK-NOT: "-fmodules-prune-after=
				// CHECK-NOT: "-dependency-file"
				// CHECK-NOT: "-MT"
				// CHECK-NOT: "-serialize-diagnostic-file"
	// CHECK: ],			// CHECK: ],
	// CHECK-NEXT: "context-hash": "[[HASH_MOD_TU:.*]]",			// CHECK-NEXT: "context-hash": "[[HASH_MOD_TU:.*]]",
	// CHECK-NEXT: "file-deps": [			// CHECK-NEXT: "file-deps": [
	// CHECK-NEXT: "[[PREFIX]]/mod_tu.h",			// CHECK-NEXT: "[[PREFIX]]/mod_tu.h",
	// CHECK-NEXT: "[[PREFIX]]/module.modulemap"			// CHECK-NEXT: "[[PREFIX]]/module.modulemap"
	// CHECK-NEXT: ],			// CHECK-NEXT: ],
	// CHECK-NEXT: "name": "ModTU"			// CHECK-NEXT: "name": "ModTU"
	// CHECK-NEXT: }			// CHECK-NEXT: }
	[25 lines elided]

clang/tools/clang-scan-deps/ClangScanDeps.cpp

[282 lines elided]	for (const ModuleDeps &MD : FDR.DiscoveredModules) {
auto I = Modules.find({MD.ID, 0});		auto I = Modules.find({MD.ID, 0});
if (I != Modules.end()) {		if (I != Modules.end()) {
I->first.InputIndex = std::min(I->first.InputIndex, InputIndex);		I->first.InputIndex = std::min(I->first.InputIndex, InputIndex);
continue;		continue;
}		}
Modules.insert(I, {{MD.ID, InputIndex}, std::move(MD)});		Modules.insert(I, {{MD.ID, InputIndex}, std::move(MD)});
}		}

ID.CommandLine = GenerateModulesPathArgs		ID.CommandLine =
? FD.getCommandLine(		GenerateModulesPathArgs
[&](ModuleID MID) { return lookupPCMPath(MID); })		? FD.getCommandLine([&](const ModuleID &MID, ModuleOutputKind MOK) {
		return lookupModuleOutput(MID, MOK);
		})
: FD.getCommandLineWithoutModulePaths();		: FD.getCommandLineWithoutModulePaths();

Inputs.push_back(std::move(ID));		Inputs.push_back(std::move(ID));
		jansvoboda11: Is my understanding correct that this lambda of type `const ModuleOutputOptions &(ModuleID)` gets implicitly converted to `llvm::function_ref<Expected<const ModuleOutputOptions &>(const ModuleID &)>`? If so, I think it would be clearer to take the `ModuleID` by const ref here also and wrap the return type with `Expected`, to match the... expected `function_ref` type. WDYT?
		benlangmuir (Author): This was unintentional, I just missed these couple of places when I changed the API from `ModuleID` to `const ModuleID &`. Will fix, thanks!
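To illustrate the point raised in the comment above, here is a minimal sketch of how a lambda taking `ModuleID` by value still converts to a callback type declared with `const ModuleID &`, at the cost of one copy per call. It assumes `std::function` as a stand-in for `llvm::function_ref`, and the `ModuleID` struct, `LookupFn`, and the helper functions are hypothetical names invented for this example, not the real clang types.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Hypothetical stand-in for the real ModuleID; it only exists to count copies.
static int NumCopies = 0;
struct ModuleID {
  std::string ModuleName;
  explicit ModuleID(std::string Name) : ModuleName(std::move(Name)) {}
  ModuleID(const ModuleID &Other) : ModuleName(Other.ModuleName) {
    ++NumCopies;
  }
};

// Assumption: std::function stands in for llvm::function_ref so that the
// sketch needs no LLVM headers.
using LookupFn = std::function<std::string(const ModuleID &)>;

// Runs a single lookup and reports how many ModuleID copies it caused.
int copiesPerLookup(const LookupFn &Lookup) {
  ModuleID MID("ModTU");
  NumCopies = 0;
  Lookup(MID);
  return NumCopies;
}

// The lambda takes ModuleID by value; the conversion compiles, but every
// call copies the argument.
int copiesWithByValueLambda() {
  return copiesPerLookup([](ModuleID MID) { return MID.ModuleName + ".pcm"; });
}

// The lambda matches the callback signature, so no copy is made.
int copiesWithConstRefLambda() {
  return copiesPerLookup(
      [](const ModuleID &MID) { return MID.ModuleName + ".pcm"; });
}
```

Both lambdas are accepted by the callback type, which is why the mismatch was easy to miss; matching the declared parameter type simply avoids the extra copy and makes the intent explicit.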
}		}

void printFullOutput(raw_ostream &OS) {		void printFullOutput(raw_ostream &OS) {
// Sort the modules by name to get a deterministic order.		// Sort the modules by name to get a deterministic order.
std::vector<IndexedModuleID> ModuleIDs;		std::vector<IndexedModuleID> ModuleIDs;
for (auto &&M : Modules)		for (auto &&M : Modules)
ModuleIDs.push_back(M.first);		ModuleIDs.push_back(M.first);
llvm::sort(ModuleIDs,		llvm::sort(ModuleIDs,
[15 lines not shown]
	for (auto &&ModID : ModuleIDs) {
{"name", MD.ID.ModuleName},		{"name", MD.ID.ModuleName},
{"context-hash", MD.ID.ContextHash},		{"context-hash", MD.ID.ContextHash},
{"file-deps", toJSONSorted(MD.FileDeps)},		{"file-deps", toJSONSorted(MD.FileDeps)},
{"clang-module-deps", toJSONSorted(MD.ClangModuleDeps)},		{"clang-module-deps", toJSONSorted(MD.ClangModuleDeps)},
{"clang-modulemap-file", MD.ClangModuleMapFile},		{"clang-modulemap-file", MD.ClangModuleMapFile},
{"command-line",		{"command-line",
GenerateModulesPathArgs		GenerateModulesPathArgs
? MD.getCanonicalCommandLine(		? MD.getCanonicalCommandLine(
[&](ModuleID MID) { return lookupPCMPath(MID); })		[&](const ModuleID &MID, ModuleOutputKind MOK) {
		return lookupModuleOutput(MID, MOK);
		})
: MD.getCanonicalCommandLineWithoutModulePaths()},		: MD.getCanonicalCommandLineWithoutModulePaths()},
};		};
OutModules.push_back(std::move(O));		OutModules.push_back(std::move(O));
}		}

Array TUs;		Array TUs;
for (auto &&I : Inputs) {		for (auto &&I : Inputs) {
Object O{		Object O{
[10 lines not shown]
	Object Output{
{"modules", std::move(OutModules)},		{"modules", std::move(OutModules)},
{"translation-units", std::move(TUs)},		{"translation-units", std::move(TUs)},
};		};

OS << llvm::formatv("{0:2}\n", Value(std::move(Output)));		OS << llvm::formatv("{0:2}\n", Value(std::move(Output)));
}		}

private:		private:
StringRef lookupPCMPath(ModuleID MID) {		std::string lookupModuleOutput(const ModuleID &MID, ModuleOutputKind MOK) {
		// Cache the PCM path, since it will be queried repeatedly for each module.
		// The other outputs are only queried once during getCanonicalCommandLine.
auto PCMPath = PCMPaths.insert({MID, ""});		auto PCMPath = PCMPaths.insert({MID, ""});
if (PCMPath.second)		if (PCMPath.second)
PCMPath.first->second = constructPCMPath(MID);		PCMPath.first->second = constructPCMPath(MID);
		switch (MOK) {
		case ModuleOutputKind::ModuleFile:
return PCMPath.first->second;		return PCMPath.first->second;
		case ModuleOutputKind::DependencyFile:
		return PCMPath.first->second + ".d";
		case ModuleOutputKind::DependencyTargets:
		return ""; // Will get the default target name.
		case ModuleOutputKind::DiagnosticSerializationFile:
		return PCMPath.first->second + ".diag";
		}
}		}

/// Construct a path for the explicitly built PCM.		/// Construct a path for the explicitly built PCM.
std::string constructPCMPath(ModuleID MID) const {		std::string constructPCMPath(ModuleID MID) const {
auto MDIt = Modules.find(IndexedModuleID{MID, 0});		auto MDIt = Modules.find(IndexedModuleID{MID, 0});
assert(MDIt != Modules.end());		assert(MDIt != Modules.end());
const ModuleDeps &MD = MDIt->second;		const ModuleDeps &MD = MDIt->second;

StringRef Filename = llvm::sys::path::filename(MD.ImplicitModulePCMPath);		StringRef Filename = llvm::sys::path::filename(MD.ImplicitModulePCMPath);
StringRef ModuleCachePath = llvm::sys::path::parent_path(		StringRef ModuleCachePath = llvm::sys::path::parent_path(
llvm::sys::path::parent_path(MD.ImplicitModulePCMPath));		llvm::sys::path::parent_path(MD.ImplicitModulePCMPath));

SmallString<256> ExplicitPCMPath(!ModuleFilesDir.empty() ? ModuleFilesDir		SmallString<256> ExplicitPCMPath(!ModuleFilesDir.empty() ? ModuleFilesDir
: ModuleCachePath);		: ModuleCachePath);
llvm::sys::path::append(ExplicitPCMPath, MD.ID.ContextHash, Filename);		llvm::sys::path::append(ExplicitPCMPath, MD.ID.ContextHash, Filename);
return std::string(ExplicitPCMPath);		return std::string(ExplicitPCMPath);
}		}

struct IndexedModuleID {		struct IndexedModuleID {
ModuleID ID;		ModuleID ID;
mutable size_t InputIndex;		mutable size_t InputIndex;

bool operator==(const IndexedModuleID &Other) const {		bool operator==(const IndexedModuleID &Other) const {
return ID.ModuleName == Other.ID.ModuleName &&		return ID.ModuleName == Other.ID.ModuleName &&
ID.ContextHash == Other.ID.ContextHash;		ID.ContextHash == Other.ID.ContextHash;
		jansvoboda11: I think we should be avoiding ad-hoc command line parsing, it can be incorrect in many ways. Could we move this check somewhere where we have a properly constructed `CompilerInvocation`? I think we could do something like this in `ModuleDeps::getCanonicalCommandLine`:
			if (!CI.getDiagnosticOpts().DiagnosticSerializationFile.empty())
			  CI.getDiagnosticOpts().DiagnosticSerializationFile =
			      LookupModuleOutput(ID, ModuleOutputKind::DiagnosticSerializationFile);
		benlangmuir (Author): Yeah, we can do that. I originally avoided this due to it "leaking" whether the TU used these outputs into the module command-lines (since the set of callbacks would differ), but I suspect in practice that doesn't matter since you're unlikely to mix compilations that have and don't have serialized diagnostics. To be 100% sound, it will require adding the existence of the outputs to the module context hash (not the actual path, just whether there was a diag and/or d file at all). I will do the context hash change later if you're okay with it - there's nowhere to feed the extra info into `getModuleHash` right now, but I was already planning to change the hashing which will make it easier to do. If you think it's critical we could add a parameter to `getModuleHash` temporarily to handle it. I liked your idea to make the callback per-output as well.
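The approach discussed in this thread can be sketched roughly as follows: a secondary output is overridden only when the original invocation already requested it, and the path comes from a per-output callback. This is an illustrative sketch under stated assumptions, not the code from the patch. `InvocationSketch`, `LookupModuleOutputFn`, and `overrideModuleOutputs` are hypothetical names, and the plain structs stand in for `CompilerInvocation` and its option objects.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Mirrors the output kinds queried in the diff.
enum class ModuleOutputKind {
  ModuleFile,
  DependencyFile,
  DependencyTargets,
  DiagnosticSerializationFile,
};

// Hypothetical stand-ins for the real ModuleID and CompilerInvocation.
struct ModuleID {
  std::string ModuleName;
  std::string ContextHash;
};
struct InvocationSketch {
  std::string OutputFile;
  std::string DependencyFile;              // empty: TU asked for no .d file
  std::string DiagnosticSerializationFile; // empty: TU asked for no .diag file
};

// Per-output callback, in the spirit of the lookupModuleOutput lambda.
using LookupModuleOutputFn =
    std::function<std::string(const ModuleID &, ModuleOutputKind)>;

// Replace per-TU output paths with per-module ones. Secondary outputs are
// only overridden when the inherited invocation requested them.
void overrideModuleOutputs(InvocationSketch &CI, const ModuleID &ID,
                           const LookupModuleOutputFn &Lookup) {
  CI.OutputFile = Lookup(ID, ModuleOutputKind::ModuleFile);
  if (!CI.DependencyFile.empty())
    CI.DependencyFile = Lookup(ID, ModuleOutputKind::DependencyFile);
  if (!CI.DiagnosticSerializationFile.empty())
    CI.DiagnosticSerializationFile =
        Lookup(ID, ModuleOutputKind::DiagnosticSerializationFile);
}

// Example callback that derives every path from the PCM path, similar to
// what the clang-scan-deps tool does in this diff.
std::string exampleLookup(const ModuleID &ID, ModuleOutputKind Kind) {
  std::string PCM = ID.ContextHash + "/" + ID.ModuleName + ".pcm";
  switch (Kind) {
  case ModuleOutputKind::ModuleFile:
    return PCM;
  case ModuleOutputKind::DependencyFile:
    return PCM + ".d";
  case ModuleOutputKind::DependencyTargets:
    return "";
  case ModuleOutputKind::DiagnosticSerializationFile:
    return PCM + ".diag";
  }
  return "";
}
```

Because the override is conditional, whether the TU requested a `.d` or `.diag` file still influences the module command line, which is the "leak" the author mentions and the reason for proposing to feed the existence of these outputs into the context hash.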
}		}
};		};

struct IndexedModuleIDHasher {		struct IndexedModuleIDHasher {
std::size_t operator()(const IndexedModuleID &IMID) const {		std::size_t operator()(const IndexedModuleID &IMID) const {
using llvm::hash_combine;		using llvm::hash_combine;

return hash_combine(IMID.ID.ModuleName, IMID.ID.ContextHash);		return hash_combine(IMID.ID.ModuleName, IMID.ID.ContextHash);
[188 lines not shown]