This is an archive of the discontinued LLVM Phabricator instance.

lib/Tooling/InterpolatingCompilationDatabase.cpp
440 ↗	(On Diff #162694)	We can't look at 'Type' at this point anymore, because it needs parsing of TransferableCommands. Not sure what's the best way to replace it. @sammccall, any ideas?

jfb requested changes to this revision. Aug 27 2018, 9:45 AM

jfb added inline comments.

lib/Tooling/InterpolatingCompilationDatabase.cpp
201 ↗	(On Diff #162694)	It's not clear to me that this entire function is safe to call from multiple threads at the same time. Even if it's safe now, I'm willing to bet it won't always be that way. `getTraits` should therefore use a magic static or `call_once` and avoid the headache entirely.
258 ↗	(On Diff #162694)	The comment should say why accesses need to be atomic. Or better yet, this should only be usable "the right way".
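
A minimal sketch of the `call_once` approach suggested above. `LazyTraits`, `computeCount` and the "parsed:" string are hypothetical stand-ins for illustration, not the code from this revision; only `std::call_once` and `std::once_flag` are real library facilities.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for a command whose expensive traits are computed
// lazily. Not the actual TransferableCommand from the patch.
class LazyTraits {
public:
  explicit LazyTraits(std::string Cmd) : CommandLine(std::move(Cmd)) {}

  // Safe to call from several threads: std::call_once runs the lambda at
  // most once, and callers that return afterwards see the stored value.
  const std::string &getTraits() const {
    std::call_once(Once, [this] {
      ++ComputeCount;
      Traits = "parsed:" + CommandLine; // placeholder for the real parsing
    });
    return Traits;
  }

  // Number of times the lazy computation actually ran.
  int computeCount() const { return ComputeCount; }

private:
  std::string CommandLine;
  mutable std::once_flag Once;
  mutable std::string Traits;
  mutable int ComputeCount = 0;
};
```

Callers only ever go through `getTraits()`, so there is no way to read the value before it has been computed, which is the "only usable the right way" property asked for in the second comment.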

This revision now requires changes to proceed. Aug 27 2018, 9:45 AM

Add a comment
Use std::call_once to compute the lazy value

Harbormaster completed remote builds in B21966: Diff 162699. Aug 27 2018, 10:01 AM

Remove computeTraits, put the body inside a lambda

Harbormaster completed remote builds in B21968: Diff 162701. Aug 27 2018, 10:07 AM

ilya-biryukov added inline comments. Aug 27 2018, 10:07 AM

lib/Tooling/InterpolatingCompilationDatabase.cpp
201 ↗	(On Diff #162694)	Thanks, `call_once` is both simpler and more reliable.
258 ↗	(On Diff #162694)	Clarified how it's used.

Remove #include <atomic>, it is not used anymore

Harbormaster completed remote builds in B21969: Diff 162702. Aug 27 2018, 10:08 AM

jfb added inline comments. Aug 27 2018, 10:26 AM

lib/Tooling/InterpolatingCompilationDatabase.cpp
124 ↗	(On Diff #162702)	This comment about `move` isn't really saying anything. Also, it's valid but unspecified (in the case of STL things). I'd drop it.
128 ↗	(On Diff #162702)	The `once_flag` should just be a static, don't allocate it.

ilya-biryukov added inline comments. Aug 28 2018, 1:14 AM

lib/Tooling/InterpolatingCompilationDatabase.cpp
124 ↗	(On Diff #162702)	We specifically assert that object cannot be called after `move()` (check the unique_ptr that stores our `once_flag`). It's definitely undefined behavior to call any methods, because they will immediately dereference a null pointer (the aforementioned unique_ptr). Happy to drop the comment, though, we do have asserts for that.
128 ↗	(On Diff #162702)	Sorry, I don't seem to follow. We need one `once_flag` per `TransferableCommand`
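
To illustrate the point about needing one `once_flag` per object: `std::once_flag` can be neither copied nor moved, so a type that holds one inline cannot live in a `std::vector` that needs to relocate its elements. The sketch below assumes two hypothetical structs, `InlineFlag` and `HeapFlag`, to show why a `unique_ptr` was used, and why a moved-from object must not be touched.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <type_traits>
#include <utility>

// Hypothetical: holds the flag directly, so the struct is not movable.
struct InlineFlag {
  std::once_flag Once;
};

// Hypothetical: holds the flag behind a unique_ptr, so the struct is movable,
// but a moved-from instance is left with a null pointer.
struct HeapFlag {
  std::unique_ptr<std::once_flag> Once = std::make_unique<std::once_flag>();
};

static_assert(!std::is_move_constructible<InlineFlag>::value,
              "once_flag members make the enclosing type immovable");
static_assert(std::is_move_constructible<HeapFlag>::value,
              "unique_ptr<once_flag> keeps the enclosing type movable");
```

After `HeapFlag B = std::move(A);`, `A.Once` is null, which is the situation the asserts in the patch were guarding against.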

Thanks for finding this problem, this fix *mostly* looks good (though I think we can probably drop memoization).

I'm a bit uncomfortable about the places where we need the type, because this is the only thing forcing us to parse before we've picked a command to transfer, and the number of commands we need to parse is data-dependent and hard to reason about.

Let me think about this a little - I suspect slightly more invasive changes (change the concept of type, tweak the heuristics, or do a "lightweight parse" to get the type) might make this cleaner and performance more predictable.

lib/Tooling/InterpolatingCompilationDatabase.cpp
133 ↗	(On Diff #162702)	I think you're overthinking things with the memoization here (of course I say this as the person who underthought it!) AIUI, the problem is that eagerly parsing all the compile commands takes 3x as long as reading them, which hurts startup time with big `compile_commands.json`. But I think we can afford to just parse them when `transferTo` is called, without memoization. (Remember we only hit this code path when querying a file not in the CDB, so it should never get called in a tight loop.) The benefit of slightly reducing the latency of `getCompileCommand` for unknown files when we happen to pick the same template file for the second time is unclear to me, and the code would be easier to follow without it.
159 ↗	(On Diff #162702)	Is this so important to dynamically check? Most types don't.
171 ↗	(On Diff #162702)	Traits is a bit vague, and a bit template-nightmare-y! maybe ParsedCommand?
383 ↗	(On Diff #162702)	I think this is going to force parsing of all candidates that get any points at all, with a flat directory structure this could be quite a lot :-(
383 ↗	(On Diff #162702)	ah, now I see that the memoization also allows us to pretend that this is an eagerly computed value, without considering exactly when it's computed. I'm not sure I like this if we do consider it performance sensitive - it obfuscates exactly which set of commands we parse, it would be nice if we were upfront about this.
440 ↗	(On Diff #162694)	So filtering these out has a couple of effects. First, it's a performance optimization (don't bother indexing filenames for useless files); we don't need this. Second, it prevents a TY_INVALID command being chosen for transfer; I'm not sure whether this would occur often enough to be a problem in practice.

sammccall added inline comments. Aug 28 2018, 1:39 AM

lib/Tooling/InterpolatingCompilationDatabase.cpp
124 ↗	(On Diff #162702)	I think the idea is "this comment just reiterates standard C++ semantics". (FWIW I find the asserts hurt readability - it's unusual to try to detect this condition, and TraitsComputed is easily mistaken for a boolean)

In D51314#1215381, @sammccall wrote:

I'm a bit uncomfortable about the places where we need the type, because this is the only thing forcing us to parse before we've picked a command to transfer, and the number of commands we need to parse is data-dependent and hard to reason about.

Let me think about this a little - I suspect slightly more invasive changes (change the concept of type, tweak the heuristics, or do a "lightweight parse" to get the type) might make this cleaner and performance more predictable.

Having studied the code a bit - we use the parsed type to evaluate candidates, but we're comparing against the type extracted from the query filename (we have nothing else!).
So we should be fine just to compare extensions instead here. So either TransferableCommand would have an Extension field or a TypeFromFilename field - but we should be careful not to conflate the type inferred from the filename (fine for *selecting* a command) with the one parsed from the command (needed to *transfer* the command).

Then we have no need to parse for selection, and getCompileCommands() only needs to parse a single command from the underlying CDB, which should yield very predictable/consistent performance.
(It also couples nicely with the idea of only grabbing the full command from the underlying CDB lazily - we'll only do that once, too).
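
A rough sketch of the idea described above: candidates are selected using only the language guessed from their filenames, and exactly one command is parsed per query. Everything here (`Lang`, `Entry`, `langFromExtension`, `parseCommand`, `ParseCount`) is a simplified, hypothetical stand-in, and the selection is a plain linear scan rather than the scoring used in the real index.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class Lang { Unknown, C, CXX };

// Hypothetical, simplified guess based on the text after the last '.'.
Lang langFromExtension(const std::string &Path) {
  auto Dot = Path.rfind('.');
  if (Dot == std::string::npos)
    return Lang::Unknown;
  std::string Ext = Path.substr(Dot);
  if (Ext == ".c")
    return Lang::C;
  if (Ext == ".cpp" || Ext == ".cc" || Ext == ".hpp")
    return Lang::CXX;
  return Lang::Unknown;
}

struct Entry {
  std::string File;
  std::string Command; // unparsed, as stored in the underlying database
};

int ParseCount = 0; // counts how many commands were actually parsed

// Placeholder for the expensive parse step.
std::string parseCommand(const Entry &E) {
  ++ParseCount;
  return "parsed(" + E.Command + ")";
}

// Selection looks only at filenames; no command is parsed here.
const Entry *chooseProxy(const std::vector<Entry> &DB,
                         const std::string &Query) {
  Lang Want = langFromExtension(Query);
  const Entry *Fallback = nullptr;
  for (const Entry &E : DB) {
    if (!Fallback)
      Fallback = &E;
    if (Want != Lang::Unknown && langFromExtension(E.File) == Want)
      return &E;
  }
  return Fallback;
}

// Parses only the single command that was chosen as the proxy.
std::string commandFor(const std::vector<Entry> &DB,
                       const std::string &Query) {
  const Entry *Proxy = chooseProxy(DB, Query);
  return Proxy ? parseCommand(*Proxy) : std::string();
}
```

With a database holding `dir/foo.cpp` and `dir/bar.c`, a query for `dir/new.c` picks `dir/bar.c` and parses one command, regardless of how many entries the database has.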

lib/Tooling/InterpolatingCompilationDatabase.cpp
480–481 ↗	(On Diff #162702)	as you pointed out offline, we actually only need the filename for indexing, so we could ask the underlying DB for the filenames and get their commands on demand. (we need to verify that the values returned don't differ from the filenames stored in CompileCommand, e.g. by filename normalization, in a way that matters)

Remove mutexes, recompute every time instead
Delay creation of TransferableCommand to avoid calling getAllCommands() on JSONCompilationDatabase

Herald added a subscriber: mgrang. Aug 28 2018, 3:18 AM

Harbormaster completed remote builds in B21992: Diff 162817. Aug 28 2018, 3:18 AM

ilya-biryukov marked 4 inline comments as done. Aug 28 2018, 3:19 AM

ilya-biryukov added inline comments.

lib/Tooling/InterpolatingCompilationDatabase.cpp
133 ↗	(On Diff #162702)	Totally agree, memoization does not buy us much here.

ilya-biryukov edited the summary of this revision. Aug 28 2018, 3:21 AM

Awesome :-)

lib/Tooling/InterpolatingCompilationDatabase.cpp
220 ↗	(On Diff #162817)	This summary is a bit unclear to me (too many clauses, maybe too abstract), and the high-level heuristics are hidden a bit below the implementation ideas. Maybe: "Given a filename, FileProximityIndex picks the best matching file from the underlying DB. This is the proxy file whose CompileCommand will be reused. The heuristics incorporate file name, extension, and directory structure. Strategy: ..."
220 ↗	(On Diff #162817)	nit: I'd prefer `FileIndex` or `FilenameIndex` here - "proximity" emphasizes directory structure over stem/extension, which are pretty important!
232 ↗	(On Diff #162817)	restore this comment?
338 ↗	(On Diff #162817)	hmm, I would have thought we'd store the values of guessType() when building the index. I guess it doesn't matter, it just seems surprising to see this call here.
338 ↗	(On Diff #162817)	you're calling foldType(guessType(...)) on the query, do you need to fold here too?
417 ↗	(On Diff #162817)	this guessType/foldType call should be folded into Index.chooseProxy now I think - Index explicitly knows that the language it deals with must be derived from the filename.

Update the comments
Rename the new class to FileIndex
Restore an accidentally lost comment
Store file types in a parallel array instead of recomputing on each call
Use foldType(guessType()) when obtaining lang type from filename

Harbormaster completed remote builds in B21997: Diff 162849. Aug 28 2018, 6:56 AM

Reformat the code
Minor spelling fix

Harbormaster completed remote builds in B21998: Diff 162851. Aug 28 2018, 6:59 AM

sammccall accepted this revision. Aug 28 2018, 7:10 AM

sammccall added inline comments.

lib/Tooling/InterpolatingCompilationDatabase.cpp
229 ↗	(On Diff #162851)	(this comment is covered in the summary now)

sammccall mentioned this in D51321: [Tooling] Improve handling of CL-style options. Aug 28 2018, 7:20 AM

Lowercase everything stored in the index.

Harbormaster completed remote builds in B22000: Diff 162855. Aug 28 2018, 7:24 AM

Handle TransferableCommands with TY_INVALID type (never transfer -x flag for those)
Add a test with invalid extensions, having seen a crash while experimenting
Update the test wrt the new behavior.
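
A small sketch of the "never transfer -x for an unknown type" rule from this update. It uses `std::optional` (C++17) in place of `llvm::Optional`, and `transferArgs` with its parameters is a hypothetical helper written for illustration, not a function from the patch.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper: appends "-x <lang>" only when the target's extension
// is ambiguous AND the proxy command has a known language. An empty optional
// plays the role of "None instead of TY_INVALID".
std::vector<std::string>
transferArgs(std::vector<std::string> Args,
             const std::optional<std::string> &ProxyLang,
             bool TargetTypeCertain) {
  if (!TargetTypeCertain && ProxyLang) {
    Args.push_back("-x");
    Args.push_back(*ProxyLang);
  }
  return Args;
}
```

Because the unknown case is an empty optional rather than a sentinel value, the dereference `*ProxyLang` is only reachable after the check, so an invalid type name can never be appended to the command line.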

Harbormaster completed remote builds in B22002: Diff 162861. Aug 28 2018, 7:49 AM

Sort Paths, they are different from OriginalPaths, i.e. lowercased.
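
The reason for the extra sort can be seen in a tiny example: a list that is sorted in its original case is not necessarily sorted once lowercased, so binary searches over the lowercased index would be unreliable without re-sorting. The helper names `lower` and `buildIndex` below are assumptions for illustration; the real code uses `StringRef::lower()` and `llvm::sort`.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical ASCII lowercasing helper.
std::string lower(std::string S) {
  std::transform(S.begin(), S.end(), S.begin(), [](unsigned char C) {
    return static_cast<char>(std::tolower(C));
  });
  return S;
}

// Builds (lowercased path, index into Original) pairs and sorts them by the
// lowercased path, so that binary search over the result is valid.
std::vector<std::pair<std::string, std::size_t>>
buildIndex(const std::vector<std::string> &Original) {
  std::vector<std::pair<std::string, std::size_t>> Paths;
  Paths.reserve(Original.size());
  for (std::size_t I = 0; I < Original.size(); ++I)
    Paths.emplace_back(lower(Original[I]), I);
  std::sort(Paths.begin(), Paths.end());
  return Paths;
}
```

For example, `{"B.cpp", "a.cpp"}` is sorted in byte order because `'B'` (66) precedes `'a'` (97), but the lowercased strings `"b.cpp"` and `"a.cpp"` are in the opposite order.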

Harbormaster completed remote builds in B22004: Diff 162863. Aug 28 2018, 7:52 AM

sammccall accepted this revision. Aug 28 2018, 7:55 AM

sammccall added inline comments.

lib/Tooling/InterpolatingCompilationDatabase.cpp
127 ↗	(On Diff #162861)	(or just "never TY_INVALID" which would fit on prev line :-)
175 ↗	(On Diff #162861)	it's always set here, drop the condition.
408 ↗	(On Diff #162861)	nit: no llvm:: :-)

Cleanups

Harbormaster completed remote builds in B22005: Diff 162868. Aug 28 2018, 8:01 AM

This revision was not accepted when it landed; it landed in state Needs Review. Aug 28 2018, 9:16 AM

Closed by commit rL340838: Parse compile commands lazily in InterpolatingCompilationDatabase (authored by ibiryukov).

This revision was automatically updated to reflect the committed changes.

Herald added a subscriber: llvm-commits. Aug 28 2018, 9:16 AM

Revision Contents

Path	Size
cfe/trunk/lib/Tooling/InterpolatingCompilationDatabase.cpp	118 lines
cfe/trunk/unittests/Tooling/CompilationDatabaseTest.cpp	9 lines

Diff 162889

cfe/trunk/lib/Tooling/InterpolatingCompilationDatabase.cpp

(117 unchanged lines hidden)	default:
return types::TY_INVALID;		return types::TY_INVALID;
}		}
}		}

// A CompileCommand that can be applied to another file.		// A CompileCommand that can be applied to another file.
struct TransferableCommand {		struct TransferableCommand {
// Flags that should not apply to all files are stripped from CommandLine.		// Flags that should not apply to all files are stripped from CommandLine.
CompileCommand Cmd;		CompileCommand Cmd;
// Language detected from -x or the filename.		// Language detected from -x or the filename. Never TY_INVALID.
types::ID Type = types::TY_INVALID;		Optional<types::ID> Type;
// Standard specified by -std.		// Standard specified by -std.
LangStandard::Kind Std = LangStandard::lang_unspecified;		LangStandard::Kind Std = LangStandard::lang_unspecified;

TransferableCommand(CompileCommand C)		TransferableCommand(CompileCommand C)
: Cmd(std::move(C)), Type(guessType(Cmd.Filename)) {		: Cmd(std::move(C)), Type(guessType(Cmd.Filename)) {
std::vector<std::string> NewArgs = {Cmd.CommandLine.front()};		std::vector<std::string> NewArgs = {Cmd.CommandLine.front()};
// Parse the old args in order to strip out and record unwanted flags.		// Parse the old args in order to strip out and record unwanted flags.
auto OptTable = clang::driver::createDriverOptTable();		auto OptTable = clang::driver::createDriverOptTable();
(30 unchanged lines hidden)	#include "clang/Frontend/LangStandards.def"
llvm::opt::ArgStringList ArgStrs;		llvm::opt::ArgStringList ArgStrs;
Arg->render(ArgList, ArgStrs);		Arg->render(ArgList, ArgStrs);
NewArgs.insert(NewArgs.end(), ArgStrs.begin(), ArgStrs.end());		NewArgs.insert(NewArgs.end(), ArgStrs.begin(), ArgStrs.end());
}		}
Cmd.CommandLine = std::move(NewArgs);		Cmd.CommandLine = std::move(NewArgs);

if (Std != LangStandard::lang_unspecified) // -std take precedence over -x		if (Std != LangStandard::lang_unspecified) // -std take precedence over -x
Type = toType(LangStandard::getLangStandardForKind(Std).getLanguage());		Type = toType(LangStandard::getLangStandardForKind(Std).getLanguage());
Type = foldType(Type);		Type = foldType(*Type);
		// The contract is to store None instead of TY_INVALID.
		if (Type == types::TY_INVALID)
		Type = llvm::None;
}		}

// Produce a CompileCommand for \p filename, based on this one.		// Produce a CompileCommand for \p filename, based on this one.
CompileCommand transferTo(StringRef Filename) const {		CompileCommand transferTo(StringRef Filename) const {
CompileCommand Result = Cmd;		CompileCommand Result = Cmd;
Result.Filename = Filename;		Result.Filename = Filename;
bool TypeCertain;		bool TypeCertain;
auto TargetType = guessType(Filename, &TypeCertain);		auto TargetType = guessType(Filename, &TypeCertain);
// If the filename doesn't determine the language (.h), transfer with -x.		// If the filename doesn't determine the language (.h), transfer with -x.
if (!TypeCertain) {		if (TargetType != types::TY_INVALID && !TypeCertain && Type) {
TargetType = types::onlyPrecompileType(TargetType) // header?		TargetType = types::onlyPrecompileType(TargetType) // header?
? types::lookupHeaderTypeForSourceType(Type)		? types::lookupHeaderTypeForSourceType(*Type)
: Type;		: *Type;
Result.CommandLine.push_back("-x");		Result.CommandLine.push_back("-x");
Result.CommandLine.push_back(types::getTypeName(TargetType));		Result.CommandLine.push_back(types::getTypeName(TargetType));
}		}
// --std flag may only be transferred if the language is the same.		// --std flag may only be transferred if the language is the same.
// We may consider "translating" these, e.g. c++11 -> c11.		// We may consider "translating" these, e.g. c++11 -> c11.
if (Std != LangStandard::lang_unspecified && foldType(TargetType) == Type) {		if (Std != LangStandard::lang_unspecified && foldType(TargetType) == Type) {
Result.CommandLine.push_back(		Result.CommandLine.push_back(
"-std=" +		"-std=" +
(16 unchanged lines hidden)	static types::ID toType(InputKind::Language Lang) {
case InputKind::ObjCXX:		case InputKind::ObjCXX:
return types::TY_ObjCXX;		return types::TY_ObjCXX;
default:		default:
return types::TY_INVALID;		return types::TY_INVALID;
}		}
}		}
};		};

// CommandIndex does the real work: given a filename, it produces the best		// Given a filename, FileIndex picks the best matching file from the underlying
// matching TransferableCommand by matching filenames. Basic strategy:		// DB. This is the proxy file whose CompileCommand will be reused. The
		// heuristics incorporate file name, extension, and directory structure.
		// Strategy:
// - Build indexes of each of the substrings we want to look up by.		// - Build indexes of each of the substrings we want to look up by.
// These indexes are just sorted lists of the substrings.		// These indexes are just sorted lists of the substrings.
// - Forward requests to the inner CDB. If it fails, we must pick a proxy.
// - Each criterion corresponds to a range lookup into the index, so we only		// - Each criterion corresponds to a range lookup into the index, so we only
// need O(log N) string comparisons to determine scores.		// need O(log N) string comparisons to determine scores.
// - We then break ties among the candidates with the highest score.		//
class CommandIndex {		// Apart from path proximity signals, also takes file extensions into account
		// when scoring the candidates.
		class FileIndex {
public:		public:
CommandIndex(std::vector<TransferableCommand> AllCommands)		FileIndex(std::vector<std::string> Files)
: Commands(std::move(AllCommands)), Strings(Arena) {		: OriginalPaths(std::move(Files)), Strings(Arena) {
// Sort commands by filename for determinism (index is a tiebreaker later).		// Sort commands by filename for determinism (index is a tiebreaker later).
llvm::sort(		llvm::sort(OriginalPaths.begin(), OriginalPaths.end());
Commands.begin(), Commands.end(),		Paths.reserve(OriginalPaths.size());
[](const TransferableCommand &Left, const TransferableCommand &Right) {		Types.reserve(OriginalPaths.size());
return Left.Cmd.Filename < Right.Cmd.Filename;		Stems.reserve(OriginalPaths.size());
});		for (size_t I = 0; I < OriginalPaths.size(); ++I) {
for (size_t I = 0; I < Commands.size(); ++I) {		StringRef Path = Strings.save(StringRef(OriginalPaths[I]).lower());
StringRef Path =
Strings.save(StringRef(Commands[I].Cmd.Filename).lower());		Paths.emplace_back(Path, I);
Paths.push_back({Path, I});		Types.push_back(foldType(guessType(Path)));
Stems.emplace_back(sys::path::stem(Path), I);		Stems.emplace_back(sys::path::stem(Path), I);
auto Dir = ++sys::path::rbegin(Path), DirEnd = sys::path::rend(Path);		auto Dir = ++sys::path::rbegin(Path), DirEnd = sys::path::rend(Path);
for (int J = 0; J < DirectorySegmentsIndexed && Dir != DirEnd; ++J, ++Dir)		for (int J = 0; J < DirectorySegmentsIndexed && Dir != DirEnd; ++J, ++Dir)
if (Dir->size() > ShortDirectorySegment) // not trivial ones		if (Dir->size() > ShortDirectorySegment) // not trivial ones
Components.emplace_back(*Dir, I);		Components.emplace_back(*Dir, I);
}		}
llvm::sort(Paths.begin(), Paths.end());		llvm::sort(Paths.begin(), Paths.end());
llvm::sort(Stems.begin(), Stems.end());		llvm::sort(Stems.begin(), Stems.end());
llvm::sort(Components.begin(), Components.end());		llvm::sort(Components.begin(), Components.end());
}		}

bool empty() const { return Commands.empty(); }		bool empty() const { return Paths.empty(); }

// Returns the command that best fits OriginalFilename.		// Returns the path for the file that best fits OriginalFilename.
// Candidates with PreferLanguage will be chosen over others (unless it's		// Candidates with extensions matching PreferLanguage will be chosen over
// TY_INVALID, or all candidates are bad).		// others (unless it's TY_INVALID, or all candidates are bad).
const TransferableCommand &chooseProxy(StringRef OriginalFilename,		StringRef chooseProxy(StringRef OriginalFilename,
types::ID PreferLanguage) const {		types::ID PreferLanguage) const {
assert(!empty() && "need at least one candidate!");		assert(!empty() && "need at least one candidate!");
std::string Filename = OriginalFilename.lower();		std::string Filename = OriginalFilename.lower();
auto Candidates = scoreCandidates(Filename);		auto Candidates = scoreCandidates(Filename);
std::pair<size_t, int> Best =		std::pair<size_t, int> Best =
pickWinner(Candidates, Filename, PreferLanguage);		pickWinner(Candidates, Filename, PreferLanguage);

DEBUG_WITH_TYPE("interpolate",		DEBUG_WITH_TYPE(
llvm::dbgs()		"interpolate",
<< "interpolate: chose "		llvm::dbgs() << "interpolate: chose " << OriginalPaths[Best.first]
<< Commands[Best.first].Cmd.Filename << " as proxy for "		<< " as proxy for " << OriginalFilename << " preferring "
<< OriginalFilename << " preferring "
<< (PreferLanguage == types::TY_INVALID		<< (PreferLanguage == types::TY_INVALID
? "none"		? "none"
: types::getTypeName(PreferLanguage))		: types::getTypeName(PreferLanguage))
<< " score=" << Best.second << "\n");		<< " score=" << Best.second << "\n");
return Commands[Best.first];		return OriginalPaths[Best.first];
}		}

private:		private:
using SubstringAndIndex = std::pair<StringRef, size_t>;		using SubstringAndIndex = std::pair<StringRef, size_t>;
// Directory matching parameters: we look at the last two segments of the		// Directory matching parameters: we look at the last two segments of the
// parent directory (usually the semantically significant ones in practice).		// parent directory (usually the semantically significant ones in practice).
// We search only the last four of each candidate (for efficiency).		// We search only the last four of each candidate (for efficiency).
constexpr static int DirectorySegmentsIndexed = 4;		constexpr static int DirectorySegmentsIndexed = 4;
(49 unchanged lines hidden)	struct ScoredCandidate {
size_t PrefixLength;		size_t PrefixLength;
};		};
// Choose the best candidate by (preferred, points, prefix length, alpha).		// Choose the best candidate by (preferred, points, prefix length, alpha).
ScoredCandidate Best = {size_t(-1), false, 0, 0};		ScoredCandidate Best = {size_t(-1), false, 0, 0};
for (const auto &Candidate : Candidates) {		for (const auto &Candidate : Candidates) {
ScoredCandidate S;		ScoredCandidate S;
S.Index = Candidate.first;		S.Index = Candidate.first;
S.Preferred = PreferredLanguage == types::TY_INVALID ||		S.Preferred = PreferredLanguage == types::TY_INVALID ||
PreferredLanguage == Commands[S.Index].Type;		PreferredLanguage == Types[S.Index];
S.Points = Candidate.second;		S.Points = Candidate.second;
if (!S.Preferred && Best.Preferred)		if (!S.Preferred && Best.Preferred)
continue;		continue;
if (S.Preferred == Best.Preferred) {		if (S.Preferred == Best.Preferred) {
if (S.Points < Best.Points)		if (S.Points < Best.Points)
continue;		continue;
if (S.Points == Best.Points) {		if (S.Points == Best.Points) {
S.PrefixLength = matchingPrefix(Filename, Paths[S.Index].first);		S.PrefixLength = matchingPrefix(Filename, Paths[S.Index].first);
(16 unchanged lines hidden)	if (Best.Index == size_t(-1))
return {longestMatch(Filename, Paths).second, 0};		return {longestMatch(Filename, Paths).second, 0};
return {Best.Index, Best.Points};		return {Best.Index, Best.Points};
}		}

// Returns the range within a sorted index that compares equal to Key.		// Returns the range within a sorted index that compares equal to Key.
// If Prefix is true, it's instead the range starting with Key.		// If Prefix is true, it's instead the range starting with Key.
template <bool Prefix>		template <bool Prefix>
ArrayRef<SubstringAndIndex>		ArrayRef<SubstringAndIndex>
indexLookup(StringRef Key, const std::vector<SubstringAndIndex> &Idx) const {		indexLookup(StringRef Key, ArrayRef<SubstringAndIndex> Idx) const {
// Use pointers as iteratiors to ease conversion of result to ArrayRef.		// Use pointers as iteratiors to ease conversion of result to ArrayRef.
auto Range = std::equal_range(Idx.data(), Idx.data() + Idx.size(), Key,		auto Range = std::equal_range(Idx.data(), Idx.data() + Idx.size(), Key,
Less<Prefix>());		Less<Prefix>());
return {Range.first, Range.second};		return {Range.first, Range.second};
}		}

// Performs a point lookup into a nonempty index, returning a longest match.		// Performs a point lookup into a nonempty index, returning a longest match.
SubstringAndIndex		SubstringAndIndex longestMatch(StringRef Key,
longestMatch(StringRef Key, const std::vector<SubstringAndIndex> &Idx) const {		ArrayRef<SubstringAndIndex> Idx) const {
assert(!Idx.empty());		assert(!Idx.empty());
// Longest substring match will be adjacent to a direct lookup.		// Longest substring match will be adjacent to a direct lookup.
auto It =		auto It =
std::lower_bound(Idx.begin(), Idx.end(), SubstringAndIndex{Key, 0});		std::lower_bound(Idx.begin(), Idx.end(), SubstringAndIndex{Key, 0});
if (It == Idx.begin())		if (It == Idx.begin())
return *It;		return *It;
if (It == Idx.end())		if (It == Idx.end())
return *--It;		return *--It;
// Have to choose between It and It-1		// Have to choose between It and It-1
size_t Prefix = matchingPrefix(Key, It->first);		size_t Prefix = matchingPrefix(Key, It->first);
size_t PrevPrefix = matchingPrefix(Key, (It - 1)->first);		size_t PrevPrefix = matchingPrefix(Key, (It - 1)->first);
return Prefix > PrevPrefix ? *It : *--It;		return Prefix > PrevPrefix ? *It : *--It;
}		}

std::vector<TransferableCommand> Commands; // Indexes point into this.		// Original paths, everything else is in lowercase.
		std::vector<std::string> OriginalPaths;
BumpPtrAllocator Arena;		BumpPtrAllocator Arena;
StringSaver Strings;		StringSaver Strings;
// Indexes of candidates by certain substrings.		// Indexes of candidates by certain substrings.
// String is lowercase and sorted, index points into OriginalPaths.		// String is lowercase and sorted, index points into OriginalPaths.
std::vector<SubstringAndIndex> Paths; // Full path.		std::vector<SubstringAndIndex> Paths; // Full path.
		// Lang types obtained by guessing on the corresponding path. I-th element is
		// a type for the I-th path.
		std::vector<types::ID> Types;
std::vector<SubstringAndIndex> Stems; // Basename, without extension.		std::vector<SubstringAndIndex> Stems; // Basename, without extension.
std::vector<SubstringAndIndex> Components; // Last path components.		std::vector<SubstringAndIndex> Components; // Last path components.
};		};

// The actual CompilationDatabase wrapper delegates to its inner database.		// The actual CompilationDatabase wrapper delegates to its inner database.
// If no match, looks up a command in CommandIndex and transfers it to the file.		// If no match, looks up a proxy file in FileIndex and transfers its
		// command to the requested file.
class InterpolatingCompilationDatabase : public CompilationDatabase {		class InterpolatingCompilationDatabase : public CompilationDatabase {
public:		public:
InterpolatingCompilationDatabase(std::unique_ptr<CompilationDatabase> Inner)		InterpolatingCompilationDatabase(std::unique_ptr<CompilationDatabase> Inner)
: Inner(std::move(Inner)), Index(allCommands()) {}		: Inner(std::move(Inner)), Index(this->Inner->getAllFiles()) {}

std::vector<CompileCommand>		std::vector<CompileCommand>
getCompileCommands(StringRef Filename) const override {		getCompileCommands(StringRef Filename) const override {
auto Known = Inner->getCompileCommands(Filename);		auto Known = Inner->getCompileCommands(Filename);
if (Index.empty() || !Known.empty())		if (Index.empty() || !Known.empty())
return Known;		return Known;
bool TypeCertain;		bool TypeCertain;
auto Lang = guessType(Filename, &TypeCertain);		auto Lang = guessType(Filename, &TypeCertain);
if (!TypeCertain)		if (!TypeCertain)
Lang = types::TY_INVALID;		Lang = types::TY_INVALID;
return {Index.chooseProxy(Filename, foldType(Lang)).transferTo(Filename)};		auto ProxyCommands =
		Inner->getCompileCommands(Index.chooseProxy(Filename, foldType(Lang)));
		if (ProxyCommands.empty())
		return {};
		return {TransferableCommand(ProxyCommands[0]).transferTo(Filename)};
}		}

std::vector<std::string> getAllFiles() const override {		std::vector<std::string> getAllFiles() const override {
return Inner->getAllFiles();		return Inner->getAllFiles();
}		}

std::vector<CompileCommand> getAllCompileCommands() const override {		std::vector<CompileCommand> getAllCompileCommands() const override {
return Inner->getAllCompileCommands();		return Inner->getAllCompileCommands();
}		}

private:		private:
std::vector<TransferableCommand> allCommands() {
std::vector<TransferableCommand> Result;
for (auto Command : Inner->getAllCompileCommands()) {
Result.emplace_back(std::move(Command));
if (Result.back().Type == types::TY_INVALID)
Result.pop_back();
}
return Result;
}

std::unique_ptr<CompilationDatabase> Inner;		std::unique_ptr<CompilationDatabase> Inner;
CommandIndex Index;		FileIndex Index;
};		};

} // namespace		} // namespace

std::unique_ptr<CompilationDatabase>		std::unique_ptr<CompilationDatabase>
inferMissingCompileCommands(std::unique_ptr<CompilationDatabase> Inner) {		inferMissingCompileCommands(std::unique_ptr<CompilationDatabase> Inner) {
return llvm::make_unique<InterpolatingCompilationDatabase>(std::move(Inner));		return llvm::make_unique<InterpolatingCompilationDatabase>(std::move(Inner));
}		}

} // namespace tooling		} // namespace tooling
} // namespace clang		} // namespace clang
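
The `longestMatch` helper in the diff above relies on the observation that, in a sorted list, the entry sharing the longest prefix with a key sits next to the position where the key would be inserted. A standalone sketch of that idea follows, using plain `std::string` instead of `StringRef` and dropping the index half of `SubstringAndIndex`; `longestPrefixMatch` is a hypothetical name, and `matchingPrefix` is reimplemented here because its body is not shown in the diff.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Length of the common prefix of L and R.
std::size_t matchingPrefix(const std::string &L, const std::string &R) {
  std::size_t Limit = std::min(L.size(), R.size());
  for (std::size_t I = 0; I < Limit; ++I)
    if (L[I] != R[I])
      return I;
  return Limit;
}

// Sorted must be non-empty and sorted ascending. Returns an entry sharing
// the longest prefix with Key, using one binary search and two comparisons.
const std::string &longestPrefixMatch(const std::string &Key,
                                      const std::vector<std::string> &Sorted) {
  assert(!Sorted.empty());
  auto It = std::lower_bound(Sorted.begin(), Sorted.end(), Key);
  if (It == Sorted.begin())
    return *It;
  if (It == Sorted.end())
    return *--It;
  // The best match is either the insertion point or its predecessor.
  return matchingPrefix(Key, *It) > matchingPrefix(Key, *(It - 1))
             ? *It
             : *(It - 1);
}
```

This is the fallback used when no candidate earns any points: the query still maps to its closest alphabetical neighbour, which is what the "closest alpha match" test below checks.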

cfe/trunk/unittests/Tooling/CompilationDatabaseTest.cpp

(701 unchanged lines hidden)	TEST_F(InterpolateTest, Nearby) {
EXPECT_EQ(getCommand("an/other/b.cpp"), "clang -D an/other/foo.cpp");		EXPECT_EQ(getCommand("an/other/b.cpp"), "clang -D an/other/foo.cpp");
// if nothing matches at all, we still get the closest alpha match		// if nothing matches at all, we still get the closest alpha match
EXPECT_EQ(getCommand("below/some/obscure/path.cpp"),		EXPECT_EQ(getCommand("below/some/obscure/path.cpp"),
"clang -D an/other/foo.cpp");		"clang -D an/other/foo.cpp");
}		}

TEST_F(InterpolateTest, Language) {		TEST_F(InterpolateTest, Language) {
add("dir/foo.cpp", "-std=c++17");		add("dir/foo.cpp", "-std=c++17");
		add("dir/bar.c", "");
add("dir/baz.cee", "-x c");		add("dir/baz.cee", "-x c");

// .h is ambiguous, so we add explicit language flags		// .h is ambiguous, so we add explicit language flags
EXPECT_EQ(getCommand("foo.h"),		EXPECT_EQ(getCommand("foo.h"),
"clang -D dir/foo.cpp -x c++-header -std=c++17");		"clang -D dir/foo.cpp -x c++-header -std=c++17");
// and don't add -x if the inferred language is correct.		// and don't add -x if the inferred language is correct.
EXPECT_EQ(getCommand("foo.hpp"), "clang -D dir/foo.cpp -std=c++17");		EXPECT_EQ(getCommand("foo.hpp"), "clang -D dir/foo.cpp -std=c++17");
// respect -x if it's already there.		// respect -x if it's already there.
EXPECT_EQ(getCommand("baz.h"), "clang -D dir/baz.cee -x c-header");		EXPECT_EQ(getCommand("baz.h"), "clang -D dir/baz.cee -x c-header");
// prefer a worse match with the right language		// prefer a worse match with the right extension.
EXPECT_EQ(getCommand("foo.c"), "clang -D dir/baz.cee");		EXPECT_EQ(getCommand("foo.c"), "clang -D dir/bar.c");
Entries.erase(path(StringRef("dir/baz.cee")));		// make sure we don't crash on queries with invalid extensions.
		EXPECT_EQ(getCommand("foo.cce"), "clang -D dir/foo.cpp");
		Entries.erase(path(StringRef("dir/bar.c")));
// Now we transfer across languages, so drop -std too.		// Now we transfer across languages, so drop -std too.
EXPECT_EQ(getCommand("foo.c"), "clang -D dir/foo.cpp");		EXPECT_EQ(getCommand("foo.c"), "clang -D dir/foo.cpp");
}		}

TEST_F(InterpolateTest, Strip) {		TEST_F(InterpolateTest, Strip) {
add("dir/foo.cpp", "-o foo.o -Wall");		add("dir/foo.cpp", "-o foo.o -Wall");
// the -o option and the input file are removed, but -Wall is preserved.		// the -o option and the input file are removed, but -Wall is preserved.
EXPECT_EQ(getCommand("dir/bar.cpp"), "clang -D dir/foo.cpp -Wall");		EXPECT_EQ(getCommand("dir/bar.cpp"), "clang -D dir/foo.cpp -Wall");
(39 unchanged lines hidden)
