Details

Reviewers

vsapsai
Bigcheese
doug.gregor
aprantl

Summary

The ModuleManager's use of FileEntry nodes as the keys for its map of
loaded modules is less than ideal. Uniqueness for FileEntry nodes is
maintained by FileManager, which in turn uses inode numbers on hosts
that support that. When coupled with the module cache's proclivity for
turning over and deleting stale PCMs, this means entries for different
module files can wind up reusing the same underlying inode. When this
happens, subsequent accesses to the Modules map will disagree on the
ModuleFile associated with a given file.
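
To make the failure mode concrete, here is a minimal, self-contained sketch. It does not use Clang's FileManager or ModuleManager: `FakeFileSystem`, the `LoadedModules` table, and the `cache/*.pcm` paths are hypothetical stand-ins, and inode recycling is simulated with a free list rather than observed from a real filesystem.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a filesystem that recycles inode numbers.
struct FakeFileSystem {
  std::map<std::string, std::uint64_t> InodeOf; // path -> inode
  std::vector<std::uint64_t> FreeInodes;        // inodes available for reuse
  std::uint64_t NextInode = 1;

  std::uint64_t create(const std::string &Path) {
    std::uint64_t Inode;
    if (!FreeInodes.empty()) {
      Inode = FreeInodes.back(); // reuse a freed inode first
      FreeInodes.pop_back();
    } else {
      Inode = NextInode++;
    }
    InodeOf[Path] = Inode;
    return Inode;
  }

  void remove(const std::string &Path) {
    auto It = InodeOf.find(Path);
    if (It == InodeOf.end())
      return;
    FreeInodes.push_back(It->second);
    InodeOf.erase(It);
  }
};

// Returns the module name that an inode-keyed table reports for B.pcm
// after A.pcm has been pruned and its inode handed to B.pcm.
std::string lookupAfterInodeReuse() {
  FakeFileSystem FS;
  std::map<std::uint64_t, std::string> LoadedModules; // inode -> module name

  std::uint64_t InodeA = FS.create("cache/A.pcm");
  LoadedModules[InodeA] = "A";
  FS.remove("cache/A.pcm"); // stale PCM pruned from the cache

  std::uint64_t InodeB = FS.create("cache/B.pcm"); // receives A's old inode
  return LoadedModules.at(InodeB); // the table still answers with "A"
}
```

In this simulation the lookup for `B.pcm` lands on the entry recorded for `A.pcm`, which is the kind of disagreement described above.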

It's fine to use the file management utilities to guarantee the presence
of module data, but we need a better source of key material that is
invariant with respect to these OS-level details. Using file paths alone
is a similarly fraught solution because the ASTWriter performs a custom
canonicalization step (that is not equivalent to path canonicalization)
that renders keys from loaded AST cores useless for looking up cached
entries.

To mitigate the effects of inode reuse, increase the entropy of the key
material by incorporating the modtime and size. This ultimately
decreases the likelihood that a PCM that is swapped on disk will confuse
the cache, but it does not eliminate the possibility of collisions.
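
As a rough illustration of the idea, the sketch below builds a composite key from an inode number, a size, and a modification time, using only the standard library. `SketchKey`, `SketchKeyHash`, `ModuleTable`, and `replacementHitsOldEntry` are hypothetical names for this example; the patch itself wraps a `FileEntry` pointer and uses `llvm::DenseMap`. As in the patch, the hash covers only the identity component, and equality compares all three fields.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <ctime>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical composite key: identity is (inode, size, modtime).
struct SketchKey {
  std::uint64_t Inode;
  std::uint64_t Size;
  std::time_t ModTime;

  bool operator==(const SketchKey &RHS) const {
    return Inode == RHS.Inode && Size == RHS.Size && ModTime == RHS.ModTime;
  }
};

struct SketchKeyHash {
  std::size_t operator()(const SketchKey &K) const {
    // Hashing the inode alone is still correct: equal keys have equal
    // inodes, and operator== separates keys that share a bucket.
    return std::hash<std::uint64_t>()(K.Inode);
  }
};

using ModuleTable = std::unordered_map<SketchKey, std::string, SketchKeyHash>;

// Returns true if a file that reuses inode 42 with the given size and
// modification time is mistaken for the previously recorded entry.
bool replacementHitsOldEntry(std::uint64_t NewSize, std::time_t NewModTime) {
  ModuleTable Table;
  Table[SketchKey{42, 1024, 1000}] = "A";
  return Table.count(SketchKey{42, NewSize, NewModTime}) != 0;
}
```

A replacement that differs in size or modification time misses the old entry; one that matches all three fields still collides, which is the residual risk the summary acknowledges.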

rdar://48443680

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

CodaFi created this revision. Aug 14 2020, 10:55 AM

Herald added subscribers: cfe-commits, dexonsmith. Aug 14 2020, 10:55 AM

CodaFi requested review of this revision. Aug 14 2020, 10:55 AM

Overall, looks like a reasonable approach to solve inode reuse. The problem with filenames is that they might not be canonicalized and the same file can be known by different filenames. I'm trying to think through the consequences in the following scenarios:

same name but file content has changed;
different names but refer to the same file.

Harbormaster completed remote builds in B68432: Diff 285690. Aug 14 2020, 11:31 AM

In D85981#2218583, @vsapsai wrote:

It's good to imagine these attack vectors. But I think the module cache being a relatively fault-tolerant and compiler-controlled system mitigates a lot of the damage you could cause by a well-timed "attack" in these scenarios:

same name but file content has changed;

If there is a cache entry, the signature check that occurs after the lookup succeeds should catch most shenanigans. Assuming an attacker is able to craft a PCM with an equivalent signature to the victim PCM, and was able to time it such that the PCM were replaced after a subsequent read, you could definitely run into problems. But our "attackers" in most scenarios are usually other cc1 and swiftc invocations trying to build the same module, so we should see signature changes at least.

different names but refer to the same file.

Then we'll waste space in the cache, but this requires the ability to predict the layout of the module cache ahead of time. It shouldn't affect the consistency of the entries in the table to do extra work - assuming you don't combine this approach with the scenario described above.

I'd also note here that the InMemoryModuleCache is already using a StringMap keyed by file names for its PCM table. You can see this patch as a kind of harmonization between the two approaches.

CodaFi updated this revision to Diff 285868. Aug 15 2020, 3:57 PM

Harbormaster completed remote builds in B68536: Diff 285868. Aug 15 2020, 5:16 PM

CodaFi updated this revision to Diff 285873. Aug 15 2020, 5:33 PM

Harbormaster completed remote builds in B68540: Diff 285873. Aug 15 2020, 6:07 PM

aprantl added a subscriber: aprantl. Aug 15 2020, 6:36 PM

aprantl added inline comments.

clang/include/clang/Serialization/ModuleManager.h
122	Is it literally the file name, or something like the absolute realpath? And just because I'm curious: Is this the name of the .pcm or of the module map file?
122	I just realized @vsapsai already asked the same question :-)

CodaFi updated this revision to Diff 285874. Aug 15 2020, 6:38 PM

Harbormaster completed remote builds in B68541: Diff 285874. Aug 15 2020, 7:11 PM

CodaFi updated this revision to Diff 285875. Aug 15 2020, 7:26 PM

CodaFi added inline comments. Aug 15 2020, 7:35 PM

clang/include/clang/Serialization/ModuleManager.h
122	It's the file path the module cache has computed for the PCM. I could try to use the real path to the file, but I'm not sure how portable/stable that interface is relative to this one.

Okay, I'm done throwing revisions at the bots. This Windows-only failure is bizarre. @rsmith Do you have any insight into what's going wrong here?

Harbormaster completed remote builds in B68542: Diff 285875. Aug 15 2020, 7:56 PM

CodaFi updated this revision to Diff 286201. Aug 17 2020, 10:19 PM

Figured it out for myself. The test is forming paths that are using non-canonical path separators. Naively using those as keys means that the subsequent canonicalization done by the ASTWriter renders the keys useless for lookups into these structures. I'm going to switch to FileEntry::tryGetRealPathName since it's quite literally what ASTWriter is doing as part of its canonicalization phase. I worry about that as a solution in general though. In the future, it would be great to expose the canonicalization utilities in the ASTWriter as a more general kind of facility that could be shared between the implementations so we don't desync things again.
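
The mismatch can be reproduced in miniature. The sketch below is only an illustration: `normalizeSeparators` is a hypothetical helper that rewrites backslashes to forward slashes, which is far simpler than (and not equivalent to) whatever the ASTWriter or `FileEntry::tryGetRealPathName` actually do, and the paths are made up.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>

// Hypothetical normalization: rewrites '\' to '/' and nothing else.
std::string normalizeSeparators(std::string Path) {
  std::replace(Path.begin(), Path.end(), '\\', '/');
  return Path;
}

// The key is stored exactly as the caller spelled it, but the lookup uses
// the normalized spelling, so the entry is never found.
bool rawKeyLookupHits() {
  std::map<std::string, int> Table;
  Table["cache\\sub/A.pcm"] = 1;
  return Table.count(normalizeSeparators("cache\\sub/A.pcm")) != 0;
}

// Both the insertion and the lookup go through the same normalization,
// so differently spelled paths agree on one key.
bool normalizedKeyLookupHits() {
  std::map<std::string, int> Table;
  Table[normalizeSeparators("cache\\sub/A.pcm")] = 1;
  return Table.count(normalizeSeparators("cache/sub\\A.pcm")) != 0;
}
```

The point is only that the writer and the reader of the table have to share one canonicalization routine; which routine that should be is the open question raised in the comment.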

Harbormaster completed remote builds in B68712: Diff 286201. Aug 17 2020, 11:02 PM

CodaFi updated this revision to Diff 286334. Aug 18 2020, 10:12 AM

CodaFi edited the summary of this revision. (Show Details)

CodaFi marked 2 inline comments as done.

Switched tactics here. Rather than just change the source of the entropy, let's increase it from just inodes to (64 bits of inode) plus (file size) plus (mod time). It is still possible to defeat this scheme, but an attacker would have to replace the PCM with one that has been padded out to the same size and then backdate its modtime to match the one in the cache, or some cascading failure of the syscalls providing these data would have to conspire to make this happen.

Harbormaster completed remote builds in B68771: Diff 286334. Aug 18 2020, 10:42 AM

CodaFi updated this revision to Diff 286361. Aug 18 2020, 11:06 AM

Harbormaster completed remote builds in B68782: Diff 286361. Aug 18 2020, 11:45 AM

aprantl added inline comments. Aug 18 2020, 1:39 PM

clang/include/clang/Serialization/ModuleManager.h
122	If it's the path to the `.pcm` there's no point in wasting time on realpath — there should only be one module cache path and we don't care where exactly it is on disk, and the paths inside the module cache ought to be unique anyway, because we just computed them. Thanks!

aprantl added inline comments. Aug 18 2020, 1:40 PM

clang/include/clang/Serialization/ModuleManager.h
62	Can you add a doxygen comment explaining why we compute our own hashing as opposed to using the FileEntry pointer?
62	Basically what you wrote in the description of the review...

@aprantl Good idea. Updated.

Harbormaster completed remote builds in B68837: Diff 286460. Aug 18 2020, 8:17 PM

aprantl accepted this revision. Aug 19 2020, 9:33 AM

This revision is now accepted and ready to land. Aug 19 2020, 9:33 AM

CodaFi marked an inline comment as done. Aug 19 2020, 1:11 PM

CodaFi updated this revision to Diff 286679. Aug 19 2020, 3:40 PM

CodaFi retitled this revision from [clang][Modules] Use File Names Instead of inodes As Loaded Module Keys to [clang][Modules] Increase the Entropy of ModuleManager Map Keys.

Harbormaster completed remote builds in B68957: Diff 286679. Aug 19 2020, 4:20 PM

CodaFi updated this revision to Diff 286697. Aug 19 2020, 5:45 PM

Harbormaster completed remote builds in B68971: Diff 286697. Aug 19 2020, 6:50 PM

We have tested this proposed change out on our CI systems and have seen no relief from the symptoms of inode reuse with this approach. Abandoning this revision in favor of a more narrow fix.

Herald added a subscriber: danielkiss. Aug 28 2020, 4:21 PM

Diff 286697

clang/include/clang/Serialization/ModuleManager.h

Show First 20 Lines • Show All 53 Lines • ▼ Show 20 Lines	class ModuleManager {
/// by the user, the last one is the one that doesn't depend on anything		/// by the user, the last one is the one that doesn't depend on anything
/// further.		/// further.
SmallVector<ModuleFile *, 2> PCHChain;		SmallVector<ModuleFile *, 2> PCHChain;

// The roots of the dependency DAG of AST files. This is used		// The roots of the dependency DAG of AST files. This is used
// to implement short-circuiting logic when running DFS over the dependencies.		// to implement short-circuiting logic when running DFS over the dependencies.
SmallVector<ModuleFile *, 2> Roots;		SmallVector<ModuleFile *, 2> Roots;

		/// An \c EntryKey is a thin wrapper around a \c FileEntry that implements
		/// a richer notion of identity.
		///
		/// A plain \c FileEntry has its identity tied to inode numbers. When the
		/// module cache regenerates a PCM, some filesystem allocators may reuse
		/// inode numbers for distinct modules, which can cause the cache to return
		/// mismatched entries. An \c EntryKey ensures that the size and modification
		/// time are taken into account when determining the identity of a key, which
		/// significantly decreases - but does not eliminate - the chance of
		/// a collision.
		struct EntryKey {
		const FileEntry *Entry;
		off_t Size;
		time_t ModTime;

		EntryKey(const FileEntry *Entry) : Entry(Entry), Size(0), ModTime(0) {
		if (Entry) {
		Size = Entry->getSize();
		ModTime = Entry->getModificationTime();
		}
		}

		EntryKey(const FileEntry *Entry, off_t Size, time_t ModTime)
		: Entry(Entry), Size(Size), ModTime(ModTime) {}

		struct Info {
		static inline EntryKey getEmptyKey() {
		return EntryKey{
		llvm::DenseMapInfo<const FileEntry *>::getEmptyKey(),
		llvm::DenseMapInfo<off_t>::getEmptyKey(),
		llvm::DenseMapInfo<time_t>::getEmptyKey(),
		};
		}
		static inline EntryKey getTombstoneKey() {
		return EntryKey{
		llvm::DenseMapInfo<const FileEntry *>::getTombstoneKey(),
		llvm::DenseMapInfo<off_t>::getTombstoneKey(),
		llvm::DenseMapInfo<time_t>::getTombstoneKey(),
		};
		}
		static unsigned getHashValue(const EntryKey &Val) {
		return llvm::DenseMapInfo<const FileEntry *>::getHashValue(Val.Entry);
		}
		static bool isEqual(const EntryKey &LHS, const EntryKey &RHS) {
		if (LHS.Entry == getEmptyKey().Entry ||
		LHS.Entry == getTombstoneKey().Entry ||
		RHS.Entry == getEmptyKey().Entry ||
		RHS.Entry == getTombstoneKey().Entry) {
		return LHS.Entry == RHS.Entry;
		}
		if (LHS.Entry == nullptr || RHS.Entry == nullptr) {
		return LHS.Entry == RHS.Entry;
		}
		return LHS.Entry == RHS.Entry && LHS.Size == RHS.Size &&
		LHS.ModTime == RHS.ModTime;
		}
		};
		};

/// All loaded modules, indexed by name.		/// All loaded modules, indexed by name.
llvm::DenseMap<const FileEntry *, ModuleFile *> Modules;		llvm::DenseMap<EntryKey, ModuleFile *, EntryKey::Info> Modules;

/// FileManager that handles translating between filenames and		/// FileManager that handles translating between filenames and
/// FileEntry *.		/// FileEntry *.
FileManager &FileMgr;		FileManager &FileMgr;

/// Cache of PCM files.		/// Cache of PCM files.
IntrusiveRefCntPtr<InMemoryModuleCache> ModuleCache;		IntrusiveRefCntPtr<InMemoryModuleCache> ModuleCache;

/// Knows how to unwrap module containers.		/// Knows how to unwrap module containers.
const PCHContainerReader &PCHContainerRdr;		const PCHContainerReader &PCHContainerRdr;

/// Preprocessor's HeaderSearchInfo containing the module map.		/// Preprocessor's HeaderSearchInfo containing the module map.
const HeaderSearch &HeaderSearchInfo;		const HeaderSearch &HeaderSearchInfo;

/// A lookup of in-memory (virtual file) buffers		/// A lookup of in-memory (virtual file) buffers
llvm::DenseMap<const FileEntry *, std::unique_ptr<llvm::MemoryBuffer>>		llvm::DenseMap<EntryKey, std::unique_ptr<llvm::MemoryBuffer>, EntryKey::Info>
InMemoryBuffers;		InMemoryBuffers;

/// The visitation order.		/// The visitation order.
SmallVector<ModuleFile *, 4> VisitOrder;		SmallVector<ModuleFile *, 4> VisitOrder;

/// The list of module files that both we and the global module index		/// The list of module files that both we and the global module index
/// know about.		/// know about.
///		///
▲ Show 20 Lines • Show All 238 Lines • Show Last 20 Lines

clang/lib/Serialization/ModuleManager.cpp

Show First 20 Lines • Show All 53 Lines • ▼ Show 20 Lines	ModuleFile *ModuleManager::lookupByModuleName(StringRef Name) const {
if (const Module *Mod = HeaderSearchInfo.getModuleMap().findModule(Name))		if (const Module *Mod = HeaderSearchInfo.getModuleMap().findModule(Name))
if (const FileEntry *File = Mod->getASTFile())		if (const FileEntry *File = Mod->getASTFile())
return lookup(File);		return lookup(File);

return nullptr;		return nullptr;
}		}

ModuleFile *ModuleManager::lookup(const FileEntry *File) const {		ModuleFile *ModuleManager::lookup(const FileEntry *File) const {
auto Known = Modules.find(File);		auto Known = Modules.find(EntryKey{File});
if (Known == Modules.end())		if (Known == Modules.end())
return nullptr;		return nullptr;

return Known->second;		return Known->second;
}		}

std::unique_ptr<llvm::MemoryBuffer>		std::unique_ptr<llvm::MemoryBuffer>
ModuleManager::lookupBuffer(StringRef Name) {		ModuleManager::lookupBuffer(StringRef Name) {
auto Entry = FileMgr.getFile(Name, /*OpenFile=*/false,		auto Entry = FileMgr.getFile(Name, /*OpenFile=*/false,
/*CacheFailure=*/false);		/*CacheFailure=*/false);
if (!Entry)		if (!Entry)
return nullptr;		return nullptr;
return std::move(InMemoryBuffers[*Entry]);		return std::move(InMemoryBuffers[EntryKey{*Entry}]);
}		}

static bool checkSignature(ASTFileSignature Signature,		static bool checkSignature(ASTFileSignature Signature,
ASTFileSignature ExpectedSignature,		ASTFileSignature ExpectedSignature,
std::string &ErrorStr) {		std::string &ErrorStr) {
if (!ExpectedSignature || Signature == ExpectedSignature)		if (!ExpectedSignature || Signature == ExpectedSignature)
return false;		return false;

▲ Show 20 Lines • Show All 44 Lines • ▼ Show 20 Lines	ModuleManager::addModule(StringRef FileName, ModuleKind Type,
}		}

if (!Entry && FileName != "-") {		if (!Entry && FileName != "-") {
ErrorStr = "module file not found";		ErrorStr = "module file not found";
return Missing;		return Missing;
}		}

// Check whether we already loaded this module, before		// Check whether we already loaded this module, before
if (ModuleFile *ModuleEntry = Modules.lookup(Entry)) {		if (ModuleFile *ModuleEntry = Modules.lookup(EntryKey{Entry})) {
// Check the stored signature.		// Check the stored signature.
if (checkSignature(ModuleEntry->Signature, ExpectedSignature, ErrorStr))		if (checkSignature(ModuleEntry->Signature, ExpectedSignature, ErrorStr))
return OutOfDate;		return OutOfDate;

Module = ModuleEntry;		Module = ModuleEntry;
updateModuleImports(*ModuleEntry, ImportedBy, ImportLoc);		updateModuleImports(*ModuleEntry, ImportedBy, ImportLoc);
return AlreadyLoaded;		return AlreadyLoaded;
}		}
▲ Show 20 Lines • Show All 58 Lines • ▼ Show 20 Lines	ModuleManager::addModule(StringRef FileName, ModuleKind Type,

// Read the signature eagerly now so that we can check it. Avoid calling		// Read the signature eagerly now so that we can check it. Avoid calling
// ReadSignature unless there's something to check though.		// ReadSignature unless there's something to check though.
if (ExpectedSignature && checkSignature(ReadSignature(NewModule->Data),		if (ExpectedSignature && checkSignature(ReadSignature(NewModule->Data),
ExpectedSignature, ErrorStr))		ExpectedSignature, ErrorStr))
return OutOfDate;		return OutOfDate;

// We're keeping this module. Store it everywhere.		// We're keeping this module. Store it everywhere.
Module = Modules[Entry] = NewModule.get();		Module = Modules[EntryKey{Entry}] = NewModule.get();

updateModuleImports(*NewModule, ImportedBy, ImportLoc);		updateModuleImports(*NewModule, ImportedBy, ImportLoc);

if (!NewModule->isModule())		if (!NewModule->isModule())
PCHChain.push_back(NewModule.get());		PCHChain.push_back(NewModule.get());
if (!ImportedBy)		if (!ImportedBy)
Roots.push_back(NewModule.get());		Roots.push_back(NewModule.get());

Show All 30 Lines	for (auto I = First; I != Last; ++I) {
if (!I->isModule()) {		if (!I->isModule()) {
PCHChain.erase(llvm::find(PCHChain, &*I), PCHChain.end());		PCHChain.erase(llvm::find(PCHChain, &*I), PCHChain.end());
break;		break;
}		}
}		}

// Delete the modules and erase them from the various structures.		// Delete the modules and erase them from the various structures.
for (ModuleIterator victim = First; victim != Last; ++victim) {		for (ModuleIterator victim = First; victim != Last; ++victim) {
Modules.erase(victim->File);		Modules.erase(EntryKey{victim->File});

if (modMap) {		if (modMap) {
StringRef ModuleName = victim->ModuleName;		StringRef ModuleName = victim->ModuleName;
if (Module *mod = modMap->findModule(ModuleName)) {		if (Module *mod = modMap->findModule(ModuleName)) {
mod->setASTFile(nullptr);		mod->setASTFile(nullptr);
}		}
}		}
}		}

// Delete the modules.		// Delete the modules.
Chain.erase(Chain.begin() + (First - begin()), Chain.end());		Chain.erase(Chain.begin() + (First - begin()), Chain.end());
}		}

void		void
ModuleManager::addInMemoryBuffer(StringRef FileName,		ModuleManager::addInMemoryBuffer(StringRef FileName,
std::unique_ptr<llvm::MemoryBuffer> Buffer) {		std::unique_ptr<llvm::MemoryBuffer> Buffer) {
const FileEntry *Entry =		const FileEntry *Entry =
FileMgr.getVirtualFile(FileName, Buffer->getBufferSize(), 0);		FileMgr.getVirtualFile(FileName, Buffer->getBufferSize(), 0);
InMemoryBuffers[Entry] = std::move(Buffer);		InMemoryBuffers[EntryKey{Entry}] = std::move(Buffer);
}		}

ModuleManager::VisitState *ModuleManager::allocateVisitState() {		ModuleManager::VisitState *ModuleManager::allocateVisitState() {
// Fast path: if we have a cached state, use it.		// Fast path: if we have a cached state, use it.
if (FirstVisitState) {		if (FirstVisitState) {
VisitState *Result = FirstVisitState;		VisitState *Result = FirstVisitState;
FirstVisitState = FirstVisitState->NextState;		FirstVisitState = FirstVisitState->NextState;
Result->NextState = nullptr;		Result->NextState = nullptr;
▲ Show 20 Lines • Show All 219 Lines • Show Last 20 Lines

This is an archive of the discontinued LLVM Phabricator instance.

[clang][Modules] Increase the Entropy of ModuleManager Map Keys
Abandoned, Public
