In one of our links lld was reading ~760k files, but only ~1500 of them were unique. With this caching in place, that link goes from 30 seconds to 8.
This seems like a heavy hammer, especially since some things don't need
to be cached, like the filelist arguments and the passed static
archives (the latter are already cached as a one-off), but it seems ld64
does something similar here to short-circuit these duplicate reads.
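For illustration, here is a minimal sketch of the kind of read-level memoization described above. It is plain C++ rather than lld's actual types or API; the `readFileCached` function and `gReadCache` map are made-up names for this example.

```cpp
// Minimal sketch (not lld's actual code): the first read of a path loads and
// stores the buffer; later reads of the same path return the cached copy.
#include <fstream>
#include <memory>
#include <optional>
#include <sstream>
#include <string>
#include <unordered_map>

// Hypothetical cache keyed by the path string.
static std::unordered_map<std::string, std::shared_ptr<const std::string>> gReadCache;

std::optional<std::shared_ptr<const std::string>> readFileCached(const std::string &path) {
  if (auto it = gReadCache.find(path); it != gReadCache.end())
    return it->second;                 // duplicate read: no disk I/O
  std::ifstream in(path, std::ios::binary);
  if (!in)
    return std::nullopt;               // let the caller handle a missing file
  std::ostringstream buf;
  buf << in.rdbuf();
  auto contents = std::make_shared<const std::string>(buf.str());
  gReadCache.emplace(path, contents);  // remember the buffer for later reads
  return contents;
}
```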
Of the types of files being read for our iOS app, the biggest problem
was constantly re-reading small tbd files:
```
% wc -l /tmp/read.txt
761414 /tmp/read.txt
% cat /tmp/read.txt | sort -u | wc -l
1503
% cat /tmp/read.txt | grep "\.a$" | wc -l
43721
% cat /tmp/read.txt | grep "\.tbd$" | wc -l
717656
```
We could likely hoist this logic up so we don't cache at this level, but it
would be a more invasive change to make sure all callers that need it
cache the results.
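For contrast, a rough sketch of what that hoisting would mean: every call site that can see the same input more than once would need to keep and consult its own map, which is the invasiveness referred to above. The function below and its name are hypothetical.

```cpp
// Hypothetical caller-side caching: each call site that may see duplicate
// paths keeps its own map instead of relying on a shared read-level cache.
#include <fstream>
#include <memory>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

void loadTbdInputs(const std::vector<std::string> &tbdPaths) {
  std::unordered_map<std::string, std::shared_ptr<const std::string>> localCache;
  for (const std::string &path : tbdPaths) {
    auto &slot = localCache[path];
    if (!slot) {
      // Uncached read; only this function remembers the result.
      std::ifstream in(path, std::ios::binary);
      if (!in)
        continue;
      std::ostringstream buf;
      buf << in.rdbuf();
      slot = std::make_shared<const std::string>(buf.str());
    }
    // ... parse the .tbd contents in *slot ...
  }
}
```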
I could see this causing OOM issues, and I'm not a linker expert, so
maybe there's another way we should solve this problem? Feedback welcome!
It would be nice to have a comment here about how this is primarily for caching .tbd / common library files and is a bit of a heavy hammer (as mentioned in the commit message).