This is an archive of the discontinued LLVM Phabricator instance.

[Symbolize] LRU cache binaries in llvm-symbolizer.
ClosedPublic

Authored by mysterymath on Feb 14 2022, 2:17 PM.

Details

Summary

This change adds a simple LRU cache to the Symbolize class to put a cap
on llvm-symbolizer memory usage. Previously, the Symbolizer's virtual
memory footprint would grow without bound as additional binaries were
referenced.

I'm putting this out there early for an informal review, since there may be
a dramatically different/better way to go about this. I still need to
figure out a good default constant for the memory cap and benchmark the
implementation against a large symbolization workload. Right now I've
pegged max memory usage at zero for testing purposes, which evicts the whole
cache every time.

Unfortunately, it looks like StringRefs in the returned DI objects can
directly refer to the contents of binaries. Accordingly, the cache
pruning must be explicitly requested by the caller, as the caller must
guarantee that none of the returned objects will be used afterwards.

For llvm-symbolizer this is a light burden: symbolization occurs
line-by-line, and the returned objects are discarded after each line.

Implementation-wise, there are a number of nested caches that depend
on one another. I've implemented a simple Evictor callback system to
allow derived caches to register eviction actions to occur when the
underlying binaries are evicted.
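For illustration, here is a minimal sketch of the shape this design could take. It is not the actual patch: CachedBinary, BinaryCache, and all the signatures are assumptions made for this sketch, though pruneCache(), recordAccess(), and LRUBinaries are names that come up later in this review.

  #include <cstdint>
  #include <functional>
  #include <list>
  #include <utility>
  #include <vector>

  using Evictor = std::function<void()>;

  struct CachedBinary {
    uint64_t Size = 0;             // Memory attributed to this binary.
    std::vector<Evictor> Evictors; // Cleanup actions for derived caches.

    // Derived caches (parsed objects, debug contexts, ...) register an
    // action to run when the underlying binary is evicted.
    void pushEvictor(Evictor E) { Evictors.push_back(std::move(E)); }

    void evict() {
      for (Evictor &E : Evictors)
        E();
      Evictors.clear();
    }
  };

  struct BinaryCache {
    std::list<CachedBinary> LRUBinaries; // Front is least recently used.
    uint64_t CacheSize = 0;
    uint64_t MaxCacheSize = 0;

    // Called on each cache hit; moves the entry to the MRU position.
    void recordAccess(std::list<CachedBinary>::iterator It) {
      LRUBinaries.splice(LRUBinaries.end(), LRUBinaries, It);
    }

    // Must be requested explicitly by the caller, since returned DI
    // objects may hold StringRefs into the cached binaries.
    void pruneCache() {
      while (CacheSize > MaxCacheSize && !LRUBinaries.empty()) {
        CachedBinary &Bin = LRUBinaries.front();
        CacheSize -= Bin.Size;
        Bin.evict();
        LRUBinaries.pop_front();
      }
    }
  };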

Diff Detail

Event Timeline

mysterymath created this revision.Feb 14 2022, 2:17 PM

Clang format.

Update commit msg.

mysterymath edited the summary of this revision. (Show Details)Feb 14 2022, 2:19 PM
mysterymath edited the summary of this revision. (Show Details)
mysterymath edited the summary of this revision. (Show Details)Feb 14 2022, 2:29 PM
mysterymath added reviewers: phosek, dblaikie.
mysterymath edited the summary of this revision. (Show Details)
mysterymath edited the summary of this revision. (Show Details)Feb 14 2022, 2:31 PM
mysterymath edited the summary of this revision. (Show Details)

Fix unnecessary diff.

mysterymath published this revision for review.Feb 14 2022, 2:34 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 14 2022, 2:34 PM

I thought @netforce had already prototyped something like this? Could you link to the previous review/discussion on the subject if you can find it? (or maybe it was only internally discussed)

Ah, I think I stumbled across this: https://reviews.llvm.org/D78950

Perhaps you could commandeer/pick up https://reviews.llvm.org/D90006 ? or summarize in this (or the other, D78950) that historical context/what's new/different/justified in this new review?

The context for this change is to try to fix OOMs discovered when symbolizing system logs with llvm-symbolizer.
As is, llvm-symbolizer essentially leaks memory when using its STDIN format. The input is specified in a line-at-a-time fashion, but data is never cleaned up from one line to the next. Eventually, you have to either stop feeding it lines or restart the job.

It seems like the Symbolizer is the right place to address this; it *could* mmap in and parse the debug binaries anew for each symbolization request, but it doesn't, because it's more efficient to keep this data around. At least, right up until physical memory is exhausted.
Accordingly, it shouldn't really matter what kind of derived data the Symbolizer keeps: everything should be fair game for eviction; otherwise, whatever we don't try to evict will leak from line to line.

Does this solution subsume the previous patches? What does it provide that they didn't/why this rather than those directions? (does it not subsume that functionality? (ie: it addresses some issues, but not some/all of the issues the other patches were interested in) if so, it'd be unfortunate to have to solve related issues in two different places/ways - would be good to understand why we can't find a common solution to these issues)

mysterymath added a comment.EditedFeb 15 2022, 12:35 PM

It doesn't look like the memory usage improvements of either patch are a proper subset of the other's.

This patch operates file-at-a-time; it ejects all of the debug information for a given file at once. This includes data that isn't covered by the other patch: PDB debug info, the lists and ancillary data structures parsed in the ObjectFiles, and the contents of the binary files themselves (which may not have been mmapped if they were small).
It should completely subsume the other patch with regards to long-term leak protection; everything should be discarded between symbolization requests.

However, since the other patch operates at a finer granularity, it can reduce memory usage even with only a single binary present, as in the example provided in https://reviews.llvm.org/D90006, which showed a reduction when symbolizing clang alone.

If we start with coarse-grained caching, we could replace caching for objects like DWARFContext with finer grained caching as it becomes available.
If we instead start with finer-grained caching on a subset of the data (DWARFContext), it would help reduce the leakiness of llvm-symbolizer, but it kicks the can down the road. I'm not sure exactly how far, though; maybe quite far.

Thanks for summarizing the differences. It'd be nice if we could do this once/figure out a general implementation that covers both scenarios, but I'm not sure how practical that is.

Do you have some performance numbers? What if we didn't cache/had a cache size of 0? (does the caching logic buy us enough performance to justify it)

I did some poking around, and it looks like restarting the job after every request was tried as a workaround.
This choked on large binaries, and they had to change to restart-every-n requests.

I took a large debug binary, Clang, and symbolized 100 random symbols from it.

Zero-size LRU:

real    4m44.289s
user    4m27.506s
sys     0m16.720s

No LRU, flush after each request:

real    4m41.313s
user    4m24.575s
sys     0m16.666s

Unbounded LRU:

real    0m4.309s
user    0m3.842s
sys     0m0.471s

HEAD:

real    0m4.505s
user    0m3.982s
sys     0m0.526s

Conversely, we can look at how much overhead the LRU order maintenance adds to 100,000 very easily cacheable requests (looking up main in a binary built from int main(void){return 0;}).

Unbounded LRU:

real    0m0.421s
user    0m0.305s
sys     0m0.120s

HEAD:

real    0m0.410s
user    0m0.334s
sys     0m0.079s

Looking at the zero-size LRU, I think we'd probably want cache sizes to be soft.
If a large binary is symbolized, and the binary is over the cap, then it's guaranteed to be evicted from the cache after each request, even though this doesn't lower the peak memory usage of the binary.
Instead, we can inflate the cap to the size of the largest individual binary that has been symbolized; this is the high water mark of memory usage overall.
This would let us pick a relatively low default cap, say 500MiB, without harming performance when symbolizing larger binaries.

Also, I realize that these examples are synthetic, but they should help characterize the edge case behaviors at least. I'll post back with a real example from system logs soon.
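
One way to express the soft-cap idea above (illustrative only; effectiveCap and the variable names are made up for this sketch):

  #include <algorithm>
  #include <cstdint>

  // Peak memory usage is at least the largest single binary ever loaded,
  // so evicting below that bound buys nothing; use it as a floor for the
  // configured cap.
  uint64_t effectiveCap(uint64_t MaxCacheSize, uint64_t LargestBinarySize) {
    return std::max(MaxCacheSize, LargestBinarySize);
  }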

If a large binary is symbolized, and the binary is over the cap, then it's guaranteed to be evicted from the cache after each request, even though this doesn't lower the peak memory usage of the binary.

Perhaps the cache eviction could be implemented differently? Only evict if the cache size is exceeded and a new item is about to be added? (that does mean needing to know the size of the new item before it's manifest, though, which I guess is a bunch of work/more code... )

Or would it be OK to evict the last thing even as the next thing is being added to the cache - before all the derived objects are created from the new object? Would that be adequate? (so we could move eviction to the "add item to cache" piece, which would let an item larger than the cache size persist through multiple queries - since eviction wouldn't happen until a cache insertion was performed when the cache already exceeded the cache size?)

Or would it be OK to evict the last thing even as the next thing is being added to the cache - before all the derived objects are created from the new object? Would that be adequate?

I think it'd be pretty reasonable to load in the second binary before we evict the first; mapping in two large binaries isn't all that much worse than one, especially since there currently isn't any bound at all.
If that's OK, we can get this effect by just always keeping the most recently used binary around. This should be much simpler to code, and it'll only slightly delay evicting the binary until the next pruneCache() call.

Hmm, not sure I'm following why that'd be easier to code. Is the complexity that one symbolizer query (the current cache invalidation granularity proposed in this patch, if I understand correctly) may need more than one binary loaded, so we can't invalidate at any finer granularity without risking references into the invalidated elements when a single query needs more than one thing kept alive?

Sorry, maybe I'm just a bit brain-fried at the moment and having trouble following something. I /think/ I'm imagining two things, perhaps the one you're proposing already - pruneCache stopping at LRUBinaries.size() == 1 instead of empty, and possibly doing pruneCache in the recordAccess call instead of having a separate call outside in llvm-symbolizer.cpp? I guess that comes back to the point you made in the review that the invalidation has to be in the client because otherwise stuff will be invalidated out from underneath the client's use case.

Yeah, maybe just changing the empty check to the <= 1 test.
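
In terms of the illustrative sketch from the summary, that change would look something like this (again, a sketch, not the actual patch):

  // Keep the most-recently-used binary resident: stop pruning at one
  // remaining entry instead of zero, so a binary larger than the cap
  // survives across consecutive requests against it.
  void pruneCache() {
    while (CacheSize > MaxCacheSize && LRUBinaries.size() > 1) {
      CachedBinary &Bin = LRUBinaries.front();
      CacheSize -= Bin.Size;
      Bin.evict();
      LRUBinaries.pop_front();
    }
  }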

>>! In D119784#3339036, @dblaikie wrote:

[...] possibly doing pruneCache in the recordAccess call instead of having a separate call outside in llvm-symbolizer.cpp? I guess that comes back to the point you made in the review that the invalidation has to be in the client because otherwise stuff will be invalidated out from underneath the client's use case.

Yeah, there is actually a safe way to do that; it's a bit hairier. You'd have pruneCache mark the cache as "safe to prune", then have the code that adds a new item to the cache run the prune if that flag were set. If anything was accessed in the meantime, it'd set the safe flag to false, since at that point a memory reference might escape out. I was thinking that setting the flag to false would be another thing you'd have to make sure to do right or else risk bugs, but now that you mention it, that could go in the recordAccess call.

That should be a backwards compatible change though, so I'm inclined to just whip up the size() != 1 version and come back to this later.
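
A sketch of that deferred-pruning idea (not part of this patch; the names and structure are assumptions):

  // pruneCache() only marks the cache prunable; actual eviction runs on
  // the next insertion. Any access in between clears the flag, since a
  // reference into the cache may have escaped to the caller.
  struct DeferredPruneCache {
    bool SafeToPrune = false;

    void pruneCache() { SafeToPrune = true; }

    void recordAccess() {
      SafeToPrune = false; // A returned object may reference cache memory.
      // ... move the accessed entry to the MRU position ...
    }

    void addEntry() {
      if (SafeToPrune) {
        // ... evict down to the cap before the new entry is inserted ...
        SafeToPrune = false;
      }
      // ... insert the new entry ...
    }
  };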

dblaikie accepted this revision.Feb 23 2022, 12:05 PM

Fair enough - sounds good.

This revision is now accepted and ready to land.Feb 23 2022, 12:05 PM
mysterymath marked an inline comment as done.
  • Always keep the MRU binary in the cache. This keeps it from thrashing on subsequent requests.
  • Set default cache size to 512 MiB on 32-bit hosts; 4 GiB on 64-bit hosts. This is intended to be a conservative cap, as the current implementation has no limits. (A sketch of this default follows the list.)
  • Add a llvm-symbolizer flag to change the cache size.
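
The word-size-dependent default could be as simple as the following (a sketch; the actual constant name in the patch may differ):

  #include <cstdint>

  // Conservative default cap: 512 MiB on 32-bit hosts, 4 GiB on 64-bit.
  constexpr uint64_t DefaultMaxCacheSize =
      sizeof(void *) == 4 ? 512ULL << 20  // 512 MiB
                          : 4ULL << 30;   // 4 GiB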

Here are the results from a real production symbolization workflow, looped a few times:

LRU, Size 0, Evict all:

real    0m3.277s
user    0m3.098s
sys     0m0.183s

LRU, Size 0, Keep MRU:

real    0m0.598s
user    0m0.515s
sys     0m0.087s

LRU, 4 GiB:

real    0m0.208s
user    0m0.123s
sys     0m0.089s

HEAD:

real    0m0.182s
user    0m0.131s
sys     0m0.054s

Remove redundant if(Bin) check.

Use find over operator[]; it measures very slightly faster, and it's more
consistent with the surrounding code.

This revision was landed with ongoing or failed builds.Feb 24 2022, 4:32 PM
This revision was automatically updated to reflect the committed changes.