This is an archive of the discontinued LLVM Phabricator instance.

Add the ability to cache information for a module between debugger instances.
Needs Revision · Public

Authored by clayborg on Nov 12 2021, 11:54 AM.

Details

Summary

Added the ability to cache the finalized symbol tables, both as a proof of concept and as a way to discuss caching items for a given module so that subsequent debug sessions start faster.

Module caching must be enabled by the user before this can be used:

(lldb) settings set symbols.enable-lldb-modules-cache true

There is also a setting that lets the user specify the module cache directory. It defaults to a directory next to the symbols.clang-modules-cache-path directory in a temp directory:

(lldb) settings show symbols.lldb-modules-cache-path

If this setting is enabled, the finalized symbol tables will be serialized and saved to disk so they can be quickly loaded the next time you debug.

Each module gets a cache directory in the "symbols.lldb-modules-cache-path" directory. A directory is created in the cache directory using the module's basename, and then a hash is created for the module that incorporates the full path to the executable, the architecture, the object name offset, and the modification time (if any). This allows universal mach-o files to cache multiple architectures in the same module cache directory. Basing the directory name on this info allows the directory's contents to be deleted and replaced when the file gets updated on disk. This keeps the cache from growing over time during the compile/edit/debug cycle and prevents out-of-space issues.

If the cache is enabled, the symbol table will be loaded from the cache the next time you debug if the module has not changed.

Diff Detail

Event Timeline

clayborg created this revision. · Nov 12 2021, 11:54 AM
clayborg requested review of this revision. · Nov 12 2021, 11:54 AM
Herald added a project: Restricted Project. · Nov 12 2021, 11:54 AM

This patch effectively introduces a file format to cache lldb internal state. While the tests and the code do give some information for what it looks like, some documentation about this format would be nice.

I like this idea a lot! My comments were kind of surface level but should help fix some of the rough edges of the patch.

lldb/include/lldb/Core/Module.h
879

nit: unnecessary whitespace change

951

typo: indentify -> identify

959

have a object name -> have an object name

lldb/include/lldb/Symbol/ObjectFile.h
710–713

Curious to know, why lldb::addr_t instead of something like lldb::offset_t?

lldb/include/lldb/Symbol/Symbol.h
21–22

nit: you have a namespace comment when you do this in an earlier file but not here. what does clang-format do?

lldb/source/Core/Mangled.cpp
411–412

What does "two different names that are not related" mean here?

lldb/source/Core/Module.cpp
65

nit: unrelated whitespace change

1706

typo: valie -> valid

lldb/source/Host/common/FileSystem.cpp
134

Could you break this line into multiple lines?

clayborg updated this revision to Diff 386972. · Nov 12 2021, 4:00 PM

Added more comments to the encoding functions to detail how things are encoded.
Fixed typos in error messages.
Added a version to the symbol table cache file so we can improve the format over time and avoid loading out-of-date caches.

clayborg marked 4 inline comments as done. · Nov 12 2021, 4:01 PM
clayborg added inline comments.
lldb/include/lldb/Symbol/ObjectFile.h
713

Because it can also be an address in memory, for object files that are read from memory only, with no backing object file on disk.

lldb/include/lldb/Symbol/Symbol.h
22

clang-format didn't catch this, even though I have it enabled.

lldb/source/Core/Mangled.cpp
412

I will improve the comment on this to explain!

wallace requested changes to this revision. · Nov 12 2021, 10:04 PM

There are two things that could be added. The first is already mentioned in the inline comments: logging error messages in a way that lets us track bugs in an automated fashion. Right now the errors are being thrown away or transformed into simple true/false values that lose the information about the actual issue. Now that we have rich telemetry in place, we should use llvm::Error/Expected as much as possible and log the errors in both a log channel and the target stats.

The second item to discuss is that there should be a configurable maximum size for the entire cache folder (e.g. 70GB), so that, for example, when lldb initializes it cleans up the oldest items that make the cache blow up. I can also add telemetry from the IDE side to track the total size of the cache folder, but I imagine that if we don't put a limiter in place, we might cause some issues.

lldb/include/lldb/Core/Mangled.h
24–28

Being a little bit strict: you should move the FileWriter class out of gsym. That would also make it more generic and compelling for other uses.

lldb/include/lldb/Host/FileSystem.h
93

This should return an llvm::Error, and the caller should decide whether to consume the error or not.

167–168

Rename this to RemoveFile to avoid any ambiguities

lldb/source/Core/CoreProperties.td
23

Mention here which directory ends up being used by default if this setting is left unchanged.

lldb/source/Core/Module.cpp
1669–1676

I'd prefer a separator character like ';' between each element, just to avoid possible future bugs.

1680

This function is doing many things, which makes it harder to read. Besides splitting it up, we should log in the target stats whether the cache was enabled, whether the cache was used on a per-module basis, and any associated errors. This will help us fix issues without having to wait for user reports.
You should also print the error in a log channel.

lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp
2723

log here in the target stats if the cache was effectively used or not

lldb/source/Plugins/ObjectFile/PECOFF/ObjectFilePECOFF.cpp
602

same here about telemetry

This revision now requires changes to proceed. · Nov 12 2021, 10:04 PM

Maybe it's good for starting a discussion, but I am surprised that we would need to cache symtab information. This is a fairly simple format that needs to be read at application runtime as well, so I am surprised that reading it from the cache is substantially faster than doing it from the object file. What is the slow part there? Is it demangling by any chance?

Now to the discussion. Overall, I think this is an interesting feature, but I do have some remarks/questions:

  • I don't think that "lldb module cache" is a very good name for this feature, as we already have the "platform module cache" feature (see Target/ModuleCache.h and its callers), which deals with downloading (and caching) modules (object files) from remote systems. And then there's the clang module cache, of course (and the llvm modules, but we fortunately don't cache those). It would be better if we chose a slightly less overloaded name. Maybe "index cache"? (I assume you will also want to store dwarf indexes there.)
  • I don't see anything which would ensure cache consistency for the case where multiple debugger instances try to cache the same module simultaneously. What's the plan for that?
  • Have you considered building upon the caching infrastructure in llvm (llvm/Caching.h, llvm/CachePruning.h)? I believe it already has some size-limiting features and multiprocess synchronization built in, so using it would take care of the previous item, and of @wallace's size limit request.
  • it's moderately strange to be using the gsym file writer for this purpose

I haven't looked at the code in detail, but I noted that the filesystem changes, and some of the unrelated formatting cleanups could be put into separate patches.

lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp
2722

Can we make it so that we don't need to add these bits to every object file implementation? Maybe create a protected CreateSymtab function, which would only get called if the Symtab really needs to be created?

lldb/source/Symbol/Symbol.cpp
645

It would be (much) better to put the bitfields in an (anonymous) union that can be accessed both as a uint16_t and as individual bitfields.

Maybe it's good for starting a discussion, but I am surprised that we would need to cache symtab information. This is a fairly simple format that needs to be read at application runtime as well, so I am surprised that reading it from the cache is substantially faster than doing it from the object file. What is the slow part there? Is it demangling by any chance?

Symbol tables for LLDB are built by scouring every bit of data we can out of one or more symbol tables (ELF has a normal symbol table and a dynamic symbol table, and optionally a symbol table in the zipped debug info). There are also many runtime sections, like .eh_frame and others, that can give us the start address of functions. If a binary is stripped, we might not have a symbol for local functions, but many runtime sections have information on where functions start.

So while the symbol table seems simple, there is a ton of work that goes into creating these. Some things that are expensive:

  • parse all symbol tables and unique the results
  • reduce duplicate symbols as we parse one or more symbol tables (this happens a lot in mach-o files)
  • keep a map of symbols that have been emitted so we can skip creating symbols from runtime information
  • mach-o files have many symbols that describe the same thing: normal symbols and debug map symbols, so these symbols need to be coalesced into one symbol so we don't end up with multiple symbols with the same name and different info

I will post some performance results from loading a large Mac debug session as soon as my release build finishes. I was getting some really good performance improvements when I first tried this out, before taking the time to finalize this patch.

Now to the discussion. Overall, I think this is an interesting feature, but I do have some remarks/questions:

  • I don't think that "lldb module cache" is a very good name for this feature, as we already have the "platform module cache" feature (see Target/ModuleCache.h and its callers), which deals with downloading (and caching) modules (object files) from remote systems. And then there's the clang module cache, of course (and the llvm modules, but we fortunately don't cache those). It would be better if we chose a slightly less overloaded name. Maybe "index cache"? (I assume you will also want to store dwarf indexes there.)

Yes, debug info is the next item I want to put into this cache. "index cache" sounds fine. I will rename once I get a few NFC patches submitted and rebase.

  • I don't see anything which would ensure cache consistency for the case where multiple debugger instances try to cache the same module simultaneously. What's the plan for that?

I was hoping to learn that LLVM already had this kind of thing so I could use it! I see below that there are some things already in LLVM I should be able to use.

  • Have you considered building upon the caching infrastructure in llvm (llvm/Caching.h, llvm/CachePruning.h)? I believe it already has some size-limiting features and multiprocess synchronization built in, so using it would take care of the previous item, and of @wallace's size limit request.

I will check into that and see if I can use it! I figured it would have been done already; glad to see it has!

  • it's moderately strange to be using the gsym file writer for this purpose

When I made gsym, I created FileWriter as something that can easily produce binary content without having to go through the ASM printer/writer stuff in LLVM, as that requires pulling in tons of dependencies, including LLVM targets and object file support.

What suggestions do people have on this front? Should I make a patch to move FileWriter out of GSYM? I don't know if llvm::gsym::FileWriter is useful enough for other folks in LLVM. Any thoughts? Should we just make our own FileWriter class that only does what we need for the caching? Is there any other tech in LLVM that does this kind of thing without having to use the full ASM printer/writer stuff?

I haven't looked at the code in detail, but I noted that the filesystem changes, and some of the unrelated formatting cleanups could be put into separate patches.

I can break those out into separate patches.

So my plan is to:

  • make a separate patch for the symbol table parsing code in Module/ObjectFile
  • make a separate patch for FileSystem stuff
  • figure out what to do with llvm::gsym::FileWriter (input would be appreciated here!)
  • rebase this patch on all above patches and try using the llvm caching libraries
lldb/include/lldb/Host/FileSystem.h
93

I can return a Status, like many other calls, to be more consistent with the functions in this directory.

lldb/source/Core/Module.cpp
1676

Sounds good.

1680

Yes, the stats should log whether the cache was enabled, and each module should also emit stats for anything that was loaded from the cache. If the cache is enabled but nothing is in the cache yet, the stats should reflect that nothing was loaded from the cache.

lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp
2722

Yeah, I was thinking about this as well as I was doing it. I will make a patch that does this as an NFC patch first and then rebase this patch on that.

2723

Yeah, each object file should remember what (if anything) it loads from a cache and emit stats. This patch was already getting too big, and I was planning on doing a quick follow-up patch.

lldb/source/Symbol/Symbol.cpp
645

Agreed.

Thinking about this a bit more after Pavel's comment about whether the performance improvement came from mangling: we don't currently get any perf improvement from mangling/demangling. We might be able to get better performance if we _do_ cache the mangled/demangled counterparts (save both strings when they are related, instead of just the mangled string as this patch does) when we serialize the Mangled objects. When the objects are deserialized, they could read both strings, and no demangling would need to happen for that mangled name, since the strings could be registered as counterparts during deserialization.

Some perf numbers from loading a large iOS application and all of its shared libraries, first without caching and then with caching, are below. The "lldb-perf.py" command starts a timer, runs any LLDB commands, then stops the timer and prints the elapsed time. Before the first run, I run "sudo purge" to purge the kernel file cache, and then I do one "cold" run with the file caches empty. Then I run the same thing again with hot file caches, so the first run is always slower.

First run with no cache enabled:

$ lldb-perf.py --lldb /Users/gclayton/Documents/src/llvm/main-clean/Release/bin/lldb --command 'file Facebook.app'
------------------------------------------------------------------------
Cold file cache run:
------------------------------------------------------------------------
(lldb) script import time; start_time = time.perf_counter()
(lldb) file Facebook.app
Current executable set to 'Facebook.app' (x86_64).
(lldb) script print(time.perf_counter() - start_time)
10.961282056
(lldb) quit
------------------------------------------------------------------------
Warm file cache run:
------------------------------------------------------------------------
(lldb) script import time; start_time = time.perf_counter()
(lldb) file Facebook.app
Current executable set to 'Facebook.app' (x86_64).
(lldb) script print(time.perf_counter() - start_time)
6.4026089829999995
(lldb) quit
------------------------------------------------------------------------
Warm file cache run:
------------------------------------------------------------------------
(lldb) script import time; start_time = time.perf_counter()
(lldb) file Facebook.app
Current executable set to 'Facebook.app' (x86_64).
(lldb) script print(time.perf_counter() - start_time)
6.43446597
(lldb) quit

Second run with caching enabled:

$ lldb-perf.py --lldb /Users/gclayton/Documents/src/llvm/main-clean/Release/bin/lldb --command 'settings set symbols.enable-lldb-modules-cache true' --command 'file Facebook.app'
------------------------------------------------------------------------
Cold file cache run:
------------------------------------------------------------------------
(lldb) script import time; start_time = time.perf_counter()
(lldb) settings set symbols.enable-lldb-modules-cache true
(lldb) file Facebook.app
Current executable set to 'Facebook.app' (x86_64).
(lldb) script print(time.perf_counter() - start_time)
9.102131989
(lldb) quit
------------------------------------------------------------------------
Warm file cache run:
------------------------------------------------------------------------
(lldb) script import time; start_time = time.perf_counter()
(lldb) settings set symbols.enable-lldb-modules-cache true
(lldb) file Facebook.app
Current executable set to 'Facebook.app' (x86_64).
(lldb) script print(time.perf_counter() - start_time)
4.374525497
(lldb) quit
------------------------------------------------------------------------
Warm file cache run:
------------------------------------------------------------------------
(lldb) script import time; start_time = time.perf_counter()
(lldb) settings set symbols.enable-lldb-modules-cache true
(lldb) file Facebook.app
Current executable set to 'Facebook.app' (x86_64).
(lldb) script print(time.perf_counter() - start_time)
4.353001629
(lldb) quit

Results from one large project:
20% faster for cold file caches
47% faster for warm file caches

Maybe it's good for starting a discussion, but I am surprised that we would need to cache symtab information. This is a fairly simple format that needs to be read at application runtime as well, so I am surprised that reading it from the cache is substantially faster than doing it from the object file. What is the slow part there? Is it demangling by any chance?

Symbol tables for LLDB are made up by scouring every bit of data we can out of one or more symbol tables (ELF has normal symbol table and the dynamic symbol table, and optionally the symbol table in the zipped debug info). There are also many runtime sections like .eh_frame and [...]

Ok, you've convinced me. :) I completely forgot about all the things that go into these.

Now to the discussion. Overall, I think this is an interesting feature, but I do have some remarks/questions:

  • I don't think that "lldb module cache" is a very good name for this feature, as we already have the "platform module cache" feature (see Target/ModuleCache.h and its callers), which deals with downloading (and caching) modules (object files) from remote systems. And then there's the clang module cache, of course (and the llvm modules, but we fortunately don't cache those). It would be better if we chose a slightly less overloaded name. Maybe "index cache"? (I assume you will also want to store dwarf indexes there.)

Yes, debug info is the next item I want to put into this cache. "index cache" sounds fine. I will rename once I get a few NFC patches submitted and rebase.

  • I don't see anything which would ensure cache consistency for the case where multiple debugger instances try to cache the same module simultaneously. What's the plan for that?

I was hoping to learn that LLVM already had this kind of thing so I could use it! I see below that there are some things already in LLVM I should be able to use.

  • Have you considered building upon the caching infrastructure in llvm (llvm/Caching.h, llvm/CachePruning.h)? I believe it already has some size-limiting features and multiprocess synchronization built in, so using it would take care of the previous item, and of @wallace's size limit request.

I will check into that and see if I can use it! I figured it would have been done already; glad to see it has!

Cool. Thanks.

  • it's moderately strange to be using the gsym file writer for this purpose

When I made gsym, I created FileWriter as something that can easily produce binary content without having to go through the ASM printer/writer stuff in LLVM, as that requires pulling in tons of dependencies, including LLVM targets and object file support.

What suggestions do people have on this front? Should I make a patch to move FileWriter out of GSYM? I don't know if llvm::gsym::FileWriter is useful enough for other folks in LLVM. Any thoughts? Should we just make our own FileWriter class that only does what we need for the caching? Is there any other tech in LLVM that does this kind of thing without having to use the full ASM printer/writer stuff?

The binary writing capabilities in llvm mostly take the form of various free functions instead of a single rule-them-all class. This makes them hard to discover, which is probably the reason for the existence of gsym::FileWriter-type classes (the ObjectYAML library also has things like that). For binary endian-specific writing, there's llvm::endian::Writer in llvm/Support/EndianStream.h. It doesn't have anything fancy, but I don't think we need anything fancy here. The biggest thing it lacks is the ability to write a null-terminated string. We could make another free function for that, or maybe even add it to the raw_ostream interface (next to write_zeroes). I don't think you need them here, but there are [US]LEB128 writing functions in llvm/Support/LEB128.h.

We also have a DataEncoder class on the lldb side. It has (or could be made to have) all the writing capabilities you need, but it's not an exact fit because it currently writes to a memory buffer. However, it only has a handful of uses, so I could imagine refactoring it to work with raw_ostreams instead.

lldb/test/API/functionalities/module_cache/universal/TestModuleCacheUniversal.py