Page MenuHomePhabricator

Introduce new symbol on-demand for debug info
ClosedPublic

Authored by yinghuitan on Mar 14 2022, 11:41 AM.

Details

Summary

This diff introduces a new symbol on-demand which skips
loading a module's debug info unless explicitly asked on
demand. This provides significant performance improvement
for application with dynamic linking mode which has large
number of modules.
The feature can be turned on with:
"settings set symbols.load-on-demand true"

The feature works by creating a new SymbolFileOnDemand class for
each module which wraps the actual SymbolFIle subclass as member
variable. By default, most virtual methods on SymbolFileOnDemand are
skipped so that it looks like there is no debug info for that module.
But once the module's debug info is explicitly requested to
be enabled (in the conditions mentioned below) SymbolFileOnDemand
will allow all methods to pass through and forward to the actual SymbolFile
which would hydrate module's debug info on-demand.

In an internal benchmark, we are seeing more than 95% improvement
for a 3000 modules application.

Currently we are providing several ways to on demand hydrate
a module's debug info:

  • Source line breakpoint: matching in supported files
  • Stack trace: resolving symbol context for an address
  • Symbolic breakpoint: symbol table match guided promotion
  • Global variable: symbol table match guided promotion

In all above situations the module's debug info will be on-demand
parsed and indexed.

Some follow-ups for this feature:

  • Add a command that allows users to load debug info explicitly while using a new or existing command when this feature is enabled
  • Add settings for "never load any of these executables in Symbols On Demand" that takes a list of globs
  • Add settings for "always load the the debug info for executables in Symbols On Demand" that takes a list of globs
  • Add a new column in "image list" that shows up by default when Symbols On Demand is enable to show the status for each shlib like "not enabled for this", "debug info off" and "debug info on" (with a single character to short string, not the ones I just typed)

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
clayborg added inline comments.Mar 16 2022, 1:36 PM
lldb/source/Target/Statistics.cpp
272

Other "total" key value pairs use the same string as the ones from the module level to help match them up.

273

Other "total" key value pairs use the same string as the ones from the module level to help match them up.

Re @labath

How many tests/bugs are we talking about here? Were there more than say ten distinct bugs caught there?

It found 2~3 bugs, like source regex breakpoint (breakpoint set -p "source"), regex breakpoint filter by language (breakpoint set -r <regex> -L c++) etc.

running tests to new features vs find /regressions

Running all tests offline once to discover bugs is a good exericse. Since failing tests are understood we can put them into new tests with on-demand explicitly enabled then ondemand test category is unnecessary. This sounds good to me if Greg agrees.

Please don't take this personally

None taken :-)

yinghuitan marked 19 inline comments as done.Mar 17 2022, 2:50 PM
yinghuitan added inline comments.
lldb/source/Symbol/SymbolFileOnDemand.cpp
29

Sure. It does not matter much in real world because ability is never calculated on SymbolFileOnDemand.

98

If the user sets a file and line breakpoint and a compile unit/supported file is matched against the breakpoint file its parent module's SymbolFileOnDemand will be hydrated before ParseLineTable() is called anyway so manual call through is unnecessary.
I prefer to keep things not pass through by default to prevent any accidental leaking.

515–517

Yes, we could. But I would like to keep all functions guarded by default unless explicitly required by a hydration scenario to prevent any accidental leakage.

lldb/source/Target/Statistics.cpp
66

Ah, thanks! Bad auto replace for variable name.

yinghuitan marked 4 inline comments as done.

Address review feedback.

I will remove ondemand test flavor/category and add more new testcases next.

Remove ondemand test category/flavor
Add new testcases:

  • source text regex breakpoint
  • regex breakpoint filter by language
  • global variable symbol table match hydration

Remove last ondemand test flavor testcase.

Add documentation for the feature.

Fix log channel typo

unify with target.preload-symbols feature

I do not have strong opinion on this. One concern is that target.preload-symbols provides full experience but symbols.load-on-demand provides trade-off experience (for example some global expression evaluation may fail because we won't automatically hydrate based on type query).

That is a fair point, but the having them separate also has downsides, as we have to figure out what is the meaning of on-demand=true&preload=true combo going to be. I think it would be strange to have on-demand take precedence in this setup, but then if it doesn't, then you need to change *two* settings to switch from "preload everything" (the current default) to the "on demand mode". I am not particularly invested in any solution, but I think we should make a conscious choice one way or the other.

These two settings can co-exist pretty easily. If preload is on, preloading can be done for anything that has debug info enabled, and would be done as soon as debug info was enabled for on demand. It would be like saying "please preload symbols as soon as debug info is enabled". Right now if we enable debug info due to a breakpoint, function, or global lookup that matches a symbol, we can immediately preload the symbols so they are ready regardless of wether a name lookup happens just like we do normally.

If preload is off, then we wait until someone makes a global name lookup call.

I understand your point, and I see how having a ternary setting where (exactly) one of the values creates a completely different debug experience could be confusing. When you describe it like this, I think the behavior makes sense. The part I am worried about is someone getting the relationship backwards (with preload-symbols being "on top"), and then wondering "why the %#@# is aren't my symbols loaded when I said you should preload them". And I do think one could think this is the expected behavior, since the (whether you like it or not) idea behind the preloading was to load everything up-front instead of at a random point in the debug session -- something which is no longer true when the on-demand feature is enabled.

Overall, I am still fairly fond of the ternary idea, but I am not going to push for it if there's interest in it. Having two settings makes sense too, if we can clearly communicate their relationship. One thing that might help is renaming (one or both) something so that both settings aren't using the word "load" (I'll note that you used the phrase "debug info is enabled" in your exposition), though I am not sure what would those names be.

Remove ondemand test category/flavor
Add new testcases:

  • source text regex breakpoint
  • regex breakpoint filter by language
  • global variable symbol table match hydration

Awesome. Thanks.

lldb/test/Shell/SymbolFile/OnDemand/shared-lib-globals-hydration.test
4 ↗(On Diff #418708)

Is this here because there is no portable way to create shared libraries in shell tests? Would it be possible to run this test on other systems as well if this were written as an API test (where we have a shared lib portability layer)?

yinghuitan marked an inline comment as done.Mar 29 2022, 9:21 AM
yinghuitan added inline comments.
lldb/test/Shell/SymbolFile/OnDemand/shared-lib-globals-hydration.test
4 ↗(On Diff #418708)

Yeah, that's one reason. Another reason is the name of the shared library (used for checking global variables/debug info) varies across platform. For example, it can be libfoo.dylib (Mac) vs libfoo.so (linux) vs *foo*.dll (windows). I find it cumbersome to support multiple platforms.

labath added inline comments.Mar 29 2022, 11:21 AM
lldb/test/Shell/SymbolFile/OnDemand/shared-lib-globals-hydration.test
4 ↗(On Diff #418708)

I'm not saying you have to go out of your way to support other platforms and their idiosyncrasies, but the differences in library names hardly seem like an insurmountable obstacle. We already have abstractions in the API tests (self.platformContext.{shlib_prefix,shlib_extension}) which exist precisely for this reason. There's no function to automatically construct the library name (you have to do the concatenation yourself), but I certainly wouldn't be opposed to adding one.

If it helps, think of it as courtesy towards other developers who may need to debug this feature when they break these tests -- that's much easier to achieve if you can reproduce the failure locally vs. just getting a failure email from a random bot (after you've already submitted your patch).

yinghuitan added inline comments.Mar 29 2022, 12:11 PM
lldb/test/Shell/SymbolFile/OnDemand/shared-lib-globals-hydration.test
4 ↗(On Diff #418708)

self.platformContext.{shlib_prefix,shlib_extension}

Ah, did not know their existence, thanks! Will add API tests for them. Do you prefer to keep these linux-only shell tests or completely deprecate them?

labath added inline comments.Mar 29 2022, 12:57 PM
lldb/test/Shell/SymbolFile/OnDemand/shared-lib-globals-hydration.test
4 ↗(On Diff #418708)

If they're going to be testing the exact same thing, then I don't see a reason for keeping these around.

Support debug map dwarf mode.
Use zero symbol ability to skip SymbolFileOnDemand wrapping.
Add new API tests for symbol on-demand which support multi-platform.
Remove linux-only shell tests.

So we added documentation on this feature explaining who will want to enable this feature and also describe exactly when and how debug info will get enabled. Does anyone else have objections or comments to make? If we need to setup a meeting to discuss, please chime in, or leave some review comments.

I'm looking at the SymbolFileOnDemand friendship and the forwarding of protected methods (mostly dealing with compile unit lists). Does this by any chance have something to do with the fact that there are now two compile unit lists (one in the actual symbol file, and one in the wrapping ondemand class? Would it be possible to avoid that by making SymbolFile a stateless interface (just methods, no member variables), and having a putting the member variables and other utility functions into a separate class that the real symbol files would inherit from? Then only SymbolFileOnDemand would inherit from SymbolFile directly, and it would only add the state that it needs (the bool flag, and the actual symbol file pointer).

Does this by any chance have something to do with the fact that there are now two compile unit lists (one in the actual symbol file, and one in the wrapping ondemand class?

Yes, that's the major reason. We also need make SymbolFileOnDemand friend in SymbolFile so that SymbolFileOnDemand can call/forward protected virtual methods of SymbolFile during overriding.

Would it be possible to avoid that by making SymbolFile a stateless interface?

What part do you want to avoid? We could do that by creating a new SymbolFileReal class, but we still have to make these compile unit lists methods virtual in SymbolFile class so that, like, calling SymbolFile::GetCompileUnitAtIndex() can be overridden by SymbolFileReal(touching data fields) and SymbolFileOnDemand (forwarding to real impl), right?

yinghuitan edited the summary of this revision. (Show Details)Apr 11 2022, 4:46 PM
yinghuitan edited the summary of this revision. (Show Details)Apr 11 2022, 7:42 PM
yinghuitan added a reviewer: labath.

I am trying to get rid of the friendship declaration and the forwarding of protected methods. I have no issue with making GetCompileUnitAtIndex virtual -- that is pretty much a necessity for this feature (*). However, SetCompileUnitAtIndex seems like an internal implementation detail of a SymbolFile instance (that's why it's protected), and it would be better if the new class didn't have to touch it. I wouldn't call the new class the "real" symbol file. I don't see it as any more real as any other subclass. The way I'd look at it is that the new class just contains some helpful implementation that can be useful to a specific symbol file class. An individual symbol file class may choose to use it, but it could also implement the pure SymbolFile interface is a completely different way, if it knows how.

(*) Well, I can also imagine an implementation where the on-demand logic lives inside the SymbolFile class (in public non-virtual methods), and the individual subclasses are implementing just the protected virtual (and maybe abstract) interface that is called from the public functions. That would be in line with the non-virtual interface design pattern, and would avoid having the GetBackingFile method, but I'm not sure if it would be cleaner overall.

Thanks for clarifying. The friendship declaration in SymbolFile is required not only for SetCompileUnitAtIndex but also for other protected virtual methods in SymbolFile like CalculateNumCompileUnits and ParseCompileUnitAtIndex which are pure virtual and require overriding and forwarding by SymbolFileOnDemand.
One solution I can think of is creating a new SymbolFileActual class as child class of SymbolFile and move all the protected virtual member functions from SymbolFile into SymbolFileActual, then modify the callsites of these protected virtual member functions into SymbolFileActual as well. This will get rid of friend declaration and SymbolFile won't/can't have protected virtual member functions. Something like below:

// Only has public virtual functions
class SymbolFile {
};

// Override and forward the public virtual functions so no friend is needed
class SymbolFileOnDemand: public SymbolFile {}

class SymbolFileActual: public SymbolFile {
public:
  // Override any public virtual functions of SymbolFile 
  // that are callsites of protected virtual functions
protected:
  // All protected virtual function of SymbolFile are moved here
  // All member variables of SymbolFile are moved here
};

class SymbolFileDWARF: public SymbolFileActual {}
class SymbolFilePDB: public SymbolFileActual {}
class SymbolFileBreakPad: public SymbolFileActual {}
...

Is this what you are suggesting? This seems to be a big change which might introduce new issues so I took a more practical approach with minimum change instead here. Let me know if this SymbolFileActual design aligns with what you have in mind, and if so whether you want it to happen in this diff or a follow-up patch is fine. Thanks.

Yes, I think that's exactly what I had in mind. Basically, the idea is to structure things such that the ondemand class can sit between the actual symbol file class and the outside world, but that it does not (and cannot) interfere with any of the interactions that happen inside a symbol file class. If it works, I think that kind of a setup would be much cleaner/understandable. As it stands now, one has to consider the possibility that any action inside the real symbol file can reenter the ondemand instance, which makes it harder to reason about.

I definitely wouldn't want to do it as a part of this patch, but it would be better to do it as a preparatory patch rather than a followup.

Rebase on top of https://reviews.llvm.org/D124110
Rename test file name to avoid conflict
Added a new totalModuleCountHasDebugInfo statistics

Thanks, for your patience. Overall, I'd say this looks pretty clean now -- much cleaner than I originally thought it could be. I don't have any further comments here.

Does anyone else have any thoughts?

lldb/source/Core/Module.cpp
474

Can this be moved into the if block below?

Bunch of random nits.

Looks like a great addition. I know debug load time is a common annoyance.

lldb/docs/use/ondemand.rst
2

If you can find a logical place for it, maybe define what "hydration" means in this context.

Developers can find the meaning from the source code but if "hydration" appears in logs it would be good for users to have an idea what that means.

Something like "the process of <whatever hydration means> (referred to internally as "hydration")".

93

"previously"

lldb/include/lldb/Symbol/SymbolFileOnDemand.h
32

Where is this call defined, and by API do you mean API like https://lldb.llvm.org/design/sbapi.html or API as in API of the SymbolFile class?

lldb/source/Symbol/SymbolFileOnDemand.cpp
48

Is there a specific reason to use "this" explicitly instead of just !m_debug_info_enabled?

Something to do with the wrapping of the underlying SymbolFile perhaps.

55

This should be langType shouldn't it?

111

What's the meaning of this comment?

134

Would return <some value> if hydrated?

lldb/test/Shell/SymbolFile/OnDemand/symbolic-breakpoint.test
2

Stray newline

Thanks Pavel for taking the time to review this thoroughly. I am fine with this as I have been working with Jeffrey on this, so anyone else please chime in if you have any comments!

A few follow up patches I would love to see after this goes in:

  • add a command that allows users to load debug info manually using a new or existing command when this feature is enabled
  • add settings for "never load any of these executables in Symbols On Demand" that takes a list of globs
  • add settings for "always load the the debug info for executables in Symbols On Demand" that takes a list of globs
  • add a new column in "image list" that shows up by default when Symbols On Demand is enable to show the status for each shlib like "not enabled for this", "debug info off" and "debug info on" (with a single character to short string, not the ones I just typed)
yinghuitan marked 8 inline comments as done.Apr 24 2022, 7:19 PM

Thanks for all the feedback. I will address the comments and add @clayborg's follow-up suggestions into summary.

lldb/source/Symbol/SymbolFileOnDemand.cpp
48

I will remove from this diff. I guess I just got this habit from working in other languages which may explicitly require this otherwise can fail.

111

Return false from ForEachExternalModule tells the caller to not exit early. This is the default value used by SymbolFile::ForEachExternalModule(). Let me make it more obvious.

yinghuitan marked 3 inline comments as done.

Incorporate feedback and update summary with follow-up todos.

LGTM. Anyone else have any issues?

yinghuitan edited the summary of this revision. (Show Details)Apr 25 2022, 6:37 PM
labath accepted this revision.Apr 26 2022, 6:06 AM
This revision is now accepted and ready to land.Apr 26 2022, 6:06 AM
This revision was landed with ongoing or failed builds.Apr 26 2022, 10:42 AM
This revision was automatically updated to reflect the committed changes.

@stella.stamenova, thanks for the heads-up. I am fixing this in https://reviews.llvm.org/D124471.

Another patch to fix the missing import: https://reviews.llvm.org/D124479

Hi, I noticed that we have two tests with the same name,

lldb/test/API/lang/c/shared_lib/TestSharedLib.py
lldb/test/API/symbol_ondemand/shared_library/TestSharedLib.py

Can you rename your newly added test?

yinghuitan added a comment.EditedApr 26 2022, 6:08 PM

@jasonmolenda, thanks, I discovered a bit late that testcases can't have duplicate names; I fixed one case in the patch but missed these two. Will add a follow-up patch. It would be great if we have a linter/script to detect/warn duplications, not sure how hard to add though.

Thanks! Yeah, I only noticed because lldb-dotest threw an error when I was trying to run a test case earlier.

yinghuitan added a comment.EditedApr 26 2022, 8:21 PM

Thanks for the context. Renames conflict testcase in https://reviews.llvm.org/D124499.