This is an archive of the discontinued LLVM Phabricator instance.

Introduce new symbol on-demand for debug info
ClosedPublic

Authored by yinghuitan on Mar 14 2022, 11:41 AM.

Details

Summary

This diff introduces a new symbol on-demand which skips
loading a module's debug info unless explicitly asked on
demand. This provides significant performance improvement
for application with dynamic linking mode which has large
number of modules.
The feature can be turned on with:
"settings set symbols.load-on-demand true"

The feature works by creating a new SymbolFileOnDemand class for
each module which wraps the actual SymbolFIle subclass as member
variable. By default, most virtual methods on SymbolFileOnDemand are
skipped so that it looks like there is no debug info for that module.
But once the module's debug info is explicitly requested to
be enabled (in the conditions mentioned below) SymbolFileOnDemand
will allow all methods to pass through and forward to the actual SymbolFile
which would hydrate module's debug info on-demand.

In an internal benchmark, we are seeing more than 95% improvement
for a 3000 modules application.

Currently we are providing several ways to on demand hydrate
a module's debug info:

  • Source line breakpoint: matching in supported files
  • Stack trace: resolving symbol context for an address
  • Symbolic breakpoint: symbol table match guided promotion
  • Global variable: symbol table match guided promotion

In all above situations the module's debug info will be on-demand
parsed and indexed.

Some follow-ups for this feature:

  • Add a command that allows users to load debug info explicitly while using a new or existing command when this feature is enabled
  • Add settings for "never load any of these executables in Symbols On Demand" that takes a list of globs
  • Add settings for "always load the the debug info for executables in Symbols On Demand" that takes a list of globs
  • Add a new column in "image list" that shows up by default when Symbols On Demand is enable to show the status for each shlib like "not enabled for this", "debug info off" and "debug info on" (with a single character to short string, not the ones I just typed)

Diff Detail

Event Timeline

yinghuitan created this revision.Mar 14 2022, 11:41 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 14 2022, 11:41 AM
yinghuitan requested review of this revision.Mar 14 2022, 11:41 AM
Herald added a project: Restricted Project. · View Herald Transcript

Fix tests line number change after formatting.

kusmour added a subscriber: kusmour.

I have been working with Jeffrey on this feature and wanted to explain our thoughts a bit.

We wanted to solve the issue of having too much debug information in your project and having debugging being slowed down by all of this information. Much of this information is not required at all times and a few global debug info queries can end up pulling in all of the debug information and causing performance issues.

Our solution was to create a SymbolFileOnDemand class that will own the actual SymbolFile subclass as a member variable. This SymbolFileOnDemand class can then pick and choose when the "turn on" the debug information for each module depending on what kind of queries are sent to these objects.

As Jeffrey said the main triggers for enabling debug information are:

  • set a file and line breakpoint in the module. We always allow these queries to access the line tables in the debug information, and if a source file and line breakpoint are found, then debug info is turned on and all queries to SymbolFileOnDemand will be passed directly to the subclass (SymbolFileDWARF for example).
  • if a stack trace has stack frames with a PC value that is in a library, then we turn on debug info. We do this by seeing when we get a "resolve this lldb_private::Address into a symbol context with debug info". We assume anyone that has resolved a section offset address (lldb_private::Address) and then asks for a symbol context to be resolved is knowing they want debug info from this module
  • a call to FindFunctions will enable debug info if we find a match in the symbol table, so this on demand symbol feature will work better if users don't strip their binaries
  • a call to find global variables, which will enable debug info if we find a symbol that matches. Again, this demand symbol feature will work better if users don't strip their binaries

Our goal here is to allow users to have all the debug information they want, but try to only use the debug information that we need and stem the costs of global lookups in debug information when we can. We want this feature to not require any additional settings to work well other than enabling it, but we also realize we can add settings in the future to say "always load debug info for these modules" or "never load debug info for these modules.

Since most users in IDE settings will set file and line breakpoints during debugging, we are hoping this feature "just works" for most users. Since any stack frames in any threads that are being displayed in the IDE will cause the debug info to be loaded for those modules, the most relevant code to the user will always have debug info enabled.

include/lldb/Symbol/SymbolFileOnDemand.h
2–3

Make this on the same line and avoid the wrapping

This sort of change needs to do a lot more work to help the user understand what they can and can't see if they accept the speed for visibility tradeoff.

They (and we) also need to know what operations might trigger a full "hydration" of a symbol table. It isn't useful to have all this infrastructure in place if fairly common operations trigger all the work anyway.

It also seems to me that whenever the user tells about a shared library, we should "hydrate" it. So for instance, if I do:

(lldb) break set -s someLib.dylib -f foo.c -l 12

they you should first hydrate someLib.dylib and then set the breakpoint.

In general, there should be a strategy discussion up-front before just making this change, with use cases of what won't work (not just some disabled tests) and how ordinary users might figure out something didn't work because of the on-demand loading. There needs to also be a way for lldb to inform users what to expect or the experience will just end up being frustrating.

It also seems like there should be a way to say "These libraries I'm unlikely to care about, but these I always want to have fully expanded".

Normally, it would be fine to say "we can add those details as we go along" but you're designing a new mode of interaction with the debugger, and I think we need to sketch out what we think are acceptable tradeoffs and where we need to give people the ability to manually intervene, etc. So it seems like we should first be having a strategy discussion, before getting down to the details of this patch.

include/lldb/Symbol/SymbolFileOnDemand.h
2

This file seems to be out of place. I don't think it's needed.

Remove duplicate file and fix wrapping.

labath added a subscriber: labath.Mar 15 2022, 5:02 AM

I agree with everything Jim said.

One of the interesting 'strategy' questions would be the interaction between this feature and the target.preload-symbols feature. I don't think the interaction between the two is obvious or intuitive, and I'm even sure that supporting all possible combinations is useful. We could e.g. consider replacing the two settings with one ternary setting with values like "load everything upfront", "load everything the first time it's needed" and "load everything when the module is accessed in a specific way".

Also, I don't think that "run everything with the setting turned on" is a good testing strategy. Nearly all of our tests consist of a single binary and the very first action they do is set a file/line or a function breakpoint. That will immediately trigger a 'hydration' and the rest of the test will just check that the forwarders really forward all calls. Rerunning all tests to catch the few testcases that do something interesting is not a good tradeoff.

We should identify the interesting/important scenarios and write targeted tests for that. The tests in this patch are a pretty good start. My only complaint is that the shared library tests are linux-only. We currently don't have infrastructure in place to write shell tests with shared libraries portably. It may be worth rewriting those into api tests to make use of the existing shared library facilities.

yinghuitan marked 2 inline comments as done.Mar 15 2022, 3:45 PM

Jim, Pavel, thanks for the comment!

I am happy to start a strategy discussion for this feature. @clayborg will chrime in later to suggest the approach how we can discuss.

I would like to answer some of the questions here to share some context.

speed for visibility tradeoff

Right, that is the basic principle of this feature. We are designing this feature primarily targeting IDE use cases which, we found, majority of the users are focusing on several modules authored/depended by their own code.
Also they primarily use basic subset features of lldb.

They (and we) also need to know what operations might trigger a full "hydration" of a symbol table

We would like to make the feature work *by default* as much as possible so we decided to use breakpoint match, callstack and symbol table match to guide a module's debug info hydration. To direct answer this question: I haven't find a common case that can hydrate all process modules debug info yet unless there is callstack covering all modules or a function name matching in all module's symbol table while setting function breakpoint.

whenever the user tells about a shared library, we should "hydrate" it

I agree. Since this feature is designed for IDE scenario so we haven't focused on covering all lldb commands when a module is specified. In "break set -s someLib.dylib -f foo.c -l 12" example, if there is a matching foo.c in someLib.dylib, it will be hydrated; otherwise not.

It also seems like there should be a way to say "These libraries I'm unlikely to care about, but these I always want to have fully expanded".

I agree. It is in my plan to add a "symbols.on-demand-modules" setting which users can explicitly white list some modules for hydration.

unify with target.preload-symbols feature

I do not have strong opinion on this. One concern is that target.preload-symbols provides full experience but symbols.load-on-demand provides trade-off experience (for example some global expression evaluation may fail because we won't automatically hydrate based on type query).

I don't think that "run everything with the setting turned on" is a good testing strategy

I do not have strong opinion either. I understand the concern is extra test time. I do want to share that there are many tests that do not set breakpoints which exercised symbol ondemand code path and helped to catch real bugs in the implementation.

This sort of change needs to do a lot more work to help the user understand what they can and can't see if they accept the speed for visibility tradeoff.

They (and we) also need to know what operations might trigger a full "hydration" of a symbol table. It isn't useful to have all this infrastructure in place if fairly common operations trigger all the work anyway.

Symbol tables are always on and are used to help us find things that will possibly be in the debug info from a function name (look for Code symbols) and global variable (Data symbols) perspective. So a few things here:

  • symbol tables are always parsed for everything and help us enable debug info if we find a match for FindFunctions and FindGlobals calls on the symbol file even if these SymbolFile calls specify to not check the symbol table. If we find a symbol that matches, we turn on debug info and repeat the call by calling into the symbol file if debug info wasn't previously enabled.
  • any Address -> symbol context where the user asks for debug info enables debug info: which usually means any PC from any stack frame, but can also mean if a user queries an address
  • line tables are always left enabled and if any match occurs we enable debug info

It also seems to me that whenever the user tells about a shared library, we should "hydrate" it. So for instance, if I do:

(lldb) break set -s someLib.dylib -f foo.c -l 12

they you should first hydrate someLib.dylib and then set the breakpoint.

that already works. Line tables are relatively cheap and they are always enabled through SymbolFileOnDemand and if. "foo.c:12" kicks up any matches, regardless of wether the shared library was specified, will enable debug info for any libraries that match.

In general, there should be a strategy discussion up-front before just making this change, with use cases of what won't work (not just some disabled tests) and how ordinary users might figure out something didn't work because of the on-demand loading. There needs to also be a way for lldb to inform users what to expect or the experience will just end up being frustrating.

Jim: I met with you to discuss all of these strategies a month ago and I left with the impression the ideas I told you about could work pretty well and that is what Jeffrey implemented here.

It also seems like there should be a way to say "These libraries I'm unlikely to care about, but these I always want to have fully expanded".

We were hoping to see if we can automagically make this stuff work well enough that it requires no settings, but we are happy to add these settings in subsequent patches. The patch is already large as it is. Let us know if these should be added with this initial patch. We might also think about asking the Platform layer if they have any libraries that they know should be always on.

Normally, it would be fine to say "we can add those details as we go along" but you're designing a new mode of interaction with the debugger, and I think we need to sketch out what we think are acceptable tradeoffs and where we need to give people the ability to manually intervene, etc. So it seems like we should first be having a strategy discussion, before getting down to the details of this patch.

We would love to setup a Video Chat with anyone interested if that would work. Talking back and forth on email or lists takes a while and I think we can easily hash things out if any interested parties are willing to attend. We are on Pacific time zone, anyone else that wants to participate, please chime in and we will see if we can set something up?!

I am personally not a fan of the preloading of symbols settings being on by default as there are large delays in debug session startup on linux due to the lack of any kind of viable acceleration tables. I understand that delays can occur later in the debug session as a result, but those are usually tied to global lookups from expressions and many people using IDEs will set file and line breakpoints and just use the variable display which doesn't need everything preloaded many times. We are also looking into trying to track who is doing these global lookups and trying to ensure that the global lookup is needed. One call to fine all "iterator" types is an example of a really bad global lookup that can cause tons of types to be pulled in.

I do think that possibly enabling this feature by changing the preload seeting to allow "preload", "on-demand" or "lazily" could easily work here.

I also agree that we should provide documentation for this feature, or possibly print out a blurb of text to the console that explains the rules if this feature gets enabled.

unify with target.preload-symbols feature

I do not have strong opinion on this. One concern is that target.preload-symbols provides full experience but symbols.load-on-demand provides trade-off experience (for example some global expression evaluation may fail because we won't automatically hydrate based on type query).

That is a fair point, but the having them separate also has downsides, as we have to figure out what is the meaning of on-demand=true&preload=true combo going to be. I think it would be strange to have on-demand take precedence in this setup, but then if it doesn't, then you need to change *two* settings to switch from "preload everything" (the current default) to the "on demand mode". I am not particularly invested in any solution, but I think we should make a conscious choice one way or the other.

I don't think that "run everything with the setting turned on" is a good testing strategy

I do not have strong opinion either. I understand the concern is extra test time. I do want to share that there are many tests that do not set breakpoints which exercised symbol ondemand code path and helped to catch real bugs in the implementation.

How many tests/bugs are we talking about here? Were there more than say ten distinct bugs caught there?

I see a large difference between running tests to find /new/ bugs, and running tests to find /regressions/. Our (API) test suite allows you to run it in a lot of different ways, which can be used to find new bugs. I have used that myself on several occasions. Even without making any changes to the test suite, you could run it with your feature enabled through the --setting argument to dotest. But simply because you manage to find a bug using some combination of dotest arguments it does not mean that everyone should be running the test suite with those arguments to ensure it doesn't regress. It just doesn't scale. When you find (and fix) a bug, you can make a dedicated regression test for that bug. Maybe even more than one -- to test similar edge cases that happened to work already, but could be broken in the future.

Please don't take this personally. This isn't about you or your feature -- that's my standard response to anyone tempted (in a way, it was a mistake to make it too easy to add new flavours) to add new test flavours. All of what I said applies (maybe even more so than here) to some of the existing test categories (gmodules for one), but since they're already here it becomes much harder to remove them -- which is why I'm trying to avoid adding any new ones.

unify with target.preload-symbols feature

I do not have strong opinion on this. One concern is that target.preload-symbols provides full experience but symbols.load-on-demand provides trade-off experience (for example some global expression evaluation may fail because we won't automatically hydrate based on type query).

That is a fair point, but the having them separate also has downsides, as we have to figure out what is the meaning of on-demand=true&preload=true combo going to be. I think it would be strange to have on-demand take precedence in this setup, but then if it doesn't, then you need to change *two* settings to switch from "preload everything" (the current default) to the "on demand mode". I am not particularly invested in any solution, but I think we should make a conscious choice one way or the other.

I don't think that "run everything with the setting turned on" is a good testing strategy

I do not have strong opinion either. I understand the concern is extra test time. I do want to share that there are many tests that do not set breakpoints which exercised symbol ondemand code path and helped to catch real bugs in the implementation.

How many tests/bugs are we talking about here? Were there more than say ten distinct bugs caught there?

I see a large difference between running tests to find /new/ bugs, and running tests to find /regressions/. Our (API) test suite allows you to run it in a lot of different ways, which can be used to find new bugs. I have used that myself on several occasions. Even without making any changes to the test suite, you could run it with your feature enabled through the --setting argument to dotest. But simply because you manage to find a bug using some combination of dotest arguments it does not mean that everyone should be running the test suite with those arguments to ensure it doesn't regress. It just doesn't scale. When you find (and fix) a bug, you can make a dedicated regression test for that bug. Maybe even more than one -- to test similar edge cases that happened to work already, but could be broken in the future.

Please don't take this personally. This isn't about you or your feature -- that's my standard response to anyone tempted (in a way, it was a mistake to make it too easy to add new flavours) to add new test flavours. All of what I said applies (maybe even more so than here) to some of the existing test categories (gmodules for one), but since they're already here it becomes much harder to remove them -- which is why I'm trying to avoid adding any new ones.

We have some similar wording to that extent on the "testing" page: https://lldb.llvm.org/resources/test.html#id7. I like the distinction you make between using the ability go use the flavors for new features vs using it to catch regressions. Would you consider adding that there?

unify with target.preload-symbols feature

I do not have strong opinion on this. One concern is that target.preload-symbols provides full experience but symbols.load-on-demand provides trade-off experience (for example some global expression evaluation may fail because we won't automatically hydrate based on type query).

That is a fair point, but the having them separate also has downsides, as we have to figure out what is the meaning of on-demand=true&preload=true combo going to be. I think it would be strange to have on-demand take precedence in this setup, but then if it doesn't, then you need to change *two* settings to switch from "preload everything" (the current default) to the "on demand mode". I am not particularly invested in any solution, but I think we should make a conscious choice one way or the other.

These two settings can co-exist pretty easily. If preload is on, preloading can be done for anything that has debug info enabled, and would be done as soon as debug info was enabled for on demand. It would be like saying "please preload symbols as soon as debug info is enabled". Right now if we enable debug info due to a breakpoint, function, or global lookup that matches a symbol, we can immediately preload the symbols so they are ready regardless of wether a name lookup happens just like we do normally.

If preload is off, then we wait until someone makes a global name lookup call.

I don't think that "run everything with the setting turned on" is a good testing strategy

I do not have strong opinion either. I understand the concern is extra test time. I do want to share that there are many tests that do not set breakpoints which exercised symbol ondemand code path and helped to catch real bugs in the implementation.

How many tests/bugs are we talking about here? Were there more than say ten distinct bugs caught there?

I see a large difference between running tests to find /new/ bugs, and running tests to find /regressions/. Our (API) test suite allows you to run it in a lot of different ways, which can be used to find new bugs. I have used that myself on several occasions. Even without making any changes to the test suite, you could run it with your feature enabled through the --setting argument to dotest. But simply because you manage to find a bug using some combination of dotest arguments it does not mean that everyone should be running the test suite with those arguments to ensure it doesn't regress. It just doesn't scale. When you find (and fix) a bug, you can make a dedicated regression test for that bug. Maybe even more than one -- to test similar edge cases that happened to work already, but could be broken in the future.

Please don't take this personally. This isn't about you or your feature -- that's my standard response to anyone tempted (in a way, it was a mistake to make it too easy to add new flavours) to add new test flavours. All of what I said applies (maybe even more so than here) to some of the existing test categories (gmodules for one), but since they're already here it becomes much harder to remove them -- which is why I'm trying to avoid adding any new ones.

We knew adding a new flavor would be a contentious part of the patch and we are fine testing this however the community feels it is best tested. That being said, it would be good to catch regressions for this feature on at least one Buildbot so we don't regress the feature, I am not particular on how it happens.

Is there any interest in a VC call that anyone can choose to attend about this, or do we prefer an RFC on the website that hosts what the mailing list used to be and I can't remember right now?

clayborg added inline comments.Mar 16 2022, 1:36 PM
lldb/include/lldb/Symbol/SymbolFileOnDemand.h
203–204
lldb/source/Core/CoreProperties.td
39

We might want to put a URL to something on our website that describes this feature in detail. It isn't just breakpoints that enable debug info, and we will need to document this feature completely so users know what to expect if they enable this.

lldb/source/Core/Module.cpp
472

This should only be done if someone is asking for debug info in the "resolve_scope".

lldb/source/Symbol/SymbolFile.cpp
83

Add a comment here explaining what is happening

lldb/source/Symbol/SymbolFileOnDemand.cpp
29

We might consider letting this call always go through to the underlying symbol file and report the abilities from it.

35–37

remove braces for a single statement "if" per llvm coding guidelines

63–66

remove braces for a single statement "if" per llvm coding guidelines

80–82

remove braces for a single statement "if" per llvm coding guidelines

98

We might be able to always let this call through since the line tables will be parsed if the user sets a file and line breakpoint. We are mainly trying to stop the global name lookups in SymbolFileOnDemand

173–176

remove braces for a single statement "if" per llvm coding guidelines

207–209

remove braces for a single statement "if" per llvm coding guidelines

515–517

We might consider letting this always go through as unwind info doesn't involve any indexing of debug info.

603–605

remove braces for a single statement "if" per llvm coding guidelines

609–611

remove braces for a single statement "if" per llvm coding guidelines

lldb/source/Target/Statistics.cpp
66

Other key/value pairs for symbol table stuff start with "symbolTable",

235–237

remove braces for a single statement "if" per llvm coding guidelines

240–242

remove braces for a single statement "if" per llvm coding guidelines

clayborg added inline comments.Mar 16 2022, 1:36 PM
lldb/source/Target/Statistics.cpp
271

Other "total" key value pairs use the same string as the ones from the module level to help match them up.

272

Other "total" key value pairs use the same string as the ones from the module level to help match them up.

Re @labath

How many tests/bugs are we talking about here? Were there more than say ten distinct bugs caught there?

It found 2~3 bugs, like source regex breakpoint (breakpoint set -p "source"), regex breakpoint filter by language (breakpoint set -r <regex> -L c++) etc.

running tests to new features vs find /regressions

Running all tests offline once to discover bugs is a good exericse. Since failing tests are understood we can put them into new tests with on-demand explicitly enabled then ondemand test category is unnecessary. This sounds good to me if Greg agrees.

Please don't take this personally

None taken :-)

yinghuitan marked 19 inline comments as done.Mar 17 2022, 2:50 PM
yinghuitan added inline comments.
lldb/source/Symbol/SymbolFileOnDemand.cpp
29

Sure. It does not matter much in real world because ability is never calculated on SymbolFileOnDemand.

98

If the user sets a file and line breakpoint and a compile unit/supported file is matched against the breakpoint file its parent module's SymbolFileOnDemand will be hydrated before ParseLineTable() is called anyway so manual call through is unnecessary.
I prefer to keep things not pass through by default to prevent any accidental leaking.

515–517

Yes, we could. But I would like to keep all functions guarded by default unless explicitly required by a hydration scenario to prevent any accidental leakage.

lldb/source/Target/Statistics.cpp
66

Ah, thanks! Bad auto replace for variable name.

yinghuitan marked 4 inline comments as done.

Address review feedback.

I will remove ondemand test flavor/category and add more new testcases next.

Remove ondemand test category/flavor
Add new testcases:

  • source text regex breakpoint
  • regex breakpoint filter by language
  • global variable symbol table match hydration

Remove last ondemand test flavor testcase.

Add documentation for the feature.

Fix log channel typo

unify with target.preload-symbols feature

I do not have strong opinion on this. One concern is that target.preload-symbols provides full experience but symbols.load-on-demand provides trade-off experience (for example some global expression evaluation may fail because we won't automatically hydrate based on type query).

That is a fair point, but the having them separate also has downsides, as we have to figure out what is the meaning of on-demand=true&preload=true combo going to be. I think it would be strange to have on-demand take precedence in this setup, but then if it doesn't, then you need to change *two* settings to switch from "preload everything" (the current default) to the "on demand mode". I am not particularly invested in any solution, but I think we should make a conscious choice one way or the other.

These two settings can co-exist pretty easily. If preload is on, preloading can be done for anything that has debug info enabled, and would be done as soon as debug info was enabled for on demand. It would be like saying "please preload symbols as soon as debug info is enabled". Right now if we enable debug info due to a breakpoint, function, or global lookup that matches a symbol, we can immediately preload the symbols so they are ready regardless of wether a name lookup happens just like we do normally.

If preload is off, then we wait until someone makes a global name lookup call.

I understand your point, and I see how having a ternary setting where (exactly) one of the values creates a completely different debug experience could be confusing. When you describe it like this, I think the behavior makes sense. The part I am worried about is someone getting the relationship backwards (with preload-symbols being "on top"), and then wondering "why the %#@# is aren't my symbols loaded when I said you should preload them". And I do think one could think this is the expected behavior, since the (whether you like it or not) idea behind the preloading was to load everything up-front instead of at a random point in the debug session -- something which is no longer true when the on-demand feature is enabled.

Overall, I am still fairly fond of the ternary idea, but I am not going to push for it if there's interest in it. Having two settings makes sense too, if we can clearly communicate their relationship. One thing that might help is renaming (one or both) something so that both settings aren't using the word "load" (I'll note that you used the phrase "debug info is enabled" in your exposition), though I am not sure what would those names be.

Remove ondemand test category/flavor
Add new testcases:

  • source text regex breakpoint
  • regex breakpoint filter by language
  • global variable symbol table match hydration

Awesome. Thanks.

lldb/test/Shell/SymbolFile/OnDemand/shared-lib-globals-hydration.test
4 ↗(On Diff #418708)

Is this here because there is no portable way to create shared libraries in shell tests? Would it be possible to run this test on other systems as well if this were written as an API test (where we have a shared lib portability layer)?

yinghuitan marked an inline comment as done.Mar 29 2022, 9:21 AM
yinghuitan added inline comments.
lldb/test/Shell/SymbolFile/OnDemand/shared-lib-globals-hydration.test
4 ↗(On Diff #418708)

Yeah, that's one reason. Another reason is the name of the shared library (used for checking global variables/debug info) varies across platform. For example, it can be libfoo.dylib (Mac) vs libfoo.so (linux) vs *foo*.dll (windows). I find it cumbersome to support multiple platforms.

labath added inline comments.Mar 29 2022, 11:21 AM
lldb/test/Shell/SymbolFile/OnDemand/shared-lib-globals-hydration.test
4 ↗(On Diff #418708)

I'm not saying you have to go out of your way to support other platforms and their idiosyncrasies, but the differences in library names hardly seem like an insurmountable obstacle. We already have abstractions in the API tests (self.platformContext.{shlib_prefix,shlib_extension}) which exist precisely for this reason. There's no function to automatically construct the library name (you have to do the concatenation yourself), but I certainly wouldn't be opposed to adding one.

If it helps, think of it as courtesy towards other developers who may need to debug this feature when they break these tests -- that's much easier to achieve if you can reproduce the failure locally vs. just getting a failure email from a random bot (after you've already submitted your patch).

yinghuitan added inline comments.Mar 29 2022, 12:11 PM
lldb/test/Shell/SymbolFile/OnDemand/shared-lib-globals-hydration.test
4 ↗(On Diff #418708)

self.platformContext.{shlib_prefix,shlib_extension}

Ah, did not know their existence, thanks! Will add API tests for them. Do you prefer to keep these linux-only shell tests or completely deprecate them?

labath added inline comments.Mar 29 2022, 12:57 PM
lldb/test/Shell/SymbolFile/OnDemand/shared-lib-globals-hydration.test
4 ↗(On Diff #418708)

If they're going to be testing the exact same thing, then I don't see a reason for keeping these around.

Support debug map dwarf mode.
Use zero symbol ability to skip SymbolFileOnDemand wrapping.
Add new API tests for symbol on-demand which support multi-platform.
Remove linux-only shell tests.

So we added documentation on this feature explaining who will want to enable this feature and also describe exactly when and how debug info will get enabled. Does anyone else have objections or comments to make? If we need to setup a meeting to discuss, please chime in, or leave some review comments.

I'm looking at the SymbolFileOnDemand friendship and the forwarding of protected methods (mostly dealing with compile unit lists). Does this by any chance have something to do with the fact that there are now two compile unit lists (one in the actual symbol file, and one in the wrapping ondemand class? Would it be possible to avoid that by making SymbolFile a stateless interface (just methods, no member variables), and having a putting the member variables and other utility functions into a separate class that the real symbol files would inherit from? Then only SymbolFileOnDemand would inherit from SymbolFile directly, and it would only add the state that it needs (the bool flag, and the actual symbol file pointer).

Does this by any chance have something to do with the fact that there are now two compile unit lists (one in the actual symbol file, and one in the wrapping ondemand class?

Yes, that's the major reason. We also need make SymbolFileOnDemand friend in SymbolFile so that SymbolFileOnDemand can call/forward protected virtual methods of SymbolFile during overriding.

Would it be possible to avoid that by making SymbolFile a stateless interface?

What part do you want to avoid? We could do that by creating a new SymbolFileReal class, but we still have to make these compile unit lists methods virtual in SymbolFile class so that, like, calling SymbolFile::GetCompileUnitAtIndex() can be overridden by SymbolFileReal(touching data fields) and SymbolFileOnDemand (forwarding to real impl), right?

yinghuitan edited the summary of this revision. (Show Details)Apr 11 2022, 4:46 PM
yinghuitan edited the summary of this revision. (Show Details)Apr 11 2022, 7:42 PM
yinghuitan added a reviewer: labath.

I am trying to get rid of the friendship declaration and the forwarding of protected methods. I have no issue with making GetCompileUnitAtIndex virtual -- that is pretty much a necessity for this feature (*). However, SetCompileUnitAtIndex seems like an internal implementation detail of a SymbolFile instance (that's why it's protected), and it would be better if the new class didn't have to touch it. I wouldn't call the new class the "real" symbol file. I don't see it as any more real as any other subclass. The way I'd look at it is that the new class just contains some helpful implementation that can be useful to a specific symbol file class. An individual symbol file class may choose to use it, but it could also implement the pure SymbolFile interface is a completely different way, if it knows how.

(*) Well, I can also imagine an implementation where the on-demand logic lives inside the SymbolFile class (in public non-virtual methods), and the individual subclasses are implementing just the protected virtual (and maybe abstract) interface that is called from the public functions. That would be in line with the non-virtual interface design pattern, and would avoid having the GetBackingFile method, but I'm not sure if it would be cleaner overall.

Thanks for clarifying. The friendship declaration in SymbolFile is required not only for SetCompileUnitAtIndex but also for other protected virtual methods in SymbolFile like CalculateNumCompileUnits and ParseCompileUnitAtIndex which are pure virtual and require overriding and forwarding by SymbolFileOnDemand.
One solution I can think of is creating a new SymbolFileActual class as child class of SymbolFile and move all the protected virtual member functions from SymbolFile into SymbolFileActual, then modify the callsites of these protected virtual member functions into SymbolFileActual as well. This will get rid of friend declaration and SymbolFile won't/can't have protected virtual member functions. Something like below:

// Only has public virtual functions
class SymbolFile {
};

// Override and forward the public virtual functions so no friend is needed
class SymbolFileOnDemand: public SymbolFile {}

class SymbolFileActual: public SymbolFile {
public:
  // Override any public virtual functions of SymbolFile 
  // that are callsites of protected virtual functions
protected:
  // All protected virtual function of SymbolFile are moved here
  // All member variables of SymbolFile are moved here
};

class SymbolFileDWARF: public SymbolFileActual {}
class SymbolFilePDB: public SymbolFileActual {}
class SymbolFileBreakPad: public SymbolFileActual {}
...

Is this what you are suggesting? This seems to be a big change which might introduce new issues so I took a more practical approach with minimum change instead here. Let me know if this SymbolFileActual design aligns with what you have in mind, and if so whether you want it to happen in this diff or a follow-up patch is fine. Thanks.

Yes, I think that's exactly what I had in mind. Basically, the idea is to structure things such that the ondemand class can sit between the actual symbol file class and the outside world, but that it does not (and cannot) interfere with any of the interactions that happen inside a symbol file class. If it works, I think that kind of a setup would be much cleaner/understandable. As it stands now, one has to consider the possibility that any action inside the real symbol file can reenter the ondemand instance, which makes it harder to reason about.

I definitely wouldn't want to do it as a part of this patch, but it would be better to do it as a preparatory patch rather than a followup.

Rebase on top of https://reviews.llvm.org/D124110
Rename test file name to avoid conflict
Added a new totalModuleCountHasDebugInfo statistics

Thanks, for your patience. Overall, I'd say this looks pretty clean now -- much cleaner than I originally thought it could be. I don't have any further comments here.

Does anyone else have any thoughts?

lldb/source/Core/Module.cpp
475

Can this be moved into the if block below?

Bunch of random nits.

Looks like a great addition. I know debug load time is a common annoyance.

lldb/docs/use/ondemand.rst
1 ↗(On Diff #424360)

If you can find a logical place for it, maybe define what "hydration" means in this context.

Developers can find the meaning from the source code but if "hydration" appears in logs it would be good for users to have an idea what that means.

Something like "the process of <whatever hydration means> (referred to internally as "hydration")".

92 ↗(On Diff #424360)

"previously"

lldb/include/lldb/Symbol/SymbolFileOnDemand.h
32

Where is this call defined, and by API do you mean API like https://lldb.llvm.org/design/sbapi.html or API as in API of the SymbolFile class?

lldb/source/Symbol/SymbolFileOnDemand.cpp
48

Is there a specific reason to use "this" explicitly instead of just !m_debug_info_enabled?

Something to do with the wrapping of the underlying SymbolFile perhaps.

55

This should be langType shouldn't it?

111

What's the meaning of this comment?

134

Would return <some value> if hydrated?

lldb/test/Shell/SymbolFile/OnDemand/symbolic-breakpoint.test
2

Stray newline

Thanks Pavel for taking the time to review this thoroughly. I am fine with this as I have been working with Jeffrey on this, so anyone else please chime in if you have any comments!

A few follow up patches I would love to see after this goes in:

  • add a command that allows users to load debug info manually using a new or existing command when this feature is enabled
  • add settings for "never load any of these executables in Symbols On Demand" that takes a list of globs
  • add settings for "always load the the debug info for executables in Symbols On Demand" that takes a list of globs
  • add a new column in "image list" that shows up by default when Symbols On Demand is enable to show the status for each shlib like "not enabled for this", "debug info off" and "debug info on" (with a single character to short string, not the ones I just typed)
yinghuitan marked 8 inline comments as done.Apr 24 2022, 7:19 PM

Thanks for all the feedback. I will address the comments and add @clayborg's follow-up suggestions into summary.

lldb/source/Symbol/SymbolFileOnDemand.cpp
48

I will remove from this diff. I guess I just got this habit from working in other languages which may explicitly require this otherwise can fail.

111

Return false from ForEachExternalModule tells the caller to not exit early. This is the default value used by SymbolFile::ForEachExternalModule(). Let me make it more obvious.

yinghuitan marked 3 inline comments as done.

Incorporate feedback and update summary with follow-up todos.

LGTM. Anyone else have any issues?

yinghuitan edited the summary of this revision. (Show Details)Apr 25 2022, 6:37 PM
labath accepted this revision.Apr 26 2022, 6:06 AM
This revision is now accepted and ready to land.Apr 26 2022, 6:06 AM
This revision was landed with ongoing or failed builds.Apr 26 2022, 10:42 AM
This revision was automatically updated to reflect the committed changes.

@stella.stamenova, thanks for the heads-up. I am fixing this in https://reviews.llvm.org/D124471.

Another patch to fix the missing import: https://reviews.llvm.org/D124479

Hi, I noticed that we have two tests with the same name,

lldb/test/API/lang/c/shared_lib/TestSharedLib.py
lldb/test/API/symbol_ondemand/shared_library/TestSharedLib.py

Can you rename your newly added test?

yinghuitan added a comment.EditedApr 26 2022, 6:08 PM

@jasonmolenda, thanks, I discovered a bit late that testcases can't have duplicate names; I fixed one case in the patch but missed these two. Will add a follow-up patch. It would be great if we have a linter/script to detect/warn duplications, not sure how hard to add though.

Thanks! Yeah, I only noticed because lldb-dotest threw an error when I was trying to run a test case earlier.

yinghuitan added a comment.EditedApr 26 2022, 8:21 PM

Thanks for the context. Renames conflict testcase in https://reviews.llvm.org/D124499.