Page MenuHomePhabricator

[lldb] integrate debuginfod
Changes PlannedPublic

Authored by kwk on Mar 6 2020, 8:00 AM.

Details

Summary

This first patch does the heavy lifting of bootstrapping debuginfod with
CMake and integrating it to find a source file using debuginfod when
using (lldb) source list and the file cannot be found locally.

Read more about debuginfod here:
https://sourceware.org/elfutils/Debuginfod.html

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
In D75750#1929124, @kwk wrote:

@labath I've updated my patch and would love to hear your opinion on it. So far I've only written the python ServeDirectoryWithHTTP() function with proper doctest and documentation but since you mentioned the 0 port thingy I've tried that on the command line when using python -m http.server 0 and it works smoothly. That's why I've included the llvm-lit test I was working on.

Being able to use 0 to auto-assign a port is definitely a big improvement, but there still the question of retrieving that port and sychronization that goes with it, which you've now done with a while loop + sniffing through the server log.

And I still haven't gotten used to how the comments in your lit tests are way longer than the test itself. If I think about that harder, I guess the thing that really bothers me about that (even though I normally like comments) is that there is no visual distinction between "comments" and "code" this way -- it all shows up as grey in the review tool. Can't say that's really your fault, but it does make it hard to see what that test is doing nonetheless. (Some areas of llvm have a convention to use ## for "real" comments, which I guess can make things slightly better, but I still haven't seen comments this big there...

So overall, I think this version is better than what you had before, but it still doesn't convince me that this is better than python.

lldb/include/lldb/Host/DebugInfoD.h
17–19

I guess this is not needed now.

lldb/packages/Python/lldbsuite/test/httpserver.py
75 ↗(On Diff #251297)

What exactly is this timeout for? It seems rather small...

lldb/packages/Python/lldbsuite/test/lldbtest.py
4 ↗(On Diff #251297)

just commit this separately. no review needed.

kwk updated this revision to Diff 251376.Thu, Mar 19, 7:21 AM
kwk marked 6 inline comments as done.
  • Removed not needed forward decl
  • Format comments for better readability in my test
kwk added a comment.Thu, Mar 19, 7:23 AM

@labath I've improved my test file for readability.

lldb/include/lldb/Host/DebugInfoD.h
17–19

Right.

27

Removed.

lldb/packages/Python/lldbsuite/test/httpserver.py
75 ↗(On Diff #251297)

uff, I guess I had an idea when I wrote it but its lost now.

lldb/packages/Python/lldbsuite/test/lldbtest.py
4 ↗(On Diff #251297)
kwk updated this revision to Diff 251383.Thu, Mar 19, 7:57 AM
  • Validate that the server received the request from debuginfod client
jankratochvil added inline comments.Thu, Mar 19, 8:07 AM
lldb/source/Core/SourceManager.cpp
421–422

This comment should not stay there during check-in.

436–437

This comment should not stay there during check-in.

jankratochvil added inline comments.Thu, Mar 19, 8:42 AM
lldb/source/Host/common/DebugInfoD.cpp
40

const UUID &buildID as it is even bigger (40 bytes) than std::string (32 bytes).

57

Excessive leftover comment.

60

Here it will contact the server even if the binary does not contain any build-id - LLDB then generates UUID as 4 bytes long one:

// Use 4 bytes of crc from the .gnu_debuglink section.
u32le data(gnu_debuglink_crc);
uuid = UUID::fromData(&data, sizeof(data));

That is a needless performance regression.
I sure do not like making such decision on the LLDB side. Maybe libdebuginfod could rather make such optimization - IMO as Frank Eigler.

fche2 added a subscriber: fche2.Thu, Mar 19, 1:55 PM
fche2 added inline comments.
lldb/source/Host/common/DebugInfoD.cpp
60

Could kkleine reject uuid of length 4 in the above test, i.e. something like:

if (!uuid.IsValid() || uuid.GetBytes().size() == sizeof(u32le)) // .gnu_debuglink crc32
   continue;
jdoerfert resigned from this revision.Thu, Mar 19, 5:40 PM
labath added inline comments.Fri, Mar 20, 3:17 AM
lldb/source/Host/common/DebugInfoD.cpp
60

Ideally, lldb would not use the debug link crc as a uuid (and instead store that elsewhere), but rejecting the short uuids here does not seem _that_ bad.

jankratochvil added inline comments.Fri, Mar 20, 5:14 AM
lldb/source/Host/common/DebugInfoD.cpp
60

We were discussing with @kwk that in fact sending anything stored in UUID as build-id may not be right. debuginfod wants specifically build-id, not any other identifier. Or @fche2 - does it? Would debuginfod for example accept some that Apple UUID for Apple dsym files? Maybe LLDB could store some identifier how was the UUID obtained.

fche2 added inline comments.Fri, Mar 20, 6:59 AM
lldb/source/Host/common/DebugInfoD.cpp
60

Would debuginfod for example accept some that Apple UUID for Apple dsym files?

The debuginfod webapi specifies that buildids simply need to be lower case hex strings. It will dutifully accept any such string, and correctly report 403's for unknown ones.

kwk updated this revision to Diff 252026.Mon, Mar 23, 7:15 AM
  • check for valid UUID
  • less verbose mkdir and rm output
  • More explicit test and documentation
  • fixup
kwk added a comment.Mon, Mar 23, 7:40 AM

@labath @fche2 @jankratochvil I've implemented the logic to ignore invalid UUIDs and the ones that are too short. Can you have another look please?

jankratochvil added inline comments.Mon, Mar 23, 8:16 AM
lldb/source/Host/common/DebugInfoD.cpp
44

If it is done this way (and not in libdebuginfod.so) I think there should be <=8 because LLDB contains:

if (gnu_debuglink_crc) {
  // Use 4 bytes of crc from the .gnu_debuglink section.
  u32le data(gnu_debuglink_crc);
  uuid = UUID::fromData(&data, sizeof(data));
} else if (core_notes_crc) {
  // Use 8 bytes - first 4 bytes for *magic* prefix, mainly to make
  // it look different form .gnu_debuglink crc followed by 4 bytes
  // of note segments crc.
  u32le data[] = {u32le(g_core_uuid_magic), u32le(core_notes_crc)};
  uuid = UUID::fromData(data, sizeof(data));
}
kwk updated this revision to Diff 252054.Mon, Mar 23, 8:17 AM
kwk marked 2 inline comments as done.
  • Remove commented out code
  • Remove lldb/packages/Python/lldbsuite/test/httpserver.py in favor of lit test
  • Removed commented out left-over code
kwk updated this revision to Diff 252059.Mon, Mar 23, 8:42 AM
  • Adjust buildID verification

The code mostly fine for me, but this should be reviewed by properly by more people, once you're ready to take down the WIP tag.

I am still not happy with the test case.

lldb/source/Host/common/DebugInfoD.cpp
44

4 would have probably been fine too, as I don't think a core file "uuid" can make its way into here. In either case, we should document what is this working around, as 4 or 8 byte uuids are technically valid.

kwk updated this revision to Diff 252353.Tue, Mar 24, 9:37 AM
  • Add documentation for workaround on rejecting special build UUIDs
kwk retitled this revision from WIP: [lldb] integrate debuginfod to [lldb] integrate debuginfod.Tue, Mar 24, 9:39 AM
kwk edited the summary of this revision. (Show Details)
kwk marked 10 inline comments as done.

@labath @jankratochvil @fche2 I've addressed all your comments and hope the patch is good to go now.

lldb/source/Host/common/DebugInfoD.cpp
44

@labath. I've added a documentation for the workaround.

jankratochvil requested changes to this revision.Tue, Mar 24, 2:22 PM
jankratochvil added inline comments.
lldb/cmake/modules/FindDebuginfod.cmake
58

"No newline at end of file", this is what saving this diff, git apply --index and git diff says to me.

lldb/include/lldb/Host/DebugInfoD.h
28

Describe what does mean a returned std::string("") - that no error happened but server does not know this UUID/path.

lldb/source/Core/SourceManager.cpp
408

I do not like this extra line as it changes behavior of LLDB unrelated to debuginfod. IIUC if the source file with fully specified directory+filename in DWARF does not exist but the same filename exists in a different directory of the sourcetree LLDB will now quietly use the different file. That's a bug.
I think it is there as you needed to initialize sc.module_sp.

460

Make the debuginfod::isAvailable() check first as it is zero-cost, FileSystem::Instance().Exists is expensive filesystem operation.
The problem with that sc.module_sp is it is initialized above with some side effects. I think you should be fine without needing any sc. The following code does not pass the testcase for me but I guess you may fix it better:

// Try finding the file using elfutils' debuginfod
if (!FileSystem::Instance().Exists(m_file_spec) &&
    debuginfod::isAvailable())
  target->GetImages().ForEach(
      [&](const ModuleSP &module_sp) -> bool {
    llvm::Expected<std::string> cache_path = debuginfod::findSource(
        module_sp->GetUUID(), file_spec.GetCString());
    if (!cache_path) {
      module_sp->ReportWarning(
          "An error occurred while finding the "
          "source file %s using debuginfod for build ID %s: %s",
          file_spec.GetCString(),
          sc.module_sp->GetUUID().GetAsString("").c_str(),
          llvm::toString(cache_path.takeError()).c_str());
    } else if (!cache_path->empty()) {
      m_file_spec = FileSpec(*cache_path);
      m_mod_time = FileSystem::Instance().GetModificationTime(*cache_path);
      return false;
    }
    return true;
  });
lldb/source/Host/common/DebugInfoD.cpp
51

It should not be an error:

echo 'int main(void) { return 0; }' >/tmp/main2.c;gcc -o /tmp/main2 /tmp/main2.c -Wall -g -Wl,--build-id=none;rm /tmp/main2.c;DEBUGINFOD_URLS=http://localhost:8002/ ./bin/lldb /tmp/main2 -o 'l main' -o q
(lldb) target create "/tmp/main2"
Current executable set to '/tmp/main2' (x86_64).
(lldb) l main
warning: (x86_64) /tmp/main2 An error occurred while finding the source file /tmp/main2.c using debuginfod for build ID A9C3D738: invalid build ID: A9C3D738
File: /tmp/main2.c
(lldb) q
lldb/test/Shell/SymbolFile/DWARF/source-list.cpp
104

s/123/{{[0-9]+}}/?

115

"No newline at end of file", this is what saving this diff, git apply --index and git diff says to me.

This revision now requires changes to proceed.Tue, Mar 24, 2:22 PM
jankratochvil added inline comments.Tue, Mar 24, 3:07 PM
lldb/source/Core/SourceManager.cpp
415

This code could be more efficient than my previously proposed GetImages.ForEach() as it should be able to find the only one Module having that source file.
But there should be passed the full pathname incl. directories to prevent wrongly chosen accidentally filename-matching source files:

FileSystem::Instance().Exists(m_file_spec) ? file_spec.GetFilename().AsCString() : file_spec.GetCString(false/*denormalize*/)

And the Exists() check should be cached in this whole function as it is expensive.

460

Please ignore this comment + code fragment, I think it should not be needed.
(Just the isAvailable() check should be moved.)

kwk updated this revision to Diff 252510.Wed, Mar 25, 12:17 AM
kwk marked 10 inline comments as done.
  • Add newline to end of FindDebuginfod.cmake
  • Describe empty string returned from debuginfod::findSource()
  • Don't treat build IDs of len <= 8 as an error but simply as not found
  • move inexpensive debuginfod::isAvailable() check to beginning of if-stmt
  • Simplify line number check in test file to avoid adjusting the line number every time the test changes
  • Add newline to source-list.cpp test file
kwk added a comment.Wed, Mar 25, 12:17 AM

@jankratochvil thanks for this thorough review. I have to think about one comment more precisely but the rest was fixed.

lldb/source/Host/common/DebugInfoD.cpp
51

Okay, I'll have it return just an empty string. And adjust the comment on the empty string in findSource documentation. I fully understand that an error is undesirable in your test case. My question is if the caller should sanitize it's parameters passed to findSource of if the latter should silently ignore those wrong UUIDs. For now I silently ignore them and treat a wrong build ID like a not found (e.g. empty string is returned).

lldb/test/Shell/SymbolFile/DWARF/source-list.cpp
104

Both are fine, but I'll go with your's if that helps. If you can tell me how to get a lit CHECK statement that checks for incremental numbers, that'll be awesome ;)

labath requested changes to this revision.Wed, Mar 25, 1:45 AM
labath added a reviewer: jingham.
labath added a subscriber: jingham.

Adding @jingham. Jim, what do you make of this patch and the feature overall?

I know I said this looks "mostly good", but thinking about this further (and reading Jan's comments), I do find that there are still couple of things that trouble me here. :/

The first is the module_sp searching logic. I think that was previously here mainly to support the case when one enters source list I_am_too_lazy_to_enter_the_full_path.cc, and it would not normally fire when displaying the context after the process stops. But this makes a full-fledged feature out of it, as it will run every time we look up a file (if debuginfod is enabled, etc.). It seems fine to do this for the "source list" command (though it also may be nice to give the user an option to override this logic, just like he can specify a full path if he wants to), but doing it for stop-context purposes seems wrong, as there we should already have right module somewhere up the stack.

The second is the interaction between this and the target.source-map setting. For searching the file on the local filesystem, we want to use the remapped path, but in case of debuginfod, we would want to use the original path (ideally the one which doesn't even have the per-module mappings applied). The two of these things make me wonder if this new code is plugged in at the right level.

The last one is the test case. I've already said why I don't think this is a good test. Now I'll just add one more reason. With python it would be easy to create a function which handles the details of starting a fake debug info server. With lit, each new test for this (there are going to be more that one, I hope) will have to copy the // RUN: goo needed to start the server in a separate process. Sure, maybe you could do something similar here too and move that logic into a shell script, but then this will look even less like a "normal" lit test: a RUN line, which invokes a shell script, which invokes python in a background process... -- it would be much simpler (and portable) if it was python all the way.

lldb/source/Core/SourceManager.cpp
408

Yes, that does not sound right. It may be good to break this function into smaller pieces so you can invoke the thing you need when you need it.

lldb/source/Host/common/DebugInfoD.cpp
51

It would be nice to make a test case out of that.

This revision now requires changes to proceed.Wed, Mar 25, 1:45 AM

Greg originally designed the macOS equivalent of this, so I've added him.

I totally agree that you should only do wide searches for source files when there's no way to get a narrower context. Even "source list" could use the current thread & frame as a start context, and only fall back to a full search when that fails. In a big project with many shlibs, you might have multiple files of the same name, but if someone specifies foo.cpp while stopped in a method of libMyLib.dylib, they probably do mean the one in libMyLib.dylib, if it exists...

I'm not terribly happy with the way the module-level source-file remapping interacts with debug info. It overrides the source-map without a way to undo that. The dSYM's that Apple uses internally have module-level source remapping in them that point to some NFS mounted directory. We've had problems where somebody has copied over one of these dSYM's and the associated sources, but doesn't want to remote mount the directories that host the sources. The only way to point lldb at those is to either copy them over to a directory structure that matches the remote path, or edit the dSYM to remove the source remapping. That's pretty annoying.

On the other hand, I don't think we should show the build locations (or at least not as a primary thing) for source files that have come from debuginfod or from another module level remapping. That's confusing to anyone who wants to open the file in some other way (for instance if you wanted to hand the file off to an external editor.) If we a way to get at both pieces of information, the source info command could be used to show the "debug info path" and the "local path" for a given source file. We might want some API (on SBFileSpec?) to get the original path - that would actually be useful when trying to figure out what you should use for a target.source-map. But if we have a local copy of the file you need to be able to get to that path.

The test does seem like it would be much better as a Python test.

kwk marked 3 inline comments as done.Thu, Mar 26, 2:29 AM

@labath I made a signficant simplification of starting and killing the server. I hope you like that better.

lldb/source/Core/SourceManager.cpp
408

My intention wasn't to leave this as is to be honest. I had comments in here that I removed upon request but they existed to remind myself that I haven't double checked the logic well enough. I just wanted access to the symbol context further down below and thought, that I can take it from up here.

lldb/source/Host/common/DebugInfoD.cpp
51

I agree, a test would be nice but not at this stage, where the whole patch seems to be at danger.

lldb/test/Shell/SymbolFile/DWARF/source-list.cpp
58

@labath My bad. I interpreted timeout 5 wrongly. It will kill the python server after 5 seconds no matter what. If we increase this time to timeout 5m it will kill the server after 5 minutes and we don't need the bash trap. Does that sound better? At least the only ugly part would be done this way. The whole section would look like this:

// RUN: rm -f "%t.server.log"
// RUN: timeout 5m python3 -u -m http.server 0 --directory %t.mock --bind "localhost" &> %t.server.log &
kwk planned changes to this revision.Thu, Mar 26, 3:20 AM

Currently we have a solution for macOS to locate symbol files in the "lldb/source/Symbol/LocateSymbolFile.cpp" file in the Symbols::LocateExecutableSymbolFile(...) function:

static FileSpec Symbols::LocateExecutableSymbolFile(const ModuleSpec &module_spec, const FileSpecList &default_search_paths);

This will locate any files that are already on the system and return the symbol file. When you don't have a symbol file, we can call:

static bool Symbols::DownloadObjectAndSymbolFile(ModuleSpec &module_spec, bool force_lookup = true);

This might ping a build server and download the symbols.

As for source file remappings, as Jim stated, on mac, each dSYM has path remappings already inside of it that are applied on the Module (not a target wide setting) itself and no modifications need to be done to the SourceManager.

So my question is: can be use debuginfod to find the symbol file for a given build ID via Symbols::LocateExecutableSymbolFile(...), and when/if a symbol file is fetched from debuginfod, apply all path remappings to the module itself that we hand out? Then no changes would be needed in the SourceManager, we would just ask for a symbol file and get one back with all the remappings that are needed.

kwk updated this revision to Diff 253531.Mon, Mar 30, 2:32 AM
  • Use file:// and require debuginfod 0.179
  • simplify FindDebuginfod.cmake
kwk planned changes to this revision.Mon, Mar 30, 2:33 AM

Currently we have a solution for macOS to locate symbol files in the "lldb/source/Symbol/LocateSymbolFile.cpp" file in the Symbols::LocateExecutableSymbolFile(...) function:

static FileSpec Symbols::LocateExecutableSymbolFile(const ModuleSpec &module_spec, const FileSpecList &default_search_paths);

This will locate any files that are already on the system and return the symbol file. When you don't have a symbol file, we can call:

static bool Symbols::DownloadObjectAndSymbolFile(ModuleSpec &module_spec, bool force_lookup = true);

This might ping a build server and download the symbols.

As for source file remappings, as Jim stated, on mac, each dSYM has path remappings already inside of it that are applied on the Module (not a target wide setting) itself and no modifications need to be done to the SourceManager.

So my question is: can be use debuginfod to find the symbol file for a given build ID via Symbols::LocateExecutableSymbolFile(...), and when/if a symbol file is fetched from debuginfod, apply all path remappings to the module itself that we hand out? Then no changes would be needed in the SourceManager, we would just ask for a symbol file and get one back with all the remappings that are needed.

I've been thinking about that a lot too. The thing that's not clear to me is, does DownloadObjectAndSymbolFile download source files too? If so, how?

I am expecting that this feature will hook in very near to DownloadObjectAndSymbolFile for downloading the debug info, but it's not clear to me how would the source files fit in. Currently, debuginfod only provides an api to retrieve a single source file, so this code would have to parse all of the debug info, pry out the source files, and download them one by one -- a complicated and slow process.

Now if debuginfod provided an api to download all source files in a single request (*), that might be workable. However, in principle, I see nothing wrong with being able to download the files on demand, if we have the option to do that. (debuginfod's long term goal seems to be to provide an api to download the debug info in chunks too -- that would be very interesting, though I also expect it to be very complicated.)

(*) Though it seems very wasteful to download all files when we are going to need only a handful of them, it may not really be that way -- we're going to be downloading all of debug info anyway, and this is going to be much larger that all of source code put together.

fche2 added a comment.Mon, Mar 30, 4:44 AM

I am expecting that this feature will hook in very near to DownloadObjectAndSymbolFile for downloading the debug info, but it's not clear to me how would the source files fit in. Currently, debuginfod only provides an api to retrieve a single source file, so this code would have to parse all of the debug info, pry out the source files, and download them one by one -- a complicated and slow process.

Yeah, as debuginfod does not support a batch type of source download, maybe this particular lldb site is not an ideal fit..

(*) Though it seems very wasteful to download all files when we are going to need only a handful of them, it may not really be that way -- we're going to be downloading all of debug info anyway, and this is going to be much larger that all of source code put together.

I see your point, OTOH you only download the whole debuginfo because you currently have no choice. (Someday with debuginfod or such, you might be able to offload the DWARF searches, and then you won't have to download the whole thing.) We do have the choice to download sources on demand.

Currently we have a solution for macOS to locate symbol files in the "lldb/source/Symbol/LocateSymbolFile.cpp" file in the Symbols::LocateExecutableSymbolFile(...) function:

static FileSpec Symbols::LocateExecutableSymbolFile(const ModuleSpec &module_spec, const FileSpecList &default_search_paths);

This will locate any files that are already on the system and return the symbol file. When you don't have a symbol file, we can call:

static bool Symbols::DownloadObjectAndSymbolFile(ModuleSpec &module_spec, bool force_lookup = true);

This might ping a build server and download the symbols.

As for source file remappings, as Jim stated, on mac, each dSYM has path remappings already inside of it that are applied on the Module (not a target wide setting) itself and no modifications need to be done to the SourceManager.

So my question is: can be use debuginfod to find the symbol file for a given build ID via Symbols::LocateExecutableSymbolFile(...), and when/if a symbol file is fetched from debuginfod, apply all path remappings to the module itself that we hand out? Then no changes would be needed in the SourceManager, we would just ask for a symbol file and get one back with all the remappings that are needed.

I've been thinking about that a lot too. The thing that's not clear to me is, does DownloadObjectAndSymbolFile download source files too? If so, how?

We should probably make a new SymbolServer plug-in and convert the Apple version to compile and be installed only for Apple. The API for this should be something like:

class SymbolServer {
  /// Get a cached symbol file that is already present on this machine.
  /// This gets called by all LLDB during normal debugging to fetch 
  /// and cached symbol files. 
  virtual ModuleSP GetSymbolFile(ModuleSpec module_spec);

  /// Download a symbol file when requested by the user. 
  /// This only gets run in response to the use requesting the symbols, 
  /// not as part of the normal debug work flow
  virtual FileSpec DownloadSymbolFile(ModuleSpec module_spec);

  /// New function that allows individual access to source files when 
  /// they aren't available on disk.
  virtual FileSpec GetSourceFile(FileSpec file, ....)
};

Then debuginfod would fit right in there. The one thing that this interace doesn't cover is adding source remappings to modules, but it would be great if we can do this somehow with this new interface. Maybe SymbolServer::GetSymbolFile() can take a module_sp of the existing module so it can modify the source remappings if it has any?

I am expecting that this feature will hook in very near to DownloadObjectAndSymbolFile for downloading the debug info, but it's not clear to me how would the source files fit in. Currently, debuginfod only provides an api to retrieve a single source file, so this code would have to parse all of the debug info, pry out the source files, and download them one by one -- a complicated and slow process.

This would be taken care of by the SymbolServer plug-in described above. For the Apple version, it would download the symbol file and remap the paths as it already does and the SymbolServer::GetSourceFile(FileSpec file) would just return the FileSpec that was passed in since it already is an NFS mount path. For debuginfod it can grab the source file one by one.

One possibility is to apply a module level source remapping that is unique to the symbol file that is returned from SymbolServer::GetSymbolFile(). Maybe we prepend the UUID of the module to all paths in the debug info. Something like mapping:

"/..." to "/<UUID/..."

Then when we run into this kind of path, we know we need to call into the SymbolServer to resolve it (by possibly downloading it and caching it first).

Now if debuginfod provided an api to download all source files in a single request (*), that might be workable. However, in principle, I see nothing wrong with being able to download the files on demand, if we have the option to do that. (debuginfod's long term goal seems to be to provide an api to download the debug info in chunks too -- that would be very interesting, though I also expect it to be very complicated.)

Agreed, lazy is good. I don't see the need to download sources in chunks though.

So using SymbolServer with unique path mappings ("/..." to "/<UUID/...") would allow us to accomplish lazy file access. Another option would be to mark and FileSpec objects that are handed out by any symbol files retrieved from the symbol server as needing resolution in the SymbolServer. Later code could do something like:

if file_spec.PathNeedsSymbolServerResolution():

file_spec.ResolveWithSymbolServer(symbol_server);

And the path would update itself in the SymbolFile before it makes it out to any users. So stack frames that do lookups would end up resolving these paths before handing the information out to the user.

(*) Though it seems very wasteful to download all files when we are going to need only a handful of them, it may not really be that way -- we're going to be downloading all of debug info anyway, and this is going to be much larger that all of source code put together.

I think we should be able to figure out how to make this lazy with a new plug-in interface

I am expecting that this feature will hook in very near to DownloadObjectAndSymbolFile for downloading the debug info, but it's not clear to me how would the source files fit in. Currently, debuginfod only provides an api to retrieve a single source file, so this code would have to parse all of the debug info, pry out the source files, and download them one by one -- a complicated and slow process.

Yeah, as debuginfod does not support a batch type of source download, maybe this particular lldb site is not an ideal fit..

We can make it ideal. debuginfod has nice stuff in it and we should adapt LLDB for sure! See my SymbolServer plug-in interface in my previous comments and let me know what you think.

(*) Though it seems very wasteful to download all files when we are going to need only a handful of them, it may not really be that way -- we're going to be downloading all of debug info anyway, and this is going to be much larger that all of source code put together.

I see your point, OTOH you only download the whole debuginfo because you currently have no choice. (Someday with debuginfod or such, you might be able to offload the DWARF searches, and then you won't have to download the whole thing.) We do have the choice to download sources on demand.

We should be able to make this work lazily and not having to download all files.

Making a plugin out of this sounds like a good idea to me, and I could immediately find several downstream users for it. However, it seems to me there is a great deal of overlap between this SymbolServer thingy and the existing SymbolVendor plugin (I mean, "vend" and "serve" are basically synonyms in this context). The main difference is that SymbolVendor is responsible for just finding the symbol file (in case it is not in the main executable), where as this new thing could also be used for finding the main executable too (as well as the relevant source files).

I think it would be very confusing to have both symbol "vendors" and "servers" and we should try hard to implement that with a single interface. The SymbolVendor doesn't do much nowadays (it's basically just a single function that tries to search in various locations -- the rest is boilerplate). If we add more functionality to it, it might make it seem less baroque.

That might also help the path remapping situation. Since symbol vendors sort of sit in between the SymbolFile and Module classes, it should be possible to arrange things such that they see the raw paths coming out of the symbol file, before they are mangled by various mappings.

The main issue is that the symbol vendors currently are ELF, macOS and WASM. Right now we have one SymbolVendor for a triple, but I can see a SymbolVendor wanting to use multiple symbol servers to get information: one for the OS binaries (debuginfod or DebugSymbols.framework at Apple) and one for the current application with company specific symbol servers. At Apple, they can download any symbols for macOS, iOS, watchOS and tvOS OSes and applications. At Facebook we can download symbols for android, linux and iOS. Linux distros might have ways to download symbols for their OS stuff, which might work along side debuginfod? Also windows has the ability to download symbols.

So it might be good to have the SymbolVendors use one or more SymbolServer plug-ins.

Another idea for the SymbolServers: be able to specify a source repository (git, svn etc) and hash or revision ID. The symbol server can grab the source from the repo and cache is locally for display.

Another idea for the SymbolServers: be able to specify a source repository (git, svn etc) and hash or revision ID. The symbol server can grab the source from the repo and cache is locally for display.

When you talk about it FYI I use "build-id" to "GIT hash" mapping text file during build to retrieve source files later according to binary's build-id. But it expects you do not strip symbols from the binaries as otherwise one cannot rebuild the binaries later ("reproducible build" problem - dependency on versions of system packages being updated in the meantime). https://www.jankratochvil.net/t/BUILDID-git-checkout
debuginfod solves this better although with higher storage requirements.

labath added a comment.Wed, Apr 1, 3:09 AM

The main issue is that the symbol vendors currently are ELF, macOS and WASM. Right now we have one SymbolVendor for a triple, but I can see a SymbolVendor wanting to use multiple symbol servers to get information: one for the OS binaries (debuginfod or DebugSymbols.framework at Apple) and one for the current application with company specific symbol servers. At Apple, they can download any symbols for macOS, iOS, watchOS and tvOS OSes and applications. At Facebook we can download symbols for android, linux and iOS. Linux distros might have ways to download symbols for their OS stuff, which might work along side debuginfod? Also windows has the ability to download symbols.

So it might be good to have the SymbolVendors use one or more SymbolServer plug-ins.

I don't believe we have anything that would require all modules in a given target (or whatever) to use the same symbol vendor type. Each module gets its own instance of the object, which is obtained and manipulated through the generic interface. It is true that our current symbol vendors key off of the triple (more like object file type, really), so all modules ale likely to have the same vendor, but nothing really requires it to be that way. The symbol vendors get a ModuleSP, and they can use any information there to determine whether they are relevant. So if we had multiple symbol vendors interested in say elf files, we would just ask each of them in turn whether they can handle this module, and the first one would "win".

fche2 added a comment.Tue, Apr 7, 8:16 AM

So it might be good to have the SymbolVendors use one or more SymbolServer plug-ins.

I don't believe we have anything that would require all modules in a given target (or whatever) to use the same symbol vendor type. [...]

Just for clarity, is someone proposing to undertake such a rework of that infrastructure? It sounds like this is becoming a prerequisite for Konrad's patch, but if no one's actually doing it, that means Konrad's work is on hold indefinitely. Is that the intent?