Page MenuHomePhabricator

[Symbolizer][Debuginfo] Add debuginfod client to llvm-symbolizer.
ClosedPublic

Authored by noajshu on Nov 11 2021, 2:32 PM.

Details

Summary

Adds a fallback to use the debuginfod client library (386655) in findDebugBinary.
Fixed a cast of Erorr::success() to Expected<> in debuginfod library.
Added Debuginfod to Symbolize deps in gn.
Updates compiler-rt/lib/sanitizer_common/symbolizer/scripts/build_symbolizer.sh to include Debuginfod library to fix sanitizer-x86_64-linux breakage.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
noajshu updated this revision to Diff 392895.Dec 8 2021, 12:54 PM

Small fix to address buildbot failure.

vitalybuka added inline comments.
llvm/tools/llvm-symbolizer/CMakeLists.txt
13

Something with llvm::getCachedOrDownloadDebuginfo(llvm::ArrayRef<unsigned char>) is needed
https://lab.llvm.org/buildbot/#/builders/37/builds/9005

noajshu added inline comments.Dec 8 2021, 4:21 PM
llvm/tools/llvm-symbolizer/CMakeLists.txt
13

Thanks for pointing this out. Do you know if it might be related to the use of

typedef ArrayRef<uint8_t> BuildIDRef;
typedef SmallVector<uint8_t, 10> BuildID;

?

noajshu updated this revision to Diff 392972.Dec 8 2021, 4:22 PM

Re-upload to run builders again.

noajshu updated this revision to Diff 393010.Dec 8 2021, 6:55 PM

Fix typo in Expected takeError idiom

jhenderson added inline comments.Dec 9 2021, 2:38 AM
llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp
265–266

Could we RAII this somehow, so that the explicit cleanup at the end isn't needed? Also, I below the typical user won't be using the debuginfod client, so maybe it should be lazily initialized?

noajshu updated this revision to Diff 393237.Dec 9 2021, 11:52 AM

Implement RAII for HTTPClient::cleanup using static destructor.
Verified destructor is called using lldb.

noajshu marked an inline comment as done.Dec 9 2021, 12:05 PM
noajshu added inline comments.
llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp
265–266

Good idea, I've RAII'd the cleanup!

I would like to lazily initialize the client, but I'm not sure we can. There's a relevant discussion here about why we need to run curl_global_init at the beginning of main, when there is only one thread.
Although we could move the HTTPClient::init to one of llvm-symbolizer's functions, I'm concerned this function could get called from another program from within a thread. By calling at the beginning of main() we don't have to worry about that.

vitalybuka added inline comments.Dec 9 2021, 12:10 PM
llvm/tools/llvm-symbolizer/CMakeLists.txt
13

I don't know

noajshu updated this revision to Diff 393271.Dec 9 2021, 1:29 PM
noajshu marked an inline comment as done.

Add Debuginfod dep to llvm/utils/gn/secondary/llvm/lib/DebugInfo/Symbolize/BUILD.gn to attempt to fix buildbot failure

noajshu added inline comments.Dec 9 2021, 1:33 PM
llvm/tools/llvm-symbolizer/CMakeLists.txt
13

I noticed that shortly after I committed this in 02cc8d69, Nico Weber kindly ported the gn build with 0f865dc6. I think this may be what's missing to fix the sanitizer buildbot and I've added the corresponding edit below. Next time I commit I'll watch this bot closely to see if it recurs. Thanks again for pointing out the failure!

noajshu marked 2 inline comments as done.Dec 9 2021, 1:34 PM

Add Debuginfod dep to llvm/utils/gn/secondary/llvm/lib/DebugInfo/Symbolize/BUILD.gn to attempt to fix buildbot failure

That buldbot does not use .gn

noajshu added a comment.EditedDec 9 2021, 3:30 PM

Add Debuginfod dep to llvm/utils/gn/secondary/llvm/lib/DebugInfo/Symbolize/BUILD.gn to attempt to fix buildbot failure

That buldbot does not use .gn

Ok! What does the buildbot do / use?
I followed the instructions to reproduce the build failure, but the build fails for other reasons. (logs)

Add Debuginfod dep to llvm/utils/gn/secondary/llvm/lib/DebugInfo/Symbolize/BUILD.gn to attempt to fix buildbot failure

That buldbot does not use .gn

Ok! What does the buildbot do / use?
I followed the instructions to reproduce the build failure, but the build fails for other reasons. (logs)

Ah, I think I see the problem.

noajshu updated this revision to Diff 393329.Dec 9 2021, 4:21 PM

Add symbolizer symbols to global_symbols.txt.

noajshu edited the summary of this revision. (Show Details)Dec 9 2021, 4:21 PM
This revision was landed with ongoing or failed builds.Dec 9 2021, 4:25 PM
vitalybuka added inline comments.Dec 9 2021, 4:35 PM
compiler-rt/lib/sanitizer_common/symbolizer/scripts/global_symbols.txt
75 ↗(On Diff #393329)

we cannot do that
all allowed symbols are essentially glibc or sanitizer itself
but nothing will resolve these symbols

noajshu added inline comments.Dec 9 2021, 4:42 PM
compiler-rt/lib/sanitizer_common/symbolizer/scripts/global_symbols.txt
75 ↗(On Diff #393329)

Sorry, I'm just trying to understand what this does and I saw several commits added symbols to that list in the past.

Can you give me some idea of what the general problem is? Do I need to update the buildbot configuration since I'm linking the symbolizer against a new library (Debuginfod)? Or did the sanitizer detect a bug in my code?

vitalybuka added inline comments.Dec 9 2021, 4:53 PM
compiler-rt/lib/sanitizer_common/symbolizer/scripts/global_symbols.txt
75 ↗(On Diff #393329)

I guess bot will fail on later step.
global_symbols.txt it the list of unresolved symbols in this lib and we expect no llvm stuff there
sanitized user apps may link this lib and without need for any llvm stuff

sorry, that I didn't realized this before, but we just need to include Debuginfod into
llvm-project/compiler-rt/lib/sanitizer_common/symbolizer/scripts/build_symbolizer.sh

noajshu added inline comments.Dec 9 2021, 4:58 PM
compiler-rt/lib/sanitizer_common/symbolizer/scripts/global_symbols.txt
75 ↗(On Diff #393329)

Excellent, thank you so much! I will revert this commit, apply this change, remove my edits to the global_symbols.txt, and then re-push.

vitalybuka added inline comments.Dec 9 2021, 5:01 PM
compiler-rt/lib/sanitizer_common/symbolizer/scripts/global_symbols.txt
75 ↗(On Diff #393329)

Up to you, but I don't mind if you fix this in folloup patch.
Thank you!

noajshu updated this revision to Diff 393339.Dec 9 2021, 5:05 PM
noajshu added a reviewer: vitalybuka.

Remove global_symbols.txt edits, and add Debuginfod lib to compiler-rt/lib/sanitizer_common/symbolizer/scripts/build_symbolizer.sh

noajshu edited the summary of this revision. (Show Details)Dec 9 2021, 5:11 PM
vitalybuka reopened this revision.Dec 9 2021, 5:19 PM

to accept

This revision is now accepted and ready to land.Dec 9 2021, 5:19 PM
vitalybuka accepted this revision.Dec 9 2021, 5:21 PM

Thanks, compiler_rt LGTM.

This revision was landed with ongoing or failed builds.Dec 9 2021, 5:39 PM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptDec 9 2021, 5:39 PM
Herald added a subscriber: Restricted Project. · View Herald Transcript

@thakis Thanks for the suggestion. Although we may know the --objs which are going to be fed to llvm-symbolizer in advance, so we could run a "pull symbols" step just for those, we wouldn't know in advance which shared libraries the input (e.g. backtrace) will touch. So I think to make it work you'd need three steps -- symbolize and figure out what debuginfo is missing, then run the debuginfod fetch to download, and then symbolize again.

You could have a tool that pulls symbols for a binary and all the shared libraries it links, right?

A pattern I've seen over the years is that since we're all very busy, as soon as there's an easy explanation for why something is slow, people will jump to that explanation if something is slow. After this change, if someone says "llvm-symbolizer took several seconds", people will just say "well it probably pulled stuff off the network" instead of actually investigating.

If the concern of pulling stuff from the network is separated into its own binary, it's impossible to get confused about this.

Here is the relevant llvm-dev thread where we planned these changes.

Thanks for the link! Looks like there wasn't a lot of discussion on the thread. 1 and 3 seem like great goals. As said on this review, I think it's better if for step 2 there was a dedicated binary to pull debug info from the web, and if llvm-symbolizer (and other llvm tools) didn't start making network requests, but instead just loaded the symbols downloaded by the new tool.

Turns out this isn't even just theoretical: This _did_ add a dependency on libcurl to lld:

//lld/tools/lld:lld has an assert_no_deps entry:
  //llvm/lib/Debuginfod:Debuginfod
which fails for the dependency path:
  //lld/tools/lld:lld ->
  //lld/COFF:COFF ->
  //llvm/lib/DebugInfo/Symbolize:Symbolize ->
  //llvm/lib/Debuginfod:Debuginfod

The linker shouldn't do http requests, obviously. Let's revert this for now until at least the lld dep is figured out.

hans added a subscriber: hans.Dec 10 2021, 8:39 AM

Turns out this isn't even just theoretical: This _did_ add a dependency on libcurl to lld:

//lld/tools/lld:lld has an assert_no_deps entry:
  //llvm/lib/Debuginfod:Debuginfod
which fails for the dependency path:
  //lld/tools/lld:lld ->
  //lld/COFF:COFF ->
  //llvm/lib/DebugInfo/Symbolize:Symbolize ->
  //llvm/lib/Debuginfod:Debuginfod

The linker shouldn't do http requests, obviously. Let's revert this for now until at least the lld dep is figured out.

I'm not 100% sure that it's obvious the linker shouldn't make http requests if it can be used to improve diagnostics.

Though most of lld's symbolizing is on object files and I don't think it's expected that debug info for unlinked object files would be in a symbol server - but perhaps there are cases where lld might want to symbolize an address in a .so where symbolizing might still be useful (though arguably so - if it's system installed it's probably not super relevant to the user to know what source files the symbol came from - they're just dealing with the system library, they aren't the authors of it - and if it's something they authored it probably has local debug info that doesn't need looking up)

@noajshu Could you look into if/how lld uses the symbolizer and if there are use cases where it would have an positive effect on the user experience (improved diagnostic quality, etc) to lookup debug info on the symbol server? If there are no such cases, maybe having a separate entry point into the symbolizing library that statically does not use debuginfod would be good. The library dependency would still be there, but if a caller only uses the non-debuginfod entry point, nothing from the debuginfod library would be pulled in at link time.

@thakis Thanks for the suggestion. Although we may know the --objs which are going to be fed to llvm-symbolizer in advance, so we could run a "pull symbols" step just for those, we wouldn't know in advance which shared libraries the input (e.g. backtrace) will touch. So I think to make it work you'd need three steps -- symbolize and figure out what debuginfo is missing, then run the debuginfod fetch to download, and then symbolize again.

You could have a tool that pulls symbols for a binary and all the shared libraries it links, right?

yes, see D112759

A pattern I've seen over the years is that since we're all very busy, as soon as there's an easy explanation for why something is slow, people will jump to that explanation if something is slow. After this change, if someone says "llvm-symbolizer took several seconds", people will just say "well it probably pulled stuff off the network" instead of actually investigating.

If the concern of pulling stuff from the network is separated into its own binary, it's impossible to get confused about this.

Hi Nico, I just added you as a reviewer of D115500, which changes the build so that LLVM builds without libcurl by default, even if libcurl is present. This way there is no confusion.

Turns out this isn't even just theoretical: This _did_ add a dependency on libcurl to lld:

//lld/tools/lld:lld has an assert_no_deps entry:
  //llvm/lib/Debuginfod:Debuginfod
which fails for the dependency path:
  //lld/tools/lld:lld ->
  //lld/COFF:COFF ->
  //llvm/lib/DebugInfo/Symbolize:Symbolize ->
  //llvm/lib/Debuginfod:Debuginfod

The linker shouldn't do http requests, obviously. Let's revert this for now until at least the lld dep is figured out.

I'm not 100% sure that it's obvious the linker shouldn't make http requests if it can be used to improve diagnostics.

Though most of lld's symbolizing is on object files and I don't think it's expected that debug info for unlinked object files would be in a symbol server - but perhaps there are cases where lld might want to symbolize an address in a .so where symbolizing might still be useful (though arguably so - if it's system installed it's probably not super relevant to the user to know what source files the symbol came from - they're just dealing with the system library, they aren't the authors of it - and if it's something they authored it probably has local debug info that doesn't need looking up)

@noajshu Could you look into if/how lld uses the symbolizer and if there are use cases where it would have an positive effect on the user experience (improved diagnostic quality, etc) to lookup debug info on the symbol server? If there are no such cases, maybe having a separate entry point into the symbolizing library that statically does not use debuginfod would be good. The library dependency would still be there, but if a caller only uses the non-debuginfod entry point, nothing from the debuginfod library would be pulled in at link time.

Thanks for all these comments and sorry for the late response! I will respond more thoroughly to the points above soon. To avoid confusion I wanted to quickly mention something.
I believe there is an error in the GN build configuration which causes lld's dependency on Debuginfod.
Specifically the file
llvm/utils/gn/secondary/lld/COFF/BUILD.gn
adds the dependency on DebugInfo/Symbolize, but this is not in
lld/COFF/CMakeLists.txt
Perhaps this is gn line is leftover from a previous dependency which was updated in cmake but never updated in gn?
Removing it seems to resolve the issue building lld with gn when both this diff and rG2586c23b are applied.

noajshu updated this revision to Diff 393572.Dec 10 2021, 12:47 PM
noajshu edited the summary of this revision. (Show Details)

Fix lld's COFF lib gn script to remove unnecessary dependency on llvm/lib/DebugInfo/Symbolize. This should now be compatible with rG2586c23b.

Fix lld's COFF lib gn script to remove unnecessary dependency on llvm/lib/DebugInfo/Symbolize. This should now be compatible with rG2586c23b.

If the gn change can be committed/reviewed separately (before recommitting this change) then it probably should be.

Fix lld's COFF lib gn script to remove unnecessary dependency on llvm/lib/DebugInfo/Symbolize. This should now be compatible with rG2586c23b.

If the gn change can be committed/reviewed separately (before recommitting this change) then it probably should be.

Sounds good! Here it is: D115554.

@noajshu Could you look into if/how lld uses the symbolizer and if there are use cases where it would have an positive effect on the user experience (improved diagnostic quality, etc) to lookup debug info on the symbol server? If there are no such cases, maybe having a separate entry point into the symbolizing library that statically does not use debuginfod would be good. The library dependency would still be there, but if a caller only uses the non-debuginfod entry point, nothing from the debuginfod library would be pulled in at link time.

Good idea, thanks for the recommendation!
I've taken a look and lld. There is no usage of the definitions in Symbolize.cpp. This is unsurprising as removing the lib dep in D115554 did not break the gn build.
There is one file, lld/COFF/SymbolTable.cpp that includes "llvm/DebugInfo/Symbolize.h". This is a red herring, lld not use any of the Symbolize.h declarations. It actually just needs llvm/DebugInfo/DIContext.h which is included as
Symbolize.h -> SymbolizableModule.h -> DIContext.h.
So probably that line should be replaced with #include "llvm/DebugInfo/DIContext.h". I'm happy to add this to D115554.

Though most of lld's symbolizing is on object files and I don't think it's expected that debug info for unlinked object files would be in a symbol server - but perhaps there are cases where lld might want to symbolize an address in a .so where symbolizing might still be useful (though arguably so - if it's system installed it's probably not super relevant to the user to know what source files the symbol came from - they're just dealing with the system library, they aren't the authors of it - and if it's something they authored it probably has local debug info that doesn't need looking up)

I agree, there could be some use cases for debuginfod in lld, though it's probably not as natural target as (say) lldb.
Now that it's clear that lld does not use Symbolize, and therefore lld does not use Debuginfod, I propose llvm-dev as more appropriate place to discuss which tools should (or should not) use debuginfod in the future.
IIRC this is the most recent reply to the debuginfod RFC: https://lists.llvm.org/pipermail/llvm-dev/2021-October/153101.html
That would be a good place to follow up.
Thanks!

noajshu updated this revision to Diff 393606.Dec 10 2021, 3:34 PM

Remove lld gn build change, as it has been moved to D115554 at @dblaikie 's suggestion.

noajshu edited the summary of this revision. (Show Details)Dec 10 2021, 3:34 PM

I'm not 100% sure that it's obvious the linker shouldn't make http requests if it can be used to improve diagnostics.

It's very scary to me that you say that. Maybe we should have a in-person (well, VC) discussion about this change? To me, separation of concerns makes it fairly obvious that compilers and linkers etc shouldn't rely on an internet connection to work. There should be a separate binary that handles downloading stuff that can be run well in advance (or, not run). Kitchen-sync binaries/libraries etc tend to be not great to work with.

D115554 looks fine to me, independent of this change. I don't think it helps this change here much since another tool might add an accidental (or even intentional) dependency on DebugInfo/Symbollize again, like lld did in the past.

D115500 also looks fine to me in isolation, but it doesn't help with the overall design here.

I fairly strongy feel that there should be a library for just reading this debuginfod cache that doesn't do web requests that all tools could link to without concerns, and a separate library for writing to it and doing network requests, and that should be linked to from very few binaries.

This revision was landed with ongoing or failed builds.Dec 13 2021, 3:04 PM

I'm not 100% sure that it's obvious the linker shouldn't make http requests if it can be used to improve diagnostics.

It's very scary to me that you say that. Maybe we should have a in-person (well, VC) discussion about this change?

Happy to, if you like - my calendar's usually pretty open - either scheduled or impromptu is fine by me.

To me, separation of concerns makes it fairly obvious that compilers and linkers etc shouldn't rely on an internet connection to work. There should be a separate binary that handles downloading stuff that can be run well in advance (or, not run). Kitchen-sync binaries/libraries etc tend to be not great to work with.

D115554 looks fine to me, independent of this change. I don't think it helps this change here much since another tool might add an accidental (or even intentional) dependency on DebugInfo/Symbollize again, like lld did in the past.

D115500 also looks fine to me in isolation, but it doesn't help with the overall design here.

I fairly strongy feel that there should be a library for just reading this debuginfod cache that doesn't do web requests that all tools could link to without concerns, and a separate library for writing to it and doing network requests, and that should be linked to from very few binaries.

noajshu, I'm confused why this relanded with ongoing open discussion.

noajshu, I'm confused why this relanded with ongoing open discussion.

Hi @thakis , we achieved consensus on this approach with an RFC to llvm-dev. Since you're proposing to remove debuginfod could you please follow up with your concerns on llvm-dev?
Thanks!

noajshu, I'm confused why this relanded with ongoing open discussion.

Hi @thakis , we achieved consensus on this approach with an RFC to llvm-dev. Since you're proposing to remove debuginfod could you please follow up with your concerns on llvm-dev?
Thanks!

The thread had very few replies on it, the actual technical discussion seems to happen on this review.

thakis added a comment.Feb 3 2022, 3:38 PM

@dblaikie, @phosek, @noajshu, @hans and I met (online) last month and talked about this for a bit.

The summary was the following action items:

  • have Debuginfod have a callback and put http request in llvm-symbolizer, and make llvm-symbolizer instead of Debuginfod depend on libcurl (iirc @phosek offered to look at that)
  • maybe have an llvm-symbolizer mode that doesn't do requests
  • maybe have a way to explicitly request pre-downloads of debug info

Some notes about things mentioned during the meeting:

Concerts with the approach:

  • separation of concerns is generally good; compiler shouldn't depend on network
  • libcurl now makes it into the deps of llvm.so
      • some linux distros link it dynamically, so clang now depends on it transitively
    • consider llvm-symbolizer copied via adb to a device that has funky connectivity (easy to see this causing timeouts);
  • bootstrapping
  • parallelism, no write blocking if pre-read
    • would download in parallel atm

These concerns don't apply in all scenarios:

  • .o storage might networked anyways, so less a concern there
  • might have to pull stuff for dlopen()'d shared libraries
  • would be happy if libllvm.so didn't depend on libcurl
  • care about llvm-symbolizer and lldb
  • current setup: build system image on one machine, strip binaries, upload debug info to GCS, then runs tests based on image, and then at runtime downloads debug info on demand
    • can't pull all debug info for all binaries, too much storage
  • been using this model for 3 years now, seems to work fine
    • current thing not debuginfod based
  • is opt-in based on env vars, so is that really different from having your debug info on a fuse mount
  • (someone might add cmake var)

@dblaikie, @phosek, @noajshu, @hans and I met (online) last month and talked about this for a bit.

The summary was the following action items:

  • have Debuginfod have a callback and put http request in llvm-symbolizer, and make llvm-symbolizer instead of Debuginfod depend on libcurl (iirc @phosek offered to look at that)
  • maybe have an llvm-symbolizer mode that doesn't do requests

This is being implemented by @mysterymath as D118413 and D118665.

thakis added a comment.Feb 8 2022, 5:15 PM

Great, thanks!

Glad to see others had similar concerns at https://github.com/llvm/llvm-project/issues/52731 :)