Adds a fallback to use the debuginfod client library (386655) in findDebugBinary.
Fixed a cast of Erorr::success() to Expected<> in debuginfod library.
Added Debuginfod to Symbolize deps in gn.
Updates compiler-rt/lib/sanitizer_common/symbolizer/scripts/build_symbolizer.sh to include Debuginfod library to fix sanitizer-x86_64-linux breakage.
Details
- Reviewers
phosek labath dblaikie alexander-shaposhnikov jhenderson vitalybuka - Commits
- rG34491ca7291c: [Symbolizer][Debuginfo] Add debuginfod client to llvm-symbolizer.
rG5bba0fe12b29: [Symbolizer][Debuginfo] Add debuginfod client to llvm-symbolizer.
rGe2ad4f175602: [Symbolizer][Debuginfo] Add debuginfod client to llvm-symbolizer.
rG02cc8d698c49: [Symbolizer][Debuginfo] Add debuginfod client to llvm-symbolizer.
Diff Detail
Event Timeline
llvm/tools/llvm-symbolizer/CMakeLists.txt | ||
---|---|---|
13 | Something with llvm::getCachedOrDownloadDebuginfo(llvm::ArrayRef<unsigned char>) is needed |
llvm/tools/llvm-symbolizer/CMakeLists.txt | ||
---|---|---|
13 | Thanks for pointing this out. Do you know if it might be related to the use of typedef ArrayRef<uint8_t> BuildIDRef; typedef SmallVector<uint8_t, 10> BuildID; ? |
llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp | ||
---|---|---|
265โ266 | Could we RAII this somehow, so that the explicit cleanup at the end isn't needed? Also, I below the typical user won't be using the debuginfod client, so maybe it should be lazily initialized? |
Implement RAII for HTTPClient::cleanup using static destructor.
Verified destructor is called using lldb.
llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp | ||
---|---|---|
265โ266 | Good idea, I've RAII'd the cleanup! I would like to lazily initialize the client, but I'm not sure we can. There's a relevant discussion here about why we need to run curl_global_init at the beginning of main, when there is only one thread. |
llvm/tools/llvm-symbolizer/CMakeLists.txt | ||
---|---|---|
13 | I don't know |
Add Debuginfod dep to llvm/utils/gn/secondary/llvm/lib/DebugInfo/Symbolize/BUILD.gn to attempt to fix buildbot failure
llvm/tools/llvm-symbolizer/CMakeLists.txt | ||
---|---|---|
13 | I noticed that shortly after I committed this in 02cc8d69, Nico Weber kindly ported the gn build with 0f865dc6. I think this may be what's missing to fix the sanitizer buildbot and I've added the corresponding edit below. Next time I commit I'll watch this bot closely to see if it recurs. Thanks again for pointing out the failure! |
Ok! What does the buildbot do / use?
I followed the instructions to reproduce the build failure, but the build fails for other reasons. (logs)
compiler-rt/lib/sanitizer_common/symbolizer/scripts/global_symbols.txt | ||
---|---|---|
75 โ | (On Diff #393329) | we cannot do that |
compiler-rt/lib/sanitizer_common/symbolizer/scripts/global_symbols.txt | ||
---|---|---|
75 โ | (On Diff #393329) | Sorry, I'm just trying to understand what this does and I saw several commits added symbols to that list in the past. Can you give me some idea of what the general problem is? Do I need to update the buildbot configuration since I'm linking the symbolizer against a new library (Debuginfod)? Or did the sanitizer detect a bug in my code? |
compiler-rt/lib/sanitizer_common/symbolizer/scripts/global_symbols.txt | ||
---|---|---|
75 โ | (On Diff #393329) | I guess bot will fail on later step. sorry, that I didn't realized this before, but we just need to include Debuginfod into |
compiler-rt/lib/sanitizer_common/symbolizer/scripts/global_symbols.txt | ||
---|---|---|
75 โ | (On Diff #393329) | Excellent, thank you so much! I will revert this commit, apply this change, remove my edits to the global_symbols.txt, and then re-push. |
compiler-rt/lib/sanitizer_common/symbolizer/scripts/global_symbols.txt | ||
---|---|---|
75 โ | (On Diff #393329) | Up to you, but I don't mind if you fix this in folloup patch. |
Remove global_symbols.txt edits, and add Debuginfod lib to compiler-rt/lib/sanitizer_common/symbolizer/scripts/build_symbolizer.sh
You could have a tool that pulls symbols for a binary and all the shared libraries it links, right?
A pattern I've seen over the years is that since we're all very busy, as soon as there's an easy explanation for why something is slow, people will jump to that explanation if something is slow. After this change, if someone says "llvm-symbolizer took several seconds", people will just say "well it probably pulled stuff off the network" instead of actually investigating.
If the concern of pulling stuff from the network is separated into its own binary, it's impossible to get confused about this.
Thanks for the link! Looks like there wasn't a lot of discussion on the thread. 1 and 3 seem like great goals. As said on this review, I think it's better if for step 2 there was a dedicated binary to pull debug info from the web, and if llvm-symbolizer (and other llvm tools) didn't start making network requests, but instead just loaded the symbols downloaded by the new tool.
Turns out this isn't even just theoretical: This _did_ add a dependency on libcurl to lld:
//lld/tools/lld:lld has an assert_no_deps entry: //llvm/lib/Debuginfod:Debuginfod which fails for the dependency path: //lld/tools/lld:lld -> //lld/COFF:COFF -> //llvm/lib/DebugInfo/Symbolize:Symbolize -> //llvm/lib/Debuginfod:Debuginfod
The linker shouldn't do http requests, obviously. Let's revert this for now until at least the lld dep is figured out.
I'm not 100% sure that it's obvious the linker shouldn't make http requests if it can be used to improve diagnostics.
Though most of lld's symbolizing is on object files and I don't think it's expected that debug info for unlinked object files would be in a symbol server - but perhaps there are cases where lld might want to symbolize an address in a .so where symbolizing might still be useful (though arguably so - if it's system installed it's probably not super relevant to the user to know what source files the symbol came from - they're just dealing with the system library, they aren't the authors of it - and if it's something they authored it probably has local debug info that doesn't need looking up)
@noajshu Could you look into if/how lld uses the symbolizer and if there are use cases where it would have an positive effect on the user experience (improved diagnostic quality, etc) to lookup debug info on the symbol server? If there are no such cases, maybe having a separate entry point into the symbolizing library that statically does not use debuginfod would be good. The library dependency would still be there, but if a caller only uses the non-debuginfod entry point, nothing from the debuginfod library would be pulled in at link time.
yes, see D112759
A pattern I've seen over the years is that since we're all very busy, as soon as there's an easy explanation for why something is slow, people will jump to that explanation if something is slow. After this change, if someone says "llvm-symbolizer took several seconds", people will just say "well it probably pulled stuff off the network" instead of actually investigating.
If the concern of pulling stuff from the network is separated into its own binary, it's impossible to get confused about this.
Hi Nico, I just added you as a reviewer of D115500, which changes the build so that LLVM builds without libcurl by default, even if libcurl is present. This way there is no confusion.
Thanks for all these comments and sorry for the late response! I will respond more thoroughly to the points above soon. To avoid confusion I wanted to quickly mention something.
I believe there is an error in the GN build configuration which causes lld's dependency on Debuginfod.
Specifically the file
llvm/utils/gn/secondary/lld/COFF/BUILD.gn
adds the dependency on DebugInfo/Symbolize, but this is not in
lld/COFF/CMakeLists.txt
Perhaps this is gn line is leftover from a previous dependency which was updated in cmake but never updated in gn?
Removing it seems to resolve the issue building lld with gn when both this diff and rG2586c23b are applied.
Fix lld's COFF lib gn script to remove unnecessary dependency on llvm/lib/DebugInfo/Symbolize. This should now be compatible with rG2586c23b.
If the gn change can be committed/reviewed separately (before recommitting this change) then it probably should be.
Good idea, thanks for the recommendation!
I've taken a look and lld. There is no usage of the definitions in Symbolize.cpp. This is unsurprising as removing the lib dep in D115554 did not break the gn build.
There is one file, lld/COFF/SymbolTable.cpp that includes "llvm/DebugInfo/Symbolize.h". This is a red herring, lld not use any of the Symbolize.h declarations. It actually just needs llvm/DebugInfo/DIContext.h which is included as
Symbolize.h -> SymbolizableModule.h -> DIContext.h.
So probably that line should be replaced with #include "llvm/DebugInfo/DIContext.h". I'm happy to add this to D115554.
I agree, there could be some use cases for debuginfod in lld, though it's probably not as natural target as (say) lldb.
Now that it's clear that lld does not use Symbolize, and therefore lld does not use Debuginfod, I propose llvm-dev as more appropriate place to discuss which tools should (or should not) use debuginfod in the future.
IIRC this is the most recent reply to the debuginfod RFC: https://lists.llvm.org/pipermail/llvm-dev/2021-October/153101.html
That would be a good place to follow up.
Thanks!
I'm not 100% sure that it's obvious the linker shouldn't make http requests if it can be used to improve diagnostics.
It's very scary to me that you say that. Maybe we should have a in-person (well, VC) discussion about this change? To me, separation of concerns makes it fairly obvious that compilers and linkers etc shouldn't rely on an internet connection to work. There should be a separate binary that handles downloading stuff that can be run well in advance (or, not run). Kitchen-sync binaries/libraries etc tend to be not great to work with.
D115554 looks fine to me, independent of this change. I don't think it helps this change here much since another tool might add an accidental (or even intentional) dependency on DebugInfo/Symbollize again, like lld did in the past.
D115500 also looks fine to me in isolation, but it doesn't help with the overall design here.
I fairly strongy feel that there should be a library for just reading this debuginfod cache that doesn't do web requests that all tools could link to without concerns, and a separate library for writing to it and doing network requests, and that should be linked to from very few binaries.
Happy to, if you like - my calendar's usually pretty open - either scheduled or impromptu is fine by me.
To me, separation of concerns makes it fairly obvious that compilers and linkers etc shouldn't rely on an internet connection to work. There should be a separate binary that handles downloading stuff that can be run well in advance (or, not run). Kitchen-sync binaries/libraries etc tend to be not great to work with.
D115554 looks fine to me, independent of this change. I don't think it helps this change here much since another tool might add an accidental (or even intentional) dependency on DebugInfo/Symbollize again, like lld did in the past.
D115500 also looks fine to me in isolation, but it doesn't help with the overall design here.
I fairly strongy feel that there should be a library for just reading this debuginfod cache that doesn't do web requests that all tools could link to without concerns, and a separate library for writing to it and doing network requests, and that should be linked to from very few binaries.
The thread had very few replies on it, the actual technical discussion seems to happen on this review.
@dblaikie, @phosek, @noajshu, @hans and I met (online) last month and talked about this for a bit.
The summary was the following action items:
- have Debuginfod have a callback and put http request in llvm-symbolizer, and make llvm-symbolizer instead of Debuginfod depend on libcurl (iirc @phosek offered to look at that)
- maybe have an llvm-symbolizer mode that doesn't do requests
- maybe have a way to explicitly request pre-downloads of debug info
Some notes about things mentioned during the meeting:
Concerts with the approach:
- separation of concerns is generally good; compiler shouldn't depend on network
- libcurl now makes it into the deps of llvm.so
- some linux distros link it dynamically, so clang now depends on it transitively
- consider llvm-symbolizer copied via adb to a device that has funky connectivity (easy to see this causing timeouts);
- bootstrapping
- parallelism, no write blocking if pre-read
- would download in parallel atm
These concerns don't apply in all scenarios:
- .o storage might networked anyways, so less a concern there
- might have to pull stuff for dlopen()'d shared libraries
- would be happy if libllvm.so didn't depend on libcurl
- care about llvm-symbolizer and lldb
- current setup: build system image on one machine, strip binaries, upload debug info to GCS, then runs tests based on image, and then at runtime downloads debug info on demand
- can't pull all debug info for all binaries, too much storage
- been using this model for 3 years now, seems to work fine
- current thing not debuginfod based
- is opt-in based on env vars, so is that really different from having your debug info on a fuse mount
- (someone might add cmake var)
Great, thanks!
Glad to see others had similar concerns at https://github.com/llvm/llvm-project/issues/52731 :)