Details

Reviewers

phosek
labath
dblaikie
alexander-shaposhnikov
jhenderson
vitalybuka

Commits

rG34491ca7291c: [Symbolizer][Debuginfo] Add debuginfod client to llvm-symbolizer.
rG5bba0fe12b29: [Symbolizer][Debuginfo] Add debuginfod client to llvm-symbolizer.
rGe2ad4f175602: [Symbolizer][Debuginfo] Add debuginfod client to llvm-symbolizer.
rG02cc8d698c49: [Symbolizer][Debuginfo] Add debuginfod client to llvm-symbolizer.

Summary

Adds a fallback to use the debuginfod client library (386655) in findDebugBinary.
Fixed a cast of Error::success() to Expected<> in the debuginfod library.
Added Debuginfod to Symbolize deps in gn.
Updates compiler-rt/lib/sanitizer_common/symbolizer/scripts/build_symbolizer.sh to include Debuginfod library to fix sanitizer-x86_64-linux breakage.
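
As a rough illustration of the lookup order the summary describes, the sketch below tries local candidates first and only falls back to a debuginfod-style client when nothing is found on disk. The names `find_debug_binary`, `local_candidates` and `fetch_by_build_id` are hypothetical stand-ins chosen for this example; they are not the actual signature of findDebugBinary in Symbolize.cpp.

```python
import os
import tempfile
from typing import Callable, Iterable, Optional


def find_debug_binary(
    local_candidates: Iterable[str],
    build_id: bytes,
    fetch_by_build_id: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Return a path to debug info, preferring files already on disk."""
    # 1. Existing behaviour: take the first local candidate that exists.
    for path in local_candidates:
        if os.path.isfile(path):
            return path
    # 2. Without a build ID there is nothing to ask the client for.
    if not build_id:
        return None
    # 3. Fallback: hand the hex-encoded build ID to the (hypothetical) client.
    return fetch_by_build_id(build_id.hex())


# Demo: the only local candidate is missing, so the fallback is consulted.
demo_dir = tempfile.mkdtemp()
demo_result = find_debug_binary(
    [os.path.join(demo_dir, "missing.debug")],
    bytes.fromhex("abcd"),
    lambda hex_id: f"/hypothetical/cache/{hex_id}",
)
print(demo_result)
```

The fallback is passed in as a callable here so that the local-only path stays free of any network-related code.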

Event Timeline

Herald added a subscriber: llvm-commits. Nov 11 2021, 2:33 PM

noajshu added a parent revision: D112758: [llvm] [Debuginfo] Debuginfod client library..Nov 11 2021, 2:33 PM

noajshu added reviewers: phosek, labath.

noajshu mentioned this in D112758: [llvm] [Debuginfo] Debuginfod client library..Nov 11 2021, 2:56 PM

Harbormaster completed remote builds in B133826: Diff 386669.Nov 11 2021, 3:21 PM

phosek added a reviewer: dblaikie.Nov 18 2021, 2:41 PM

Nice and simple - though would be good to have test coverage.

Remove unnecessary buildIDToString call due to updates to D112758.

Harbormaster completed remote builds in B135233: Diff 388658.Nov 19 2021, 4:44 PM

Add more help info for client tool and update lit tests.

Herald added a subscriber: ormris. Nov 28 2021, 9:01 AM

Undo accidental diff update (sorry)

noajshu added a child revision: D112759: [llvm] [Debuginfo] Add llvm-debuginfod-find tool and end-to-end-tests..Nov 28 2021, 9:30 AM

Add lit end-to-end tests of symbolizer debuginfod client support.

Herald added a reviewer: alexander-shaposhnikov. Nov 28 2021, 9:57 AM

Herald added a reviewer: jhenderson.

Herald added subscribers: rupprecht, MaskRay.

@dblaikie thanks for your suggestion to add tests. I just added regression test coverage checking that the symbolizer can query a local debuginfod server (using a python simple http server).
I also added unit tests to D112758 checking that the local cache is used correctly.
I made one small change to the client library, which is to respect the DEBUGINFOD_CACHE_PATH and DEBUGINFOD_TIMEOUT env variables in addition to DEBUGINFOD_URLS.
Please let me know if you have thoughts on the testing strategy or any other feedback on this diff stack (D112751, D112753, D112758, and D113717).
If this looks good I can update D112759 to use this same python test script.
Thanks!

Remove extraneous --keep-section=.notes in symbolizer debuginfod test.

Harbormaster completed remote builds in B136345: Diff 390215.Nov 28 2021, 10:44 AM

Not looked at anything else in the patch stack, but I can more-or-less guess what is going on (storing the debug data on a remote server for the symbolizer and other tools to be able to look up?). If so, in general, this looks fine to me. My only slight concern is I wonder how many places (especially build bots) will exercise this test, since it requires a new dependency. Might be worth approaching some build bot owners and asking if they'd be willing to install curl as part of their setup.

llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
390–393	It would be good if this failure path could be tested, unless this can be just said to be tested with the other generic symbolizer tests?
llvm/test/tools/llvm-symbolizer/debuginfod.py
2 ↗	(On Diff #390215)	Not sure I follow which source you're referring to here...
8 ↗	(On Diff #390215)
17 ↗	(On Diff #390215)	What's this being used for?
23 ↗	(On Diff #390215)
25 ↗	(On Diff #390215)	I'm pretty sure this explicit return type isn't necessary. Also, should it be `server_path`? (Also applies below)
39 ↗	(On Diff #390215)	Perhaps worth including the actual code in this message, to make test debugging easier.
47 ↗	(On Diff #390215)	Any particular reason not to put this at top-level like the other imports?
49–51 ↗	(On Diff #390215)	It would help the RUN line above be more self-descriptive (and flexible for the future) if these were dashed arguments, rather than positionals.
55 ↗	(On Diff #390215)	`sys.exit` is the normal way to exit with an exit code.
llvm/test/tools/llvm-symbolizer/lit.local.cfg
1 ↗	(On Diff #390215)	You could avoid this by just renaming the test file to have a `.test` suffix.
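
The two suggestions on `debuginfod.py` above (dashed arguments instead of positionals, and `sys.exit` for the exit code) might look like the following sketch. The option names `--server-path`, `--tool-path` and `--port` are made up for illustration; the script's actual arguments are not shown in this review.

```python
import argparse
import sys


def parse_args(argv):
    parser = argparse.ArgumentParser(description="Hypothetical test helper.")
    # Dashed options keep the RUN line self-describing and order-independent.
    parser.add_argument("--server-path", default=".",
                        help="directory the test server should serve")
    parser.add_argument("--tool-path", default="llvm-symbolizer",
                        help="path to the tool under test")
    parser.add_argument("--port", type=int, default=0,
                        help="port to listen on (0 = pick a free one)")
    return parser.parse_args(argv)


def main(argv):
    args = parse_args(argv)
    print(f"serving {args.server_path} for {args.tool_path} on port {args.port}")
    return 0  # exit code; a non-zero value would signal failure


# Demo call with explicit arguments. In a real script the last line would be
#     sys.exit(main(sys.argv[1:]))
# so that main's return value becomes the process exit code.
demo_exit_code = main(["--server-path", "Inputs", "--port", "8000"])
```

With dashed options, a RUN line can add or reorder arguments later without silently changing the meaning of the others.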

In D113717#3157634, @jhenderson wrote:

Not looked at anything else in the patch stack, but I can more-or-less guess what is going on (storing the debug data on a remote server for the symbolizer and other tools to be able to look up?). If so, in general, this looks fine to me. My only slight concern is I wonder how many places (especially build bots) will exercise this test, since it requires a new dependency. Might be worth approaching some build bot owners and asking if they'd be willing to install curl as part of their setup.

I seem to recall some mention that this API was usable with a local cache even without the curl dependency? Perhaps a test using only a local cache (not needing a server or curl) could be added as well? (the test with curl/etc might be good to keep too - though perhaps it could be removed if the API functionality is already tested elsewhere - if it isn't/can't readily be tested elsewhere then here's fine too)

Add end-to-end test of debuginfod client cache (curl not required). Updates to debuginfod.py testing helper script.

Thank you for the helpful comments @jhenderson, I just uploaded a change to address them.

In D113717#3159258, @dblaikie wrote:

I seem to recall some mention that this API was usable with a local cache even without the curl dependency? Perhaps a test using only a local cache (not needing a server or curl) could be added as well? (the test with curl/etc might be good to keep too - though perhaps it could be removed if the API functionality is already tested elsewhere - if it isn't/can't readily be tested elsewhere then here's fine too)

D112758 includes unit testing that verifies the local cache works without curl. I updated the end-to-end tests of this diff (D113717) and added an additional test that just hits the local cache and does not require Curl. I also added checks for the Curl-dependent test to verify that even after the server is taken down, the debuginfo is still accessible in the local cache.
Is it ok to keep all of these tests? If I had to pick between the end-to-end cache test here and the unit test in D112758, I would favor the unit test since it's more likely to be run on all bots. However the end-to-end test here is additionally checking that the debuginfod client has been properly integrated with the symbolizer.
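
To make the behaviour being tested concrete, here is a minimal, self-contained sketch of the "fetch once, then keep working from the cache after the server is gone" scenario. It uses Python's standard `http.server` as a stand-in server and requests the `/buildid/<id>/debuginfo` path of the debuginfod web API. The cache file layout (`<id>.debuginfo`) and the helper names `start_server` and `fetch_cached` are assumptions for this example and do not reflect the LLVM client's real cache format.

```python
import functools
import http.server
import os
import tempfile
import threading
import urllib.request


class QuietHandler(http.server.SimpleHTTPRequestHandler):
    def log_message(self, *args):
        pass  # keep the output free of per-request log lines


def start_server(root):
    """Serve `root` on a free localhost port from a background thread."""
    handler = functools.partial(QuietHandler, directory=root)
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_cached(build_id_hex, base_url, cache_dir, timeout=5.0):
    """Return a path to cached debug info, downloading it on a cache miss."""
    cached = os.path.join(cache_dir, build_id_hex + ".debuginfo")  # assumed layout
    if os.path.isfile(cached):
        return cached  # cache hit: no network access
    # Ignore any proxy settings so the localhost request goes out directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    url = f"{base_url}/buildid/{build_id_hex}/debuginfo"
    try:
        with opener.open(url, timeout=timeout) as resp:
            data = resp.read()
    except OSError:  # covers URLError, HTTPError and refused connections
        return None
    with open(cached, "wb") as out:
        out.write(data)
    return cached


# Demo: populate a fake server root, fetch, stop the server, fetch again.
root, cache = tempfile.mkdtemp(), tempfile.mkdtemp()
os.makedirs(os.path.join(root, "buildid", "abcd1234"))
with open(os.path.join(root, "buildid", "abcd1234", "debuginfo"), "wb") as f:
    f.write(b"fake debug info")

server = start_server(root)
base_url = f"http://127.0.0.1:{server.server_address[1]}"
first = fetch_cached("abcd1234", base_url, cache)    # downloads and caches
server.shutdown()
server.server_close()
second = fetch_cached("abcd1234", base_url, cache)   # served from the cache
missing = fetch_cached("ffff0000", base_url, cache)  # server gone, not cached
```

The second lookup succeeds without a server, which is the property the cache-only end-to-end test relies on.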

I think it's a great idea to make sure libcurl will be tested on some buildbots. Since LLVM_ENABLE_CURL defaults to ON in D112753, llvm will be built with curl by default as long as it's found by cmake. It's possible the library is already installed on some buildbots. Do you know how to view the buildbot configuration scripts?

llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
390–393	I agree. I've just added additional lit tests which hit this failure path.
llvm/test/tools/llvm-symbolizer/debuginfod.py
8 ↗	(On Diff #390215)	Thanks, I just split this script into `debuginfod.test` and `debuginfod.py` to make it a bit easier to read. If there's a preference towards a single file I can instead merge them back together and use `%s`.
17 ↗	(On Diff #390215)	I just added a comment at the top of this script to explain its purpose. The short version is we want to run an end-to-end test of the debuginfod client functionality of the symbolizer, and so we need a working debuginfod server running at the same time.
49–51 ↗	(On Diff #390215)	Good idea, changed this!

noajshu mentioned this in D114415: [llvm] [Debuginfod] Add HTTP Server to Debuginfod library..Nov 30 2021, 3:22 PM

Harbormaster completed remote builds in B136779: Diff 390826.Nov 30 2021, 3:46 PM

In D113717#3162690, @noajshu wrote:

Thank you for the helpful comments @jhenderson, I just uploaded a change to address them.

D112758 includes unit testing that verifies the local cache works without curl. I updated the end-to-end tests of this diff (D113717) and added an additional test that just hits the local cache and does not require Curl. I also added checks for the Curl-dependent test to verify that even after the server is taken down, the debuginfo is still accessible in the local cache.
Is it ok to keep all of these tests? If I had to pick between the end-to-end cache test here and the unit test in D112758, I would favor the unit test since it's more likely to be run on all bots. However the end-to-end test here is additionally checking that the debuginfod client has been properly integrated with the symbolizer.

I'd probably be inclined to skip the testing that requires this standalone python server - and only test llvm-symbolizer via the non-curl-dependent local cache. The API surface area seems small enough that it's pretty unlikely that the unit tests for networked cases could pass, local cache testing of llvm-symbolizer could pass, but then networked llvm-symbolizer functionality would still be broken?

Oh... but the unit testing only tests the local cache - so this llvm-symbolizer test and python server is the only testing of the remote part? If there is both lower-level and higher-level testing, it'd be nice for the lower-level testing to be the more comprehensive of the two - would it be feasible/reasonable to test the API directly with networking functionality?

I think it's a great idea to make sure libcurl will be tested on some buildbots. Since LLVM_ENABLE_CURL defaults to ON in D112753, llvm will be built with curl by default as long as it's found by cmake. It's possible the library is already installed on some buildbots. Do you know how to view the buildbot configuration scripts?

Not sure, really :/ the buildbots are a bit haphazard.

llvm/test/tools/llvm-symbolizer/debuginfod.py
1 ↗	(On Diff #390826)	Any file that isn't itself a test file (with RUN lines, etc) even if it isn't currently found by the lit config as a test, should go in an "Inputs" subdirectory.

In D113717#3162690, @noajshu wrote:

I think it's a great idea to make sure libcurl will be tested on some buildbots. Since LLVM_ENABLE_CURL defaults to ON in D112753, llvm will be built with curl by default as long as it's found by cmake. It's possible the library is already installed on some buildbots. Do you know how to view the buildbot configuration scripts?

I'm afraid I don't know either. Perhaps ask on the llvm-dev mailing list?

llvm/test/tools/llvm-symbolizer/debuginfod-cache.test
5 ↗	(On Diff #390826)	Does this string need to be this specific value? Might be worth a comment if it does.
10–11 ↗	(On Diff #390826)	Use `-NEXT` suffixes here and below.
14 ↗	(On Diff #390826)	Similar to my above comment: add a comment why this path is what it is.
llvm/test/tools/llvm-symbolizer/debuginfod.py
2 ↗	(On Diff #390826)
8 ↗	(On Diff #390215)	I personally am a big fan of keeping support files that are used in only a single test within the same file, as it means if the test is ever deleted/renamed, the support files aren't left lying around with no users. It also is easier to follow what the script is doing: by having them separate, it requires a developer to open up another file to see what a test is actually doing. Note: An exception to the above is if a script is soon going to be used by other tests that don't yet exist.
17 ↗	(On Diff #390215)	I was referring specifically to the `import functools`, which I didn't immediately see a usage of.
llvm/test/tools/llvm-symbolizer/debuginfod.test
3	If you're referring to `addr.exe` - that's not a program specific to one particular test: it's used by a fair number of tests (or at least used to be!). I'd therefore delete this comment, since it is misleading. It would be useful to put a high-level comment explaining what this test is testing, so that the reason --strip-debug is used is explained somewhere.
7	Similar to my earlier comment, it would be useful to explain why this string is what it is.
15
16–17
18–19	Similar to above: explain the need for the llvm-objcopy command.

noajshu mentioned this in D112759: [llvm] [Debuginfo] Add llvm-debuginfod-find tool and end-to-end-tests..Dec 5 2021, 7:36 PM

Simplify testing.

Thanks for your suggestions on the end-to-end tests @jhenderson. Following a discussion here about testing strategy, I think we only need to test the integration point in the symbolizer here, which we achieve with the simpler test that just hits the cache. I have removed the more complicated python HTTP server test from this diff and moved it to D112759 where it was adapted to test the llvm-debuginfod-find client instead and incorporate your feedback to make it more self-documenting.

Without the bulkier test this diff is nice and small. How does it look to you?

Explain the need for the llvm-objcopy command.

Harbormaster completed remote builds in B137589: Diff 391958.Dec 5 2021, 8:37 PM

jhenderson added inline comments.Dec 6 2021, 1:08 AM

llvm/test/tools/llvm-symbolizer/debuginfod.test
11	I feel like runtime environment variables are usually prefixed with `env`. I suspect there's a good reason for this (maybe something Windows-specific?), so you probably need that too here, at a guess.
24
26	You don't use the `--print-address` option here.

Prefix environment variables in lit test with env, and remove extraneous flag to llvm-symbolizer in test.

noajshu marked 2 inline comments as done.Dec 6 2021, 1:23 PM

Harbormaster completed remote builds in B137731: Diff 392161.Dec 6 2021, 1:34 PM

LGTM! Thanks for the work!

This revision is now accepted and ready to land.Dec 7 2021, 1:06 AM

In D113717#3175664, @jhenderson wrote:

LGTM! Thanks for the work!

Thanks for the helpful comments!

Due to D115131, which removed HTTPClient::initialize from InitLLVM, we need to initialize the client at the beginning of main() in llvm-symbolizer.cpp.

Harbormaster completed remote builds in B137920: Diff 392415.Dec 7 2021, 9:22 AM

Closed by commit rG02cc8d698c49: [Symbolizer][Debuginfo] Add debuginfod client to llvm-symbolizer. (authored by noajshu). Dec 8 2021, 10:10 AM

This revision was automatically updated to reflect the committed changes.

noajshu added a commit: rG02cc8d698c49: [Symbolizer][Debuginfo] Add debuginfod client to llvm-symbolizer..

noajshu added a reverting change: rGaaec63d2a7db: Revert "[Symbolizer][Debuginfo] Add debuginfod client to llvm-symbolizer.".Dec 8 2021, 10:50 AM

Hm, I'd prefer if llvm-symbolizer (and almost all llvm tools) didn't make network requests. Could we have a new tool for this instead, so that the dep on Debuginfod could be in the tool instead of in llvm/lib/DebugInfo/Symbolize?

Thinking out loud a bit, maybe there should be a lib/DebuginfoCacheReader that can just read the debuginfod cache but not write on it, and then ~everything could depend on that without problems, and lib/Debuginfod would depend on that but could also write to the cache (and it could depend on libcurl and make http requests etc)? Then in a bot setup, the bot could call the "pull symbols" thing as a dedicated step at the start. And for devs, maybe there could be an llvm-debuginfod binary that can pull symbols as part of symbolizing.
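
One way to picture the split proposed here is a read-only cache component that any tool may use, plus a separate writer that is the only piece allowed to fill the cache. The sketch below is purely illustrative: the class names, the `download` callback and the `<id>.debuginfo` file layout are assumptions for this example, not an existing LLVM API.

```python
import os
import tempfile


class DebuginfoCacheReader:
    """Read-only view of a cache directory; never touches the network."""

    def __init__(self, cache_dir):
        self.cache_dir = cache_dir

    def path_for(self, build_id_hex):
        return os.path.join(self.cache_dir, build_id_hex + ".debuginfo")  # assumed layout

    def lookup(self, build_id_hex):
        path = self.path_for(build_id_hex)
        return path if os.path.isfile(path) else None


class DebuginfoCacheWriter:
    """Would live in the networked library; the only component that fills the cache."""

    def __init__(self, reader, download):
        self.reader = reader
        self.download = download  # callable: build_id_hex -> bytes or None

    def pull(self, build_id_hex):
        data = self.download(build_id_hex)
        if data is None:
            return None
        path = self.reader.path_for(build_id_hex)
        with open(path, "wb") as out:
            out.write(data)
        return path


# Demo: a dedicated "pull symbols" step runs first, then tools only read.
reader = DebuginfoCacheReader(tempfile.mkdtemp())
writer = DebuginfoCacheWriter(reader, lambda build_id: b"fake debug info")
before = reader.lookup("abcd1234")   # None: nothing pulled yet
pulled = writer.pull("abcd1234")
after = reader.lookup("abcd1234")    # now resolves without any download
```

In this shape only the writer would ever need an HTTP dependency; everything that merely symbolizes depends on the reader alone.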

(My main concern is that the current design feels that it's very easy to add a dep on an http library to arbitrary llvm tools, and most toolchain tools shouldn't make http requests.)

To expand a bit, for most tools it doesn't seem desirable to me if they can block randomly and need 1000x as long depending on if they need to download something or not.

It seems fine if the debugger makes http requests, or spawns a subprocess that makes http requests. But for most tools it doesn't seem desirable if they can block randomly and need 1000x as long depending on if they need to download something or not. It seems nicer to run the "get symbols" thing once and have all tools work consistently fast and without network access then.

@thakis Thanks for the suggestion. Although we may know the --objs which are going to be fed to llvm-symbolizer in advance, so we could run a "pull symbols" step just for those, we wouldn't know in advance which shared libraries the input (e.g. backtrace) will touch. So I think to make it work you'd need three steps -- symbolize and figure out what debuginfo is missing, then run the debuginfod fetch to download, and then symbolize again.
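
The three-step flow described above can be sketched as follows. Everything here is a toy model: `symbolize`, `symbolize_with_prefetch` and the dictionary-based "debug info" are hypothetical and only show why a first pass is needed to discover which modules are missing.

```python
def symbolize(frames, debuginfo):
    """Return (resolved, missing): symbol names and modules lacking debug info."""
    resolved, missing = [], set()
    for module, address in frames:
        table = debuginfo.get(module)
        if table is None:
            missing.add(module)
            resolved.append(f"{module}+{address:#x}")  # unsymbolized fallback
        else:
            resolved.append(table.get(address, "??"))
    return resolved, missing


def symbolize_with_prefetch(frames, debuginfo, fetch):
    # Step 1: first pass, mainly to learn which modules lack debug info.
    resolved, missing = symbolize(frames, debuginfo)
    if not missing:
        return resolved
    # Step 2: dedicated fetch step for exactly those modules.
    for module in sorted(missing):
        table = fetch(module)
        if table is not None:
            debuginfo[module] = table
    # Step 3: symbolize again with whatever was fetched.
    resolved, _ = symbolize(frames, debuginfo)
    return resolved


# Demo: the backtrace touches a shared library that is not known in advance.
frames = [("app", 0x10), ("libfoo.so", 0x20)]
local = {"app": {0x10: "main"}}
remote = {"libfoo.so": {0x20: "foo"}}
first_pass, missing_modules = symbolize(frames, dict(local))
final = symbolize_with_prefetch(frames, dict(local), remote.get)
```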

By the way, network requests are made only when built with LLVM_ENABLE_CURL and then only to those urls in the DEBUGINFOD_URLS environment variable. There is also a timeout controllable by DEBUGINFOD_TIMEOUT which sets an upper limit on the number of milliseconds before a fetch operation gives up.
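
A small sketch of the gating described in this comment: no URLs configured, or no curl support in the build, means no network access. The function names and the returned dictionary are hypothetical. The environment variable names come from this thread, and DEBUGINFOD_URLS is treated as a space-separated list. The timeout is returned as a plain integer and its unit is deliberately left to the client.

```python
import os


def debuginfod_settings(env=None):
    """Collect the debuginfod-related environment variables."""
    env = os.environ if env is None else env
    timeout = env.get("DEBUGINFOD_TIMEOUT")
    return {
        # Space-separated list of server URLs; empty means "no servers".
        "urls": env.get("DEBUGINFOD_URLS", "").split(),
        "cache_path": env.get("DEBUGINFOD_CACHE_PATH") or None,
        # Raw integer value; how it is interpreted is up to the client.
        "timeout": int(timeout) if timeout else None,
    }


def may_use_network(settings, built_with_curl):
    """Network access needs both curl support and at least one URL."""
    return built_with_curl and bool(settings["urls"])


# Demo with an explicit environment mapping instead of the real os.environ.
demo_settings = debuginfod_settings({
    "DEBUGINFOD_URLS": "http://localhost:8002 http://example.invalid",
    "DEBUGINFOD_TIMEOUT": "30",
})
print(may_use_network(demo_settings, built_with_curl=True))
```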

Here is the relevant llvm-dev thread where we planned these changes.

Small fix to address buildbot failure.

vitalybuka added a subscriber: vitalybuka.Dec 8 2021, 12:59 PM

vitalybuka added inline comments.

llvm/tools/llvm-symbolizer/CMakeLists.txt
13 ↗	(On Diff #392895)	Something with llvm::getCachedOrDownloadDebuginfo(llvm::ArrayRef<unsigned char>) is needed https://lab.llvm.org/buildbot/#/builders/37/builds/9005

Harbormaster completed remote builds in B138253: Diff 392895.Dec 8 2021, 3:39 PM

noajshu added inline comments.Dec 8 2021, 4:21 PM

llvm/tools/llvm-symbolizer/CMakeLists.txt
13 ↗	(On Diff #392895)	Thanks for pointing this out. Do you know if it might be related to the use of typedef ArrayRef<uint8_t> BuildIDRef; typedef SmallVector<uint8_t, 10> BuildID; ?

Re-upload to run builders again.

Fix typo in Expected takeError idiom

Harbormaster completed remote builds in B138344: Diff 393010.Dec 8 2021, 10:25 PM

jhenderson added inline comments.Dec 9 2021, 2:38 AM

llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp
265–266 ↗	(On Diff #393010)	Could we RAII this somehow, so that the explicit cleanup at the end isn't needed? Also, I believe the typical user won't be using the debuginfod client, so maybe it should be lazily initialized?

Implement RAII for HTTPClient::cleanup using static destructor.
Verified destructor is called using lldb.

noajshu marked an inline comment as done.Dec 9 2021, 12:05 PM

noajshu added inline comments.

llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp
265–266 ↗	(On Diff #393010)	Good idea, I've RAII'd the cleanup! I would like to lazily initialize the client, but I'm not sure we can. There's a relevant discussion here about why we need to run `curl_global_init` at the beginning of main, when there is only one thread. Although we could move the `HTTPClient::init` to one of llvm-symbolizer's functions, I'm concerned this function could get called from another program from within a thread. By calling at the beginning of main() we don't have to worry about that.

vitalybuka added inline comments.Dec 9 2021, 12:10 PM

llvm/tools/llvm-symbolizer/CMakeLists.txt
13 ↗	(On Diff #392895)	I don't know

Harbormaster completed remote builds in B138501: Diff 393237.Dec 9 2021, 12:43 PM

Add Debuginfod dep to llvm/utils/gn/secondary/llvm/lib/DebugInfo/Symbolize/BUILD.gn to attempt to fix buildbot failure

noajshu added inline comments.Dec 9 2021, 1:33 PM

llvm/tools/llvm-symbolizer/CMakeLists.txt
13 ↗	(On Diff #392895)	I noticed that shortly after I committed this in 02cc8d69, Nico Weber kindly ported the gn build with 0f865dc6. I think this may be what's missing to fix the sanitizer buildbot and I've added the corresponding edit below. Next time I commit I'll watch this bot closely to see if it recurs. Thanks again for pointing out the failure!

noajshu marked 2 inline comments as done.Dec 9 2021, 1:34 PM

Harbormaster completed remote builds in B138518: Diff 393271.Dec 9 2021, 2:19 PM

In D113717#3183828, @noajshu wrote:

Add Debuginfod dep to llvm/utils/gn/secondary/llvm/lib/DebugInfo/Symbolize/BUILD.gn to attempt to fix buildbot failure

That buildbot does not use .gn

In D113717#3184078, @vitalybuka wrote:

That buildbot does not use .gn

Ok! What does the buildbot do / use?
I followed the instructions to reproduce the build failure, but the build fails for other reasons. (logs)

In D113717#3184110, @noajshu wrote:

Ok! What does the buildbot do / use?
I followed the instructions to reproduce the build failure, but the build fails for other reasons. (logs)

Ah, I think I see the problem.

Add symbolizer symbols to global_symbols.txt.

noajshu edited the summary of this revision. Dec 9 2021, 4:21 PM

This revision was landed with ongoing or failed builds.Dec 9 2021, 4:25 PM

noajshu added a commit: rGe2ad4f175602: [Symbolizer][Debuginfo] Add debuginfod client to llvm-symbolizer..

vitalybuka added inline comments.Dec 9 2021, 4:35 PM

compiler-rt/lib/sanitizer_common/symbolizer/scripts/global_symbols.txt
75 ↗	(On Diff #393329)	We cannot do that: all allowed symbols are essentially glibc or the sanitizer itself, but nothing will resolve these symbols.

noajshu added inline comments.Dec 9 2021, 4:42 PM

compiler-rt/lib/sanitizer_common/symbolizer/scripts/global_symbols.txt
75 ↗	(On Diff #393329)	Sorry, I'm just trying to understand what this does and I saw several commits added symbols to that list in the past. Can you give me some idea of what the general problem is? Do I need to update the buildbot configuration since I'm linking the symbolizer against a new library (Debuginfod)? Or did the sanitizer detect a bug in my code?

vitalybuka added inline comments.Dec 9 2021, 4:53 PM

compiler-rt/lib/sanitizer_common/symbolizer/scripts/global_symbols.txt
75 ↗	(On Diff #393329)	I guess the bot will fail on a later step. global_symbols.txt is the list of unresolved symbols in this lib, and we expect no llvm stuff there: sanitized user apps may link this lib without needing any llvm stuff. Sorry that I didn't realize this before, but we just need to include Debuginfod in llvm-project/compiler-rt/lib/sanitizer_common/symbolizer/scripts/build_symbolizer.sh

noajshu added inline comments.Dec 9 2021, 4:58 PM

compiler-rt/lib/sanitizer_common/symbolizer/scripts/global_symbols.txt
75 ↗	(On Diff #393329)	Excellent, thank you so much! I will revert this commit, apply this change, remove my edits to the global_symbols.txt, and then re-push.

noajshu added a reverting change: rGafa3c14e2ff9: Revert "[Symbolizer][Debuginfo] Add debuginfod client to llvm-symbolizer.".Dec 9 2021, 5:00 PM

vitalybuka added inline comments.Dec 9 2021, 5:01 PM

compiler-rt/lib/sanitizer_common/symbolizer/scripts/global_symbols.txt
75 ↗	(On Diff #393329)	Up to you, but I don't mind if you fix this in a follow-up patch. Thank you!

Remove global_symbols.txt edits, and add Debuginfod lib to compiler-rt/lib/sanitizer_common/symbolizer/scripts/build_symbolizer.sh

Harbormaster completed remote builds in B138566: Diff 393339.Dec 9 2021, 5:05 PM

noajshu edited the summary of this revision. Dec 9 2021, 5:11 PM

Herald added a subscriber: pengfei. Dec 9 2021, 5:11 PM

to accept

This revision is now accepted and ready to land.Dec 9 2021, 5:19 PM

Thanks, compiler_rt LGTM.

This revision was landed with ongoing or failed builds.Dec 9 2021, 5:39 PM

Closed by commit rG5bba0fe12b29: [Symbolizer][Debuginfo] Add debuginfod client to llvm-symbolizer. (authored by noajshu).

This revision was automatically updated to reflect the committed changes.

noajshu added a commit: rG5bba0fe12b29: [Symbolizer][Debuginfo] Add debuginfod client to llvm-symbolizer..

Herald added a project: Restricted Project. Dec 9 2021, 5:39 PM

Herald added a subscriber: Restricted Project.

vitalybuka mentioned this in rG2ff795a462f9: [sanitizer] Update symbols after D113717.Dec 9 2021, 9:51 PM

In D113717#3180177, @noajshu wrote:

@thakis Thanks for the suggestion. Although we may know the --objs which are going to be fed to llvm-symbolizer in advance, so we could run a "pull symbols" step just for those, we wouldn't know in advance which shared libraries the input (e.g. backtrace) will touch. So I think to make it work you'd need three steps -- symbolize and figure out what debuginfo is missing, then run the debuginfod fetch to download, and then symbolize again.

You could have a tool that pulls symbols for a binary and all the shared libraries it links, right?

A pattern I've seen over the years is that since we're all very busy, as soon as there's an easy explanation for why something is slow, people will jump to that explanation if something is slow. After this change, if someone says "llvm-symbolizer took several seconds", people will just say "well it probably pulled stuff off the network" instead of actually investigating.

If the concern of pulling stuff from the network is separated into its own binary, it's impossible to get confused about this.

In D113717#3180334, @noajshu wrote:

Here is the relevant llvm-dev thread where we planned these changes.

Thanks for the link! Looks like there wasn't a lot of discussion on the thread. 1 and 3 seem like great goals. As said on this review, I think it's better if for step 2 there was a dedicated binary to pull debug info from the web, and if llvm-symbolizer (and other llvm tools) didn't start making network requests, but instead just loaded the symbols downloaded by the new tool.

thakis mentioned this in rG2586c23bae04: [gn build] Prevent deps on HTTP requests in clang and lld at GN time.Dec 10 2021, 5:56 AM

Turns out this isn't even just theoretical: This _did_ add a dependency on libcurl to lld:

//lld/tools/lld:lld has an assert_no_deps entry:
  //llvm/lib/Debuginfod:Debuginfod
which fails for the dependency path:
  //lld/tools/lld:lld ->
  //lld/COFF:COFF ->
  //llvm/lib/DebugInfo/Symbolize:Symbolize ->
  //llvm/lib/Debuginfod:Debuginfod

The linker shouldn't do http requests, obviously. Let's revert this for now until at least the lld dep is figured out.
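
The check that produced the error above can be thought of as a reachability query over the build graph. The sketch below uses only the four edges quoted in the message. It is not how GN implements `assert_no_deps`, and `find_dep_path` is a hypothetical helper written for this illustration.

```python
from collections import deque

# Dependency edges exactly as listed in the error message above.
DEPS = {
    "//lld/tools/lld:lld": ["//lld/COFF:COFF"],
    "//lld/COFF:COFF": ["//llvm/lib/DebugInfo/Symbolize:Symbolize"],
    "//llvm/lib/DebugInfo/Symbolize:Symbolize": ["//llvm/lib/Debuginfod:Debuginfod"],
    "//llvm/lib/Debuginfod:Debuginfod": [],
}


def find_dep_path(deps, start, forbidden):
    """Breadth-first search for a shortest path from start to forbidden, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == forbidden:
            return path
        for dep in deps.get(path[-1], []):
            if dep not in seen:
                seen.add(dep)
                queue.append(path + [dep])
    return None


dep_path = find_dep_path(DEPS, "//lld/tools/lld:lld", "//llvm/lib/Debuginfod:Debuginfod")
if dep_path:
    print(" ->\n".join(dep_path))
```

Dropping any single edge on that chain makes the search return `None`, which is the effect a dependency fix would need to have for the assertion to pass.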

thakis added a reverting change: rG30f221bba005: Revert "[Symbolizer][Debuginfo] Add debuginfod client to llvm-symbolizer.".Dec 10 2021, 7:33 AM

hans added a subscriber: hans.Dec 10 2021, 8:39 AM

In D113717#3185471, @thakis wrote:
Turns out this isn't even just theoretical: This _did_ add a dependency on libcurl to lld:
//lld/tools/lld:lld has an assert_no_deps entry:
  //llvm/lib/Debuginfod:Debuginfod
which fails for the dependency path:
  //lld/tools/lld:lld ->
  //lld/COFF:COFF ->
  //llvm/lib/DebugInfo/Symbolize:Symbolize ->
  //llvm/lib/Debuginfod:Debuginfod
The linker shouldn't do http requests, obviously. Let's revert this for now until at least the lld dep is figured out.

I'm not 100% sure that it's obvious the linker shouldn't make http requests if it can be used to improve diagnostics.

Though most of lld's symbolizing is on object files and I don't think it's expected that debug info for unlinked object files would be in a symbol server - but perhaps there are cases where lld might want to symbolize an address in a .so where symbolizing might still be useful (though arguably so - if it's system installed it's probably not super relevant to the user to know what source files the symbol came from - they're just dealing with the system library, they aren't the authors of it - and if it's something they authored it probably has local debug info that doesn't need looking up)

@noajshu Could you look into if/how lld uses the symbolizer and if there are use cases where it would have a positive effect on the user experience (improved diagnostic quality, etc) to look up debug info on the symbol server? If there are no such cases, maybe having a separate entry point into the symbolizing library that statically does not use debuginfod would be good. The library dependency would still be there, but if a caller only uses the non-debuginfod entry point, nothing from the debuginfod library would be pulled in at link time.

In D113717#3185263, @thakis wrote:

You could have a tool that pulls symbols for a binary and all the shared libraries it links, right?

yes, see D112759

A pattern I've seen over the years is that since we're all very busy, as soon as there's an easy explanation for why something is slow, people will jump to that explanation if something is slow. After this change, if someone says "llvm-symbolizer took several seconds", people will just say "well it probably pulled stuff off the network" instead of actually investigating.

If the concern of pulling stuff from the network is separated into its own binary, it's impossible to get confused about this.

Hi Nico, I just added you as a reviewer of D115500, which changes the build so that LLVM builds without libcurl by default, even if libcurl is present. This way there is no confusion.

In D113717#3186152, @dblaikie wrote:
I'm not 100% sure that it's obvious the linker shouldn't make http requests if it can be used to improve diagnostics.

Though most of lld's symbolizing is on object files and I don't think it's expected that debug info for unlinked object files would be in a symbol server - but perhaps there are cases where lld might want to symbolize an address in a .so where symbolizing might still be useful (though arguably so - if it's system installed it's probably not super relevant to the user to know what source files the symbol came from - they're just dealing with the system library, they aren't the authors of it - and if it's something they authored it probably has local debug info that doesn't need looking up)

@noajshu Could you look into if/how lld uses the symbolizer and if there are use cases where it would have a positive effect on the user experience (improved diagnostic quality, etc) to look up debug info on the symbol server? If there are no such cases, maybe having a separate entry point into the symbolizing library that statically does not use debuginfod would be good. The library dependency would still be there, but if a caller only uses the non-debuginfod entry point, nothing from the debuginfod library would be pulled in at link time.

Thanks for all these comments and sorry for the late response! I will respond more thoroughly to the points above soon. To avoid confusion I wanted to quickly mention something.
I believe there is an error in the GN build configuration which causes lld's dependency on Debuginfod.
Specifically the file
llvm/utils/gn/secondary/lld/COFF/BUILD.gn
adds the dependency on DebugInfo/Symbolize, but this is not in
lld/COFF/CMakeLists.txt
Perhaps this gn line is left over from a previous dependency which was updated in cmake but never updated in gn?
Removing it seems to resolve the issue building lld with gn when both this diff and rG2586c23b are applied.

Fix lld's COFF lib gn script to remove unnecessary dependency on llvm/lib/DebugInfo/Symbolize. This should now be compatible with rG2586c23b.

In D113717#3186295, @noajshu wrote:

Fix lld's COFF lib gn script to remove unnecessary dependency on llvm/lib/DebugInfo/Symbolize. This should now be compatible with rG2586c23b.

If the gn change can be committed/reviewed separately (before recommitting this change) then it probably should be.

Harbormaster completed remote builds in B138719: Diff 393572.Dec 10 2021, 1:51 PM

In D113717#3186322, @dblaikie wrote:

In D113717#3186295, @noajshu wrote:

Fix lld's COFF lib gn script to remove unnecessary dependency on llvm/lib/DebugInfo/Symbolize. This should now be compatible with rG2586c23b.

If the gn change can be committed/reviewed separately (before recommitting this change) then it probably should be.

Sounds good! Here it is: D115554.

In D113717#3186152, @dblaikie wrote:

@noajshu Could you look into if/how lld uses the symbolizer and if there are use cases where it would have a positive effect on the user experience (improved diagnostic quality, etc) to look up debug info on the symbol server? If there are no such cases, maybe having a separate entry point into the symbolizing library that statically does not use debuginfod would be good. The library dependency would still be there, but if a caller only uses the non-debuginfod entry point, nothing from the debuginfod library would be pulled in at link time.

Good idea, thanks for the recommendation!
I've taken a look at lld. There is no usage of the definitions in Symbolize.cpp. This is unsurprising, as removing the lib dep in D115554 did not break the gn build.
There is one file, lld/COFF/SymbolTable.cpp, that includes "llvm/DebugInfo/Symbolize.h". This is a red herring: lld does not use any of the Symbolize.h declarations. It actually just needs llvm/DebugInfo/DIContext.h, which is included as
Symbolize.h -> SymbolizableModule.h -> DIContext.h.
So probably that line should be replaced with #include "llvm/DebugInfo/DIContext.h". I'm happy to add this to D115554.

In D113717#3186152, @dblaikie wrote:

Though most of lld's symbolizing is on object files and I don't think it's expected that debug info for unlinked object files would be in a symbol server - but perhaps there are cases where lld might want to symbolize an address in a .so where symbolizing might still be useful (though arguably so - if it's system installed it's probably not super relevant to the user to know what source files the symbol came from - they're just dealing with the system library, they aren't the authors of it - and if it's something they authored it probably has local debug info that doesn't need looking up)

I agree, there could be some use cases for debuginfod in lld, though it's probably not as natural a target as (say) lldb.
Now that it's clear that lld does not use Symbolize, and therefore lld does not use Debuginfod, I propose llvm-dev as a more appropriate place to discuss which tools should (or should not) use debuginfod in the future.
IIRC this is the most recent reply to the debuginfod RFC: https://lists.llvm.org/pipermail/llvm-dev/2021-October/153101.html
That would be a good place to follow up.
Thanks!

Remove lld gn build change, as it has been moved to D115554 at @dblaikie 's suggestion.

noajshu edited the summary of this revision. (Show Details)Dec 10 2021, 3:34 PM

Harbormaster completed remote builds in B138744: Diff 393606.Dec 10 2021, 4:16 PM

I'm not 100% sure that it's obvious the linker shouldn't make http requests if it can be used to improve diagnostics.

It's very scary to me that you say that. Maybe we should have an in-person (well, VC) discussion about this change? To me, separation of concerns makes it fairly obvious that compilers and linkers etc shouldn't rely on an internet connection to work. There should be a separate binary that handles downloading stuff that can be run well in advance (or, not run). Kitchen-sink binaries/libraries etc tend to be not great to work with.

D115554 looks fine to me, independent of this change. I don't think it helps this change here much since another tool might add an accidental (or even intentional) dependency on DebugInfo/Symbolize again, like lld did in the past.

D115500 also looks fine to me in isolation, but it doesn't help with the overall design here.

I fairly strongly feel that there should be a library for just reading this debuginfod cache that doesn't do web requests that all tools could link to without concerns, and a separate library for writing to it and doing network requests, and that should be linked to from very few binaries.

thakis mentioned this in D115500: [llvm] [Debuginfod] Disable CURL by default..Dec 13 2021, 8:21 AM

This revision was landed with ongoing or failed builds.Dec 13 2021, 3:04 PM

noajshu added a commit: rG34491ca7291c: [Symbolizer][Debuginfo] Add debuginfod client to llvm-symbolizer..

In D113717#3189211, @thakis wrote:

I'm not 100% sure that it's obvious the linker shouldn't make http requests if it can be used to improve diagnostics.

It's very scary to me that you say that. Maybe we should have an in-person (well, VC) discussion about this change?

Happy to, if you like - my calendar's usually pretty open - either scheduled or impromptu is fine by me.

To me, separation of concerns makes it fairly obvious that compilers and linkers etc shouldn't rely on an internet connection to work. There should be a separate binary that handles downloading stuff that can be run well in advance (or, not run). Kitchen-sink binaries/libraries etc tend to be not great to work with.

D115554 looks fine to me, independent of this change. I don't think it helps this change here much since another tool might add an accidental (or even intentional) dependency on DebugInfo/Symbolize again, like lld did in the past.

D115500 also looks fine to me in isolation, but it doesn't help with the overall design here.

I fairly strongly feel that there should be a library for just reading this debuginfod cache that doesn't do web requests that all tools could link to without concerns, and a separate library for writing to it and doing network requests, and that should be linked to from very few binaries.

noajshu, I'm confused why this relanded with ongoing open discussion.

In D113717#3192444, @thakis wrote:

noajshu, I'm confused why this relanded with ongoing open discussion.

Hi @thakis , we achieved consensus on this approach with an RFC to llvm-dev. Since you're proposing to remove debuginfod could you please follow up with your concerns on llvm-dev?
Thanks!

In D113717#3192606, @noajshu wrote:

In D113717#3192444, @thakis wrote:

noajshu, I'm confused why this relanded with ongoing open discussion.

Hi @thakis , we achieved consensus on this approach with an RFC to llvm-dev. Since you're proposing to remove debuginfod could you please follow up with your concerns on llvm-dev?
Thanks!

The thread had very few replies on it, the actual technical discussion seems to happen on this review.

@dblaikie, @phosek, @noajshu, @hans and I met (online) last month and talked about this for a bit.

The summary was the following action items:

have Debuginfod have a callback and put http request in llvm-symbolizer, and make llvm-symbolizer instead of Debuginfod depend on libcurl (iirc @phosek offered to look at that)
maybe have an llvm-symbolizer mode that doesn't do requests
maybe have a way to explicitly request pre-downloads of debug info
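
A rough sketch of the first action item, under stated assumptions: SymbolizerSketch, DebugInfoFetcher, setFetcher, and the placeholder build IDs and paths are all hypothetical names for illustration, not the actual LLVM interfaces. The library side only knows about a callback; the tool that links an HTTP client decides whether to install one.

```cpp
#include <functional>
#include <optional>
#include <string>
#include <utility>

// HYPOTHETICAL types; none of this is the real Symbolize/Debuginfod API.
// A fetcher maps a build ID to the local path of fetched debug info,
// or returns std::nullopt when nothing could be obtained.
using DebugInfoFetcher =
    std::function<std::optional<std::string>(const std::string &BuildID)>;

class SymbolizerSketch {
public:
  // The library never does network I/O itself; a tool opts in here.
  void setFetcher(DebugInfoFetcher F) { Fetcher = std::move(F); }

  // Tries local lookup first, then the fetcher if one was installed.
  std::optional<std::string> findDebugBinary(const std::string &BuildID) const {
    if (auto Local = lookupLocal(BuildID))
      return Local;
    if (Fetcher)
      return Fetcher(BuildID);
    return std::nullopt;
  }

private:
  // Stand-in for the existing local search; knows one placeholder build ID.
  std::optional<std::string> lookupLocal(const std::string &BuildID) const {
    if (BuildID == "local-id")
      return std::string("/usr/lib/debug/local-id.debug");
    return std::nullopt;
  }

  DebugInfoFetcher Fetcher; // Empty by default: no fallback is attempted.
};
```

With this shape, only the binary that calls setFetcher with an HTTP-backed callback would need libcurl; a tool that never installs a fetcher gets the "mode that doesn't do requests" by default.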

Some notes about things mentioned during the meeting:

Concerns with the approach:

separation of concerns is generally good; compiler shouldn't depend on network
libcurl now makes it into the deps of llvm.so
- some Linux distros link it dynamically, so clang now depends on it transitively
- consider llvm-symbolizer copied via adb to a device that has funky connectivity (easy to see this causing timeouts)
bootstrapping
parallelism, no write blocking if pre-read
- would download in parallel atm

These concerns don't apply in all scenarios:

.o storage might be networked anyway, so less of a concern there
might have to pull stuff for dlopen()'d shared libraries
would be happy if libllvm.so didn't depend on libcurl
care about llvm-symbolizer and lldb
current setup: build system image on one machine, strip binaries, upload debug info to GCS, then run tests based on the image, and at runtime download debug info on demand
- can't pull all debug info for all binaries, too much storage
been using this model for 3 years now, seems to work fine
- current thing not debuginfod based
is opt-in based on env vars, so is that really different from having your debug info on a fuse mount
(someone might add cmake var)
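
The env-var opt-in mentioned in these notes can be sketched roughly as follows. DEBUGINFOD_URLS is the environment variable debuginfod clients conventionally read for a space-separated list of servers; the function names below are hypothetical and this is not the LLVM implementation.

```cpp
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Splits a whitespace-separated server list into individual URLs.
std::vector<std::string> parseServerList(const std::string &Value) {
  std::istringstream In(Value);
  std::vector<std::string> Urls;
  for (std::string Url; In >> Url;)
    Urls.push_back(Url);
  return Urls;
}

// Reads the server list from the environment; empty when the variable is unset.
std::vector<std::string> configuredServers() {
  const char *Env = std::getenv("DEBUGINFOD_URLS");
  return Env ? parseServerList(Env) : std::vector<std::string>{};
}

// HYPOTHETICAL helper: network lookups happen only if the user listed servers.
bool networkLookupEnabled() { return !configuredServers().empty(); }
```

Under this sketch, a user who never sets the variable gets no network traffic, which is the sense in which the notes compare it to choosing whether to mount a network filesystem.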

thakis mentioned this in D116784: [ProfileData] Read and symbolize raw memprof profiles..Feb 3 2022, 3:40 PM

snehasish mentioned this in rGdbf47d227d08: Revert "[ProfileData] Read and symbolize raw memprof profiles.".Feb 3 2022, 4:14 PM

snehasish added a subscriber: snehasish.Feb 3 2022, 4:36 PM

In D113717#3295350, @thakis wrote:

@dblaikie, @phosek, @noajshu, @hans and I met (online) last month and talked about this for a bit.

The summary was the following action items:

have Debuginfod have a callback and put http request in llvm-symbolizer, and make llvm-symbolizer instead of Debuginfod depend on libcurl (iirc @phosek offered to look at that)

maybe have an llvm-symbolizer mode that doesn't do requests

This is being implemented by @mysterymath as D118413 and D118665.

Great, thanks!

Glad to see others had similar concerns at https://github.com/llvm/llvm-project/issues/52731 :)

thakis mentioned this in D131224: [llvm-objdump] Find debug information with Build ID/debuginfod..Oct 3 2022, 4:47 PM

This is an archive of the discontinued LLVM Phabricator instance.

[Symbolizer][Debuginfo] Add debuginfod client to llvm-symbolizer.
ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 392161

llvm/lib/DebugInfo/Symbolize/CMakeLists.txt

llvm/lib/DebugInfo/Symbolize/Symbolize.cpp

llvm/test/tools/llvm-symbolizer/debuginfod.test
