Page MenuHomePhabricator

Add option to llvm-gsymutil to read addresses from stdin.
ClosedPublic

Authored by simon.giesecke on May 11 2021, 1:51 AM.

Details

Summary

llvm-symbolizer and llvm-addr2line allow to read addresses
from stdin, which makes them convenient to use in a context
where a large number of addresses should be resolved
(which may be too many to pass as command line arguments)
or not all addresses are known at the same time.

This patch adds a --addresses-from-stdin option to
llvm-gsymutil to allow the same.

Diff Detail

Event Timeline

simon.giesecke requested review of this revision.May 11 2021, 1:51 AM
simon.giesecke created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptMay 11 2021, 1:51 AM

I know this isn't covered by tests right now. I am not sure what the policy is for the command line utilities. Currently, llvm-gsymutil doesn't have any tests IIUC. If tests should be added, please direct me to a good example, and I will be happy to add them.

Hm, clang-tidy complains about two issues in code I copied from llvm-symbolizer. Should these be addressed?

clayborg requested changes to this revision.May 11 2021, 2:37 PM

I know this isn't covered by tests right now. I am not sure what the policy is for the command line utilities. Currently, llvm-gsymutil doesn't have any tests IIUC. If tests should be added, please direct me to a good example, and I will be happy to add them.

yes tests should be added and there are indeed llvm-gsymutil tests. See this directory:

llvm/test/tools/llvm-gsymutil

Hm, clang-tidy complains about two issues in code I copied from llvm-symbolizer. Should these be addressed?

if it is new code, then yes.

One thing to note is that there is library code that tools can directly link to, just like llvm-gsymutil links against the "DebugInfoGSYM" library. Then your lookups are very simple and you would call a function just like doLookup() that you added. Is there a reason you are wanting to run a command line tool instead of linking against the code?

Also, I am not a fan of text scraping from the output of a tool unless it is purely for humans to read. If you are going to use this output from another tools, I would vote to add a new option: "--json" so that the output would be formatted using structured data like JSON. Something like:

$ llvm-gsymutil --json --addresses-from-stdin
0x1000 /tmp/a.gsym
0x2000 /tmp/b.gsym

And this would return output in a specific JSON format, something like:

[
  { "lookupAddress": 4096, "gsym": "/tmp/a.gsym", ... },
  { "lookupAddress": 8192, "gsym": "/tmp/b.gsym", ... }
]

The "..." above would be a JSON version of the llvm::gsym::LookupResult object, or an error message.

llvm/tools/llvm-gsymutil/llvm-gsymutil.cpp
117

You will need to specify what format the STDIN format needs to be. From reading to the code below it seems to be:

<addr> <gsym-path> [<addr> <gsym-path> ...]
490

There should be a test for this error

501

Can we use C++ STL here? Or there might be some other LLVM tools that use STDIN using some other LLVM input wrapping?

std::string line;
std::getline(std::cin, line);
`
505

What happens if you send the following input into this tool:

"0x1000 /tmp/a.gsym
0x2000 /tmp/b.gsym
"

If I read the code above correctly, it will end up with a string:

"0x1000 /tmp/a.gsym0x2000 /tmp/b.gsym"

because you are erasing all newline characters and the newline between "0x1000 /tmp/a.gsym" and "0x2000 /tmp/b.gsym" will be removed.

507–509

Does any other tool use this type of stdin format where you specify an address and a file? Unless any other existing tool does (like atos?) I would think that we would run the tool with a GSYM file and then do multiple lookups on that one GSYM file:

$ llvm-gsymutil /tmp/a.gsym --addresses-from-stdin
0x1000 0x2000 0x3000
This revision now requires changes to proceed.May 11 2021, 2:37 PM

Sounds like, after I reread the description, that we have other tools in the llvm ecosystem that use this <addr> <file> format... Sorry for the noise. I will add inline comments to clarify any needed changes.

So looks like we just need tests:

  • check error when user specifies --addresses-from-stdin and also an address
  • check error when user specifies --addresses-from-stdin and also an input file
  • check for successful lookups on multiple address + file tuples

And if would be good to specify the input format for the --addresses-from-stdin option in the option description.

llvm/tools/llvm-gsymutil/llvm-gsymutil.cpp
505

Never mind, we are grabbing one line at a time, so this wouldn't happen... Ignore above comment.

509

Ignore this, after I reread the description, it seems clear there are other tools using this same format.

On optimization idea is that is one input file is specified, we could specify only addresses in the STDIN? Something like:

$ llvm-gsymutil --addresses-from-stdin /tmp/a.gsym
0x1000 0x2000 0x3000

If this is desirable, we would need to implement and document this in the --addresses-from-stdin help text.

I know this isn't covered by tests right now. I am not sure what the policy is for the command line utilities. Currently, llvm-gsymutil doesn't have any tests IIUC. If tests should be added, please direct me to a good example, and I will be happy to add them.

yes tests should be added and there are indeed llvm-gsymutil tests. See this directory:

llvm/test/tools/llvm-gsymutil

Ah, sorry for missing that. It seems so obvious now that I don't know how I missed it.

Hm, clang-tidy complains about two issues in code I copied from llvm-symbolizer. Should these be addressed?

if it is new code, then yes.

One thing to note is that there is library code that tools can directly link to, just like llvm-gsymutil links against the "DebugInfoGSYM" library. Then your lookups are very simple and you would call a function just like doLookup() that you added. Is there a reason you are wanting to run a command line tool instead of linking against the code?

In a first step, we just wanted to switch from one command line tool (addr2line) to another. For some use cases, linking against the library would be an option, and probably indeed one we will take in a future step. In other use cases, this might be less feasible, notably pprof, which is written in Go, which also calls other command line utilities such as nm, addr2line or llvm-symbolizer.

Also, I am not a fan of text scraping from the output of a tool unless it is purely for humans to read. If you are going to use this output from another tools, I would vote to add a new option: "--json" so that the output would be formatted using structured data like JSON. Something like:

$ llvm-gsymutil --json --addresses-from-stdin
0x1000 /tmp/a.gsym
0x2000 /tmp/b.gsym

And this would return output in a specific JSON format, something like:

[
  { "lookupAddress": 4096, "gsym": "/tmp/a.gsym", ... },
  { "lookupAddress": 8192, "gsym": "/tmp/b.gsym", ... }
]

The "..." above would be a JSON version of the llvm::gsym::LookupResult object, or an error message.

I agree that having a JSON output would be better for further processing. I just wanted to keep changes minimal for now, and stick with the output format the tool has. Just after putting up this review, I noticed that https://reviews.llvm.org/D96883 added a --json option to llvm-symbolizer.

We should probably use the same format here.

That makes me wonder whether the direction this takes is actually the right one. Another option might be to add support for GSYM to llvm-symbolizer. Do you think this would be a better direction?

I know this isn't covered by tests right now. I am not sure what the policy is for the command line utilities. Currently, llvm-gsymutil doesn't have any tests IIUC. If tests should be added, please direct me to a good example, and I will be happy to add them.

yes tests should be added and there are indeed llvm-gsymutil tests. See this directory:

llvm/test/tools/llvm-gsymutil

Ah, sorry for missing that. It seems so obvious now that I don't know how I missed it.

Hm, clang-tidy complains about two issues in code I copied from llvm-symbolizer. Should these be addressed?

if it is new code, then yes.

One thing to note is that there is library code that tools can directly link to, just like llvm-gsymutil links against the "DebugInfoGSYM" library. Then your lookups are very simple and you would call a function just like doLookup() that you added. Is there a reason you are wanting to run a command line tool instead of linking against the code?

In a first step, we just wanted to switch from one command line tool (addr2line) to another. For some use cases, linking against the library would be an option, and probably indeed one we will take in a future step. In other use cases, this might be less feasible, notably pprof, which is written in Go, which also calls other command line utilities such as nm, addr2line or llvm-symbolizer.

Also, I am not a fan of text scraping from the output of a tool unless it is purely for humans to read. If you are going to use this output from another tools, I would vote to add a new option: "--json" so that the output would be formatted using structured data like JSON. Something like:

$ llvm-gsymutil --json --addresses-from-stdin
0x1000 /tmp/a.gsym
0x2000 /tmp/b.gsym

And this would return output in a specific JSON format, something like:

[
  { "lookupAddress": 4096, "gsym": "/tmp/a.gsym", ... },
  { "lookupAddress": 8192, "gsym": "/tmp/b.gsym", ... }
]

The "..." above would be a JSON version of the llvm::gsym::LookupResult object, or an error message.

I agree that having a JSON output would be better for further processing. I just wanted to keep changes minimal for now, and stick with the output format the tool has. Just after putting up this review, I noticed that https://reviews.llvm.org/D96883 added a --json option to llvm-symbolizer.

We should probably use the same format here.

That makes me wonder whether the direction this takes is actually the right one. Another option might be to add support for GSYM to llvm-symbolizer. Do you think this would be a better direction?

It would be great to add support for GSYM into llvm-symbolizer! It also should be very easy to add if users supply the symbol file to llvm-symbolizer as it would be easy to sniff the first few bytes of the files for magic bytes and routing to the correctly file format's parser.

That being said, since we have other tools that are already using this format, I am totally fine with this being in llvm-gsymutil.

Update using arc

simon.giesecke marked 7 inline comments as done.

Addressed review comments

llvm/tools/llvm-gsymutil/llvm-gsymutil.cpp
501

I copied this from llvm-symbolizer but I can surely change this to use STL. I can also use some other LLVM tools, but would need some guidance.

@clayborg I addressed your comments now. Adding support for --json is something I will look into separately, if that's fine for you.

Fix path handling in test case on Windows

clayborg accepted this revision.May 19 2021, 3:05 PM

Looks good to me. Thanks for the changes.

This revision is now accepted and ready to land.May 19 2021, 3:05 PM
This revision was landed with ongoing or failed builds.May 19 2021, 11:11 PM
This revision was automatically updated to reflect the committed changes.

That makes me wonder whether the direction this takes is actually the right one. Another option might be to add support for GSYM to llvm-symbolizer. Do you think this would be a better direction?

It would be great to add support for GSYM into llvm-symbolizer! It also should be very easy to add if users supply the symbol file to llvm-symbolizer as it would be easy to sniff the first few bytes of the files for magic bytes and routing to the correctly file format's parser.

I am working on this now :) I am trying to find my way around llvm-symbolizer, and I came across DILineInfo, which is documented as "A format-neutral container for source line information." Is there already a way to get a DILineInfo from GSYM files? I haven't found any direct way to do this, in GsymReader or so.

That makes me wonder whether the direction this takes is actually the right one. Another option might be to add support for GSYM to llvm-symbolizer. Do you think this would be a better direction?

It would be great to add support for GSYM into llvm-symbolizer! It also should be very easy to add if users supply the symbol file to llvm-symbolizer as it would be easy to sniff the first few bytes of the files for magic bytes and routing to the correctly file format's parser.

I am working on this now :) I am trying to find my way around llvm-symbolizer, and I came across DILineInfo, which is documented as "A format-neutral container for source line information." Is there already a way to get a DILineInfo from GSYM files? I haven't found any direct way to do this, in GsymReader or so.

Not yet! Feel free to add an accessor to convert any GSYM data structures into DI data structures. There are two main formats that come out of GSYM: the full llvm::gsym::FunctionInfo or the llvm::gsym::LookupResult. llvm::gsym::FunctionInfo is the complete information for the function with the name, address range, line table and inline info. This info covers all addresses in the function. llvm::gsym::LookupResult is just the information you need for a single address and in a much friendlier format where all strings have been retrieved from the string table. It should be very easy to make any needed conversions. Let me know if you have any questions.

That makes me wonder whether the direction this takes is actually the right one. Another option might be to add support for GSYM to llvm-symbolizer. Do you think this would be a better direction?

It would be great to add support for GSYM into llvm-symbolizer! It also should be very easy to add if users supply the symbol file to llvm-symbolizer as it would be easy to sniff the first few bytes of the files for magic bytes and routing to the correctly file format's parser.

I am working on this now :) I am trying to find my way around llvm-symbolizer, and I came across DILineInfo, which is documented as "A format-neutral container for source line information." Is there already a way to get a DILineInfo from GSYM files? I haven't found any direct way to do this, in GsymReader or so.

Not yet! Feel free to add an accessor to convert any GSYM data structures into DI data structures. There are two main formats that come out of GSYM: the full llvm::gsym::FunctionInfo or the llvm::gsym::LookupResult. llvm::gsym::FunctionInfo is the complete information for the function with the name, address range, line table and inline info. This info covers all addresses in the function. llvm::gsym::LookupResult is just the information you need for a single address and in a much friendlier format where all strings have been retrieved from the string table. It should be very easy to make any needed conversions. Let me know if you have any questions.

Ok, I just wanted to make sure I don't reinvent the wheel! Thanks for the directions, I'll get back to you if necessary :)