This is an archive of the discontinued LLVM Phabricator instance.

[llvm-symbolizer] Add options to disable printing source files & inlining
Needs Review · Public

Authored by danzimm on Jul 6 2020, 3:00 PM.

Details

Summary

Currently there is no way to disable source file output from llvm-symbolizer. Similarly, there is no way to disable symbolicating inlined functions with llvm-symbolizer (this is already disabled by default when the tool is invoked as llvm-addr2line instead of llvm-symbolizer).

This diff introduces flags to further customize llvm-symbolizer's output to support these two use cases:

  • --no-inlining: This disables symbolicating inlined frames. When paired with --output-style=LLVM and --use-symbol-table, this results in a list of functions that appear in the final binary.
  • --no-source-file: This disables printing source file information when symbolicating an address
  • --source-file: This enables printing source file information (the default). This option was added to balance --no-source-file.

The last of --no-source-file and --source-file passed will determine whether source file information is printed or not. The behavior of llvm-symbolizer before this diff should be identical to the behavior after this diff when none of the new options are specified.
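The last-one-wins rule for the --source-file/--no-source-file pair can be sketched as follows. This is a minimal, self-contained illustration rather than the patch's actual code: the helper name resolveSourceFileFlag is hypothetical, and it only scans already-split arguments, ignoring every other option.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of llvm-symbolizer): resolves the
// --source-file / --no-source-file pair. The default is to print source file
// information, and whichever flag appears last overrides any earlier one.
bool resolveSourceFileFlag(const std::vector<std::string> &args) {
  bool printSourceFile = true; // default when neither flag is given
  for (const std::string &arg : args) {
    if (arg == "--source-file")
      printSourceFile = true;
    else if (arg == "--no-source-file")
      printSourceFile = false;
    // All other arguments are ignored by this sketch.
  }
  return printSourceFile;
}
```

Because the default is true, passing neither flag leaves the output unchanged, matching the claim that behavior is identical when none of the new options are specified.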

Used together, --functions=linkage --demangle --output-style=LLVM --no-source-file --no-inlining produce a list of the symbol names that appear in the resulting binary. This is useful when symbolicating a list of addresses, e.g. for link order files.

N.B. The same data can be extracted by post-processing the output of --functions=linkage --demangle --output-style=LLVM; however, with large lists of symbols I've found that this takes quite a long time (my processors were written in Perl/Python; in theory I could've written a C/C++ one, but I figured it was best to just add these as formatting options to llvm-symbolizer instead).
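For comparison, the post-processing approach could be written in C++ roughly as follows. This is a sketch, assuming the LLVM output style without --print-address: each input address produces one or more pairs of lines (function name, then file:line:column), innermost inlined frame first, followed by a blank line. Keeping the last function name in each block approximates what disabling inlining would print. The function name outermostFunctions is hypothetical.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical filter over llvm-symbolizer LLVM-style output. Assumes each
// address yields alternating "function" / "file:line:column" lines, innermost
// inlined frame first, with a blank line terminating each address's block.
std::vector<std::string> outermostFunctions(std::istream &in) {
  std::vector<std::string> result;
  std::vector<std::string> block;
  auto flush = [&]() {
    if (block.empty())
      return;
    // Function names sit at even indices; take the last one, i.e. the
    // outermost (non-inlined) frame for this address.
    std::size_t lastName = (block.size() - 1) / 2 * 2;
    result.push_back(block[lastName]);
    block.clear();
  };
  std::string line;
  while (std::getline(in, line)) {
    if (line.empty())
      flush();
    else
      block.push_back(line);
  }
  flush(); // handle input that lacks a trailing blank line
  return result;
}

// Convenience overload for output that is already in memory.
std::vector<std::string> outermostFunctions(const std::string &text) {
  std::istringstream in(text);
  return outermostFunctions(in);
}
```

In a real pipeline the stream overload would read from std::cin; whether this is fast enough for very large address lists is not something this sketch establishes.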


Event Timeline

danzimm created this revision. Jul 6 2020, 3:00 PM
Herald added a project: Restricted Project.

If there are tests somewhere that I can add to, please point me to them! I'd love to add some tests, just couldn't quite find any (I'm guessing I'm just not looking in the right place... 😅)

Hi @danzimm,

In LLVM tools that use the cl::opt interface, like llvm-symbolizer and llvm-addr2line, wherever you see a cl::opt<bool> (such as --inlining), you should be able to pass --inlining=0 to disable it. I'm not necessarily opposed to --no-inlining too, mind you, but wanted to raise that before going further. If you wish to go ahead with adding --no-inlining, I'd recommend moving it to a separate patch, as it is independent of the --source-file changes. Also, please add any new options to the llvm-symbolizer and llvm-addr2line docs (located in llvm/docs/CommandGuide).
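To illustrate the =0 form, here is a hypothetical sketch of how a boolean option's value can be interpreted. To the best of my knowledge, the spellings below match those accepted by LLVM's cl::opt<bool> parser (a bare flag means true), but parseBoolOptionValue and optionValue are illustrative helpers, not LLVM APIs.

```cpp
#include <optional>
#include <string>

// Illustrative only, not LLVM's code: maps the value part of a boolean
// option to true/false, or std::nullopt if the spelling is not recognized.
std::optional<bool> parseBoolOptionValue(const std::string &value) {
  if (value.empty() || value == "true" || value == "TRUE" ||
      value == "True" || value == "1")
    return true; // bare "--inlining" or an explicit true value
  if (value == "false" || value == "FALSE" || value == "False" ||
      value == "0")
    return false; // e.g. "--inlining=0"
  return std::nullopt;
}

// Hypothetical helper: returns the text after '=' in "--name=value",
// or an empty string for a bare "--name".
std::string optionValue(const std::string &arg) {
  std::string::size_type eq = arg.find('=');
  return eq == std::string::npos ? std::string() : arg.substr(eq + 1);
}
```

Under this scheme, --inlining=0 and --inlining=false both turn the option off, which is why a separate --no-inlining flag is not strictly necessary.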

> If there are tests somewhere that I can add to, please point me to them! I'd love to add some tests, just couldn't quite find any (I'm guessing I'm just not looking in the right place... 😅)

Most tests for llvm-symbolizer (and llvm-addr2line) are located in llvm/test/tools/llvm-symbolizer (there are a few in other scattered locations - grep for llvm-symbolizer). Any new front-end options like these should be tested there.

> Used together, --functions=linkage --demangle --output-style=LLVM --no-source-file --no-inlining produce a list of the symbol names that appear in the resulting binary. This is useful when symbolicating a list of addresses, e.g. for link order files.

> N.B. The same data can be extracted by post-processing the output of --functions=linkage --demangle --output-style=LLVM; however, with large lists of symbols I've found that this takes quite a long time (my processors were written in Perl/Python; in theory I could've written a C/C++ one, but I figured it was best to just add these as formatting options to llvm-symbolizer instead).

Are you specifically interested in symbols at specific addresses, or with a specific type? llvm-nm and llvm-readobj can both be used to dump symbols too. It doesn't feel to me like llvm-symbolizer is the right tool for the job if you want to dump all symbols (or all functions), though I could possibly see an argument if you are limiting it to the symbols with specific addresses. I personally would think it would make more sense to add any necessary options to llvm-nm or possibly llvm-readelf. Adding a few others with binutils knowledge for more visibility and to get their input.

Wow, thanks for the quick and thoughtful response, @jhenderson! This was awesome to wake up to!

> In LLVM tools that use the cl::opt interface, like llvm-symbolizer and llvm-addr2line, wherever you see a cl::opt<bool> (such as --inlining), you should be able to pass --inlining=0 to disable it. I'm not necessarily opposed to --no-inlining too, mind you, but wanted to raise that before going further.

Whoa, I didn't know that! This is pretty cool! I'll go ahead and pull this change out of this diff. Do you have a preference on introducing --no-inlining? I think I'll drop it altogether (as opposed to putting it in another diff) to reduce complexity.

> Most tests for llvm-symbolizer (and llvm-addr2line) are located in llvm/test/tools/llvm-symbolizer (there are a few in other scattered locations - grep for llvm-symbolizer). Any new front-end options like these should be tested there.

Doh! It was right in front of me the whole time! Thanks for pointing that out. Next time (if there is a next time for this diff) I upload changes I'll add some tests!

> Are you specifically interested in symbols at specific addresses, or with a specific type? llvm-nm and llvm-readobj can both be used to dump symbols too. It doesn't feel to me like llvm-symbolizer is the right tool for the job if you want to dump all symbols (or all functions), though I could possibly see an argument if you are limiting it to the symbols with specific addresses. I personally would think it would make more sense to add any necessary options to llvm-nm or possibly llvm-readelf. Adding a few others with binutils knowledge for more visibility and to get their input.

Originally I was interested in mapping a list of addresses to the names of the functions in the binary that contain them (these addresses come from instrumentation, e.g. -finstrument-function-entry-bare). After a bit of thought (and trial & error), I've concluded that I actually do want source file information... it seems symbol names are duplicated across compilation units more often than I had originally expected.

So... I think for my personal use cases we can go ahead and scrap this diff. I'm also OK with creating the tests and pushing through the --no-source-file change (it probably still has some utility when there aren't duplicate symbols). What do you say, @jhenderson? Does scrapping this diff sound OK?

I'm personally fine with dropping this if it's not actually useful for you, as I don't have any use case for it at the current time. Re. --no-inlining, I have a slight preference for not adding it, but I'm also okay with it being added if you'd find it less confusing. I didn't know about using =0 to disable a flag in LLVM tools when I first came to the project myself, so it could be a little confusing. I actually added --no-demangle precisely for that reason.

> I'm personally fine with dropping this if it's not actually useful for you, as I don't have any use case for it at the current time. Re. --no-inlining, I have a slight preference for not adding it, but I'm also okay with it being added if you'd find it less confusing. I didn't know about using =0 to disable a flag in LLVM tools when I first came to the project myself, so it could be a little confusing. I actually added --no-demangle precisely for that reason.

I have a slightly stronger opinion that we should not add it. We could improve help messages for cl::opt<bool> to mention the default value.

> Originally I was interested in mapping a list of addresses to the names of the functions in the binary that contain them (these addresses come from instrumentation, e.g. -finstrument-function-entry-bare). After a bit of thought (and trial & error), I've concluded that I actually do want source file information... it seems symbol names are duplicated across compilation units more often than I had originally expected.

Do you still need output without source info? We add options when there is a reasonable use case, not just to customize a behavior.

> I'm personally fine with dropping this if it's not actually useful for you, as I don't have any use case for it at the current time. Re. --no-inlining, I have a slight preference for not adding it, but I'm also okay with it being added if you'd find it less confusing. I didn't know about using =0 to disable a flag in LLVM tools when I first came to the project myself, so it could be a little confusing. I actually added --no-demangle precisely for that reason.

> I have a slightly stronger opinion that we should not add it. We could improve help messages for cl::opt<bool> to mention the default value.

Having given it more thought, perhaps we should switch llvm-symbolizer to an llvm-objcopy-style OptTable. Many llvm::cl::opt-based tools are not user-facing (llc/opt), and llvm::cl::opt is quick and easy. For user-facing utilities (clang/lld/objcopy), OptTable may be more suitable, as it can be customized to match the most common GNU-style getopt_long behavior. llvm-readobj/llvm-objdump are a bit special: they don't have any llvm::cl::opt<bool> options that default to true. If they did, we might face a similar conundrum to --no-demangle.

> Originally I was interested in mapping a list of addresses to the names of the functions in the binary that contain them (these addresses come from instrumentation, e.g. -finstrument-function-entry-bare). After a bit of thought (and trial & error), I've concluded that I actually do want source file information... it seems symbol names are duplicated across compilation units more often than I had originally expected.

> Do you still need output without source info? We add options when there is a reasonable use case, not just to customize a behavior.

Created D83530 to switch to OptTable. The only unimplemented feature is grouped short options (POSIX.1-2017 12.2 Utility Syntax Guidelines, Guideline 5).
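Grouped short options let several single-letter options share one '-', so that -af means -a -f. A minimal sketch of expanding such a group, assuming the caller already knows which letters take an option-argument; the helper name expandShortGroup is hypothetical, the option letters in the usage are illustrative, and, like getopt, it treats any characters following an argument-taking letter as that letter's argument.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical sketch of POSIX Utility Syntax Guideline 5: "-abc" expands to
// "-a" "-b" "-c". When a letter takes an option-argument, the remaining
// characters (if any) become its argument and expansion stops there.
std::vector<std::string> expandShortGroup(const std::string &arg,
                                          const std::set<char> &takesArgument) {
  // Long options ("--foo"), the "--" terminator, a lone "-" and operands
  // are passed through unchanged.
  if (arg.size() < 2 || arg[0] != '-' || arg[1] == '-')
    return {arg};
  std::vector<std::string> out;
  for (std::string::size_type i = 1; i < arg.size(); ++i) {
    out.push_back(std::string("-") + arg[i]);
    if (takesArgument.count(arg[i]) != 0) {
      // Attached argument, e.g. "-efoo"; if nothing follows, the argument
      // is expected in the next command-line element.
      if (i + 1 < arg.size())
        out.push_back(arg.substr(i + 1));
      break;
    }
  }
  return out;
}
```

For example, with 'e' taking an argument, "-aefoo" expands to "-a" "-e" "foo", while "-ai" expands to "-a" "-i".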

hiraditya resigned from this revision. Jul 28 2022, 8:31 PM
Herald added a project: Restricted Project. Jul 28 2022, 8:31 PM