This is an archive of the discontinued LLVM Phabricator instance.

[symbolizer] Added StartAddress for the resolved function.
ClosedPublic

Authored by aorlov on May 12 2021, 1:16 AM.

Details

Summary

In many cases it is helpful to know at what address the resolved function starts.
This patch adds a new StartAddress member to the DILineInfo structure.

Diff Detail

Event Timeline

aorlov created this revision.May 12 2021, 1:16 AM
aorlov requested review of this revision.May 12 2021, 1:16 AM
aorlov updated this revision to Diff 344720.May 12 2021, 1:49 AM

This seems like a reasonable idea in principle to me, and the code as-is looks good for JSON output. I feel like it shouldn't be limited to just that output format however. I'm uncertain how to represent this in other output formats, but it may want to be through a new llvm-symbolizer option (e.g. --print-start-address), or maybe even it could just be part of the verbose output (since that is where the StartLine output is printed. What do you think?

llvm/lib/DebugInfo/DWARF/DWARFContext.cpp
1067–1070

No need for braces for single-line if statements.

it could just be part of the verbose output (since that is where the StartLine output is printed.

I like this more. Feels like we already have enough options.

What about compatibility with gnu adr2line? Unless I’m missing something, they do not offer such data. Do we care?

dblaikie added inline comments.May 13 2021, 1:16 PM
llvm/lib/DebugInfo/DWARF/DWARFContext.cpp
1067–1070

Any thoughts on the behavior you want if a function has more than one code range (eg: PROPELLER or other uses of basic block sections - where the function will described with DW_AT_ranges instead of DW_AT_low_pc)?

it could just be part of the verbose output (since that is where the StartLine output is printed.

I like this more. Feels like we already have enough options.

What about compatibility with gnu adr2line? Unless I’m missing something, they do not offer such data. Do we care?

We definitely care about GNU compatibility for the GNU output style. There are two options here: 1) discuss on the GNU binutils mailing list whether they'd be interested in implementing the same feature, maybe with a concrete proposal of what the output would look like - if they accept it, model our output on theirs. 2) Add the output to the LLVM-style verbose output only (also to JSON output of course).

Don't forget to update the examples (and potentially text) in the docs to match whatever the new output is.

llvm/lib/DebugInfo/DWARF/DWARFContext.cpp
1067–1070

I've got no concrete preference here currently, but I can think of the following options:

  1. In such cases, don't print a StartAddress, to avoid confusion.
  2. Print the start of the current block in the range.
  3. Print the start of the first block (if possible), since that's technically the start address, and also the most likely entry point of the function.
  4. Print the details of all ranges related to this symbol.
dblaikie added inline comments.May 14 2021, 9:18 AM
llvm/lib/DebugInfo/DWARF/DWARFContext.cpp
1067–1070

Yeah, I'm OK with any of that - though (3) isn't really possible with LLVM's DWARF output - the list of blocks isn't ordered, so you can't find the first one only from inspecting the address ranges. (yes, I'm pretty sure in LLVM's output the first block in the list of ranges is the start of the function, but nothing in DWARF communicates/guarantees that) - there's a separate attribute that can be emitted on subprograms that's for the entry point address which could be used for this (or you can lookup the symbol table - if the function/executable has a symbol table - and use that address)

Of course I though about multiple ranges case. And preferred keeping it simple for the beginning. This patch does not print StartAddress if DW_AT_low_pc is missing. That’s the case #1 in the above list of options. As already mentioned, in case of multiple ranges we do not know for sure where a function starts, at least not without an extra analyses of the code. We will need more solid use case and samples to move to that direction. Not a subject of this patch.

To summarize.

  1. I will add StartAddress to the verbose output for LLVM.
  2. If/when time permits I’ll follow up with gnu community with the proposal pointing to LLVM output as an example.

The patch is coming shortly.

By the way, llvm-symbolizer always uses symbol table. Opts.UseSymbolTable = true is hard coded and user has no way to control that.

aorlov updated this revision to Diff 345582.May 14 2021, 4:12 PM
aorlov marked an inline comment as done.

Of course I though about multiple ranges case. And preferred keeping it simple for the beginning. This patch does not print StartAddress if DW_AT_low_pc is missing. That’s the case #1 in the above list of options. As already mentioned, in case of multiple ranges we do not know for sure where a function starts, at least not without an extra analyses of the code. We will need more solid use case and samples to move to that direction. Not a subject of this patch.

To summarize.

  1. I will add StartAddress to the verbose output for LLVM.
  2. If/when time permits I’ll follow up with gnu community with the proposal pointing to LLVM output as an example.

The patch is coming shortly.

By the way, llvm-symbolizer always uses symbol table. Opts.UseSymbolTable = true is hard coded and user has no way to control that.

Fair enough.

(re: symbol table - I meant for figuring out the address - I think the "UseSymbolTable" is for the names, not the addresses (see the comments in SymbolizableObjectFile::shouldOverrideWithSymbolTable))

Reminder:

Don't forget to update the examples (and potentially text) in the docs to match whatever the new output is.

llvm/lib/DebugInfo/Symbolize/DIPrinter.cpp
110

I feel like it would be a good idea to reduce code duplication of this function. I'd suggest pulling the common bits out into separate static or private functions, that can be called as necessary. Example:

void PlainPrinterBase::printVerbose(StringRef Filename,
                                    const DILineInfo &Info) {
  printFileAndFunctionStart(...);
  printLineColumnDiscriminator(...);
}

void LLVMPrinter::printVerbose(StringRef Filename, const DILineInfo &Info) {
  printFileAndFunctionStart(...);
  if (Info.StartAddress) {
    OS << "  Function start address: 0x";
    OS.write_hex(*Info.StartAddress);
    OS << '\n';
  }
  printLineColumnDiscriminator(...);
}

Alternatively, put the start address printing in another virtual function, and then just have the PlainPrinterBase::printVerbose function callt that function.

aorlov updated this revision to Diff 345902.May 17 2021, 8:47 AM
aorlov updated this revision to Diff 345974.May 17 2021, 12:46 PM
aorlov marked an inline comment as done.
aorlov updated this revision to Diff 345980.May 17 2021, 1:04 PM
jhenderson accepted this revision.May 18 2021, 12:41 AM

LGTM, thanks!

llvm/docs/CommandGuide/llvm-symbolizer.rst
377 ↗(On Diff #345980)

I don't think you need the word "start" here (we didn't before for the line part after all).

This revision is now accepted and ready to land.May 18 2021, 12:41 AM
This revision was automatically updated to reflect the committed changes.