Page MenuHomePhabricator

llvm-addr2line: assume addresses on the command line are hexadecimal rather than attempting to guess the base based on the form of the number.
Needs ReviewPublic

Authored by rsmith on Jan 23 2020, 4:32 PM.



This matches the behavior of GNU addr2line. We previously treated
hexadecimal addresses as binary if they started with 0b, otherwise as
octal if they started with 0, otherwise as decimal.

This only affects llvm-addr2line; the behavior of llvm-symbolize is

Event Timeline

rsmith created this revision.Jan 23 2020, 4:32 PM
Herald added a project: Restricted Project. · View Herald Transcript

I wonder if it's worth adding a note the llvm-symbolizer user guide describing the format of the input address? At the moment, it doesn't mention what format the numbers must be in. Probably should be a separate change though, I guess.


I don't think referring to it as C literals is necessarily fair, since the style is more widely spread than just C. How about saying "instead of deriving their base from their prefix (if present)"?


I think this needs reflowing (looks like we're sticking to the 80-column width in this python script).


Could you add a leading '#' character here and for the other comments, to clearly delineate them from the lit and FileCheck directives, please. It makes it easier to read.


You need a test-case for '0X' prefixes as well as '0x'.


What gets printed instead here? I thought llvm-addr2line's behaviour was the same a llvm-symbolizer and that it prints the input value as-is if the value is not a valid number, so I'm surprised to see this check passes.

Does reading from /dev/null work on Windows? (I can try it out if you need me to).


I think you need to add checks for the "name not found pattern" (off the top of my head I want to say '??'), to show that this is actually a look-up and not simply echoing a rejection of a malformed input address.


I do not like the mixed variable naming styles in this function!

Given we use both upper and lower-case variable names, let's make this one follow the standard LLVM style (i.e. Offset).

LGTM after James' comments too. Thanks for spotting this inconsistency.


Tiny nit: my first time reading through this I read "assumes hex" as "guesses it's hex but falls back in other cases" (e.g. if the 0b prefix is provided).

Something like "requires hex" (like you have in the docs) might be more straightforward? Same with the comment in the code.


Does reading from /dev/null work on Windows? (I can try it out if you need me to).

Looks like lit handles that: