Page MenuHomePhabricator

llvm-addr2line: assume addresses on the command line are hexadecimal rather than attempting to guess the base based on the form of the number.
ClosedPublic

Authored by rsmith on Jan 23 2020, 4:32 PM.

Details

Summary

This matches the behavior of GNU addr2line. We previously treated
hexadecimal addresses as binary if they started with 0b, otherwise as
octal if they started with 0, otherwise as decimal.

This only affects llvm-addr2line; the behavior of llvm-symbolize is
unaffected.

Diff Detail

Event Timeline

rsmith created this revision.Jan 23 2020, 4:32 PM
Herald added a project: Restricted Project. · View Herald Transcript

I wonder if it's worth adding a note the llvm-symbolizer user guide describing the format of the input address? At the moment, it doesn't mention what format the numbers must be in. Probably should be a separate change though, I guess.

llvm/docs/CommandGuide/llvm-addr2line.rst
21

I don't think referring to it as C literals is necessarily fair, since the style is more widely spread than just C. How about saying "instead of deriving their base from their prefix (if present)"?

llvm/test/lit.cfg.py
148

I think this needs reflowing (looks like we're sticking to the 80-column width in this python script).

llvm/test/tools/llvm-symbolizer/input-base.test
2

Could you add a leading '#' character here and for the other comments, to clearly delineate them from the lit and FileCheck directives, please. It makes it easier to read.

10

You need a test-case for '0X' prefixes as well as '0x'.

14

What gets printed instead here? I thought llvm-addr2line's behaviour was the same a llvm-symbolizer and that it prints the input value as-is if the value is not a valid number, so I'm surprised to see this check passes.

Does reading from /dev/null work on Windows? (I can try it out if you need me to).

16–17

I think you need to add checks for the "name not found pattern" (off the top of my head I want to say '??'), to show that this is actually a look-up and not simply echoing a rejection of a malformed input address.

llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp
219–226

I do not like the mixed variable naming styles in this function!

Given we use both upper and lower-case variable names, let's make this one follow the standard LLVM style (i.e. Offset).

LGTM after James' comments too. Thanks for spotting this inconsistency.

llvm/test/tools/llvm-symbolizer/input-base.test
9

Tiny nit: my first time reading through this I read "assumes hex" as "guesses it's hex but falls back in other cases" (e.g. if the 0b prefix is provided).

Something like "requires hex" (like you have in the docs) might be more straightforward? Same with the comment in the code.

14

Does reading from /dev/null work on Windows? (I can try it out if you need me to).

Looks like lit handles that: https://github.com/llvm/llvm-project/blob/master/llvm/utils/lit/lit/TestRunner.py#L37

rsmith updated this revision to Diff 253775.Mar 30 2020, 8:20 PM
rsmith marked 11 inline comments as done.
  • Switch this function to the LLVM variable naming convention, to match the rest of the file.
rsmith marked 2 inline comments as done.Mar 30 2020, 8:21 PM
rsmith added inline comments.
llvm/docs/CommandGuide/llvm-addr2line.rst
21

The major difference I was trying to get across is that unprefixed numbers default to decimal in llvm-symbolizer (or octal if there's a 0 prefix) but to hexadecimal in llvm-addr2line, and I don't think your alternative really captures that. I've taken another pass over this. (I also changed the bullets to include a noun, since it wasn't really clear whether these bullets were talking about the behavior of addr2line or symbolizer.)

llvm/test/tools/llvm-symbolizer/input-base.test
10

Added both cases for all the tests.

16–17

Note the "x". That wasn't there in the input address.

llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp
219–226

Every local variable in this function uses lower_camel_case, so I followed that. I'll just change them all to follow LLVM convention (as a separate commit).

jhenderson added inline comments.Mar 31 2020, 12:47 AM
llvm/docs/CommandGuide/llvm-addr2line.rst
28–38

The rewording changes look good to me, but please put them in a separate commit (feel free to push that bit with no further review).

llvm/test/tools/llvm-symbolizer/input-base.test
20

This line working is surprising to me. FileCheck is supposed to be case sensitive, so it looks to me like llvm-addr2line is consuming 0O1234 as an actual number or something and not treating it as an invalid address.

22

Some of my comments have moved around too much in Phabricator's view to easily see the context, so apologies if this feels like a bit of repetition. This CHECK on its own is insufficient for the cases where the input is the string 0x1234. In those cases, this CHECK will not distinguish between llvm-addr2line recognising the string as a valid hexadecimal address (and thus being able to use it to do the look up) and not (and thus just echoing it to the output). I think the only ways to tell are to either turn off -a (which would cause valid addresses to not appear and invalid values to be printed), or to check for the not-found pattern (to prove that a lookup was performed). I'd recommend the latter.

llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp
219–226

Could you do the variable renaming in a separate pre-requisite patch, please, and then rebase this patch on top of that? That'll make it easier to spot the real differences.

rsmith updated this revision to Diff 253973.Mar 31 2020, 12:47 PM
rsmith marked 6 inline comments as done.
  • Explicitly check for failed lookup in tests to differentiate between success and errors.
  • Fix test expectations for 0O prefix, which is not supported by llvm-symbolizer nor llvm-addr2line.

I think all the changes are good, but I'd like to see this patch independently of the variable renamings and document rewording before giving final approval to be sure. Please could you create separate commits for those and then rebase this patch on top, showing the diff of this patch only then.

rsmith updated this revision to Diff 255843.Apr 7 2020, 3:47 PM
rsmith marked 2 inline comments as done.

Remove separately-committed cleanup commits.

rsmith added inline comments.Apr 7 2020, 3:58 PM
llvm/docs/CommandGuide/llvm-addr2line.rst
28–38

They are already in a separate commit; sorry for not mentioning that. (FYI, you can see the commits in the "Commits" tab in phabricator.) I went ahead and committed that.

llvm/test/tools/llvm-symbolizer/input-base.test
20

Nice catch =) Sorry, I made this change at the last minute before uploading and forgot to rerun the test, which does indeed fail as you expected. And it turns out that llvm-symbolizer doesn't accept the 0O (zero, capital o) prefix either. Test updated.

llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp
219–226

This was already split out into a separate commit, but... it doesn't look like Phabricator lets you look at the contents of the individual commits in a review. That seems to make the ability to upload a series of commits to Phabricator as a single review... kind of useless. I've gone ahead and submitted the renaming since I think it's obvious and uncontroversial.

This was already split out into a separate commit, but... it doesn't look like Phabricator lets you look at the contents of the individual commits in a review. That seems to make the ability to upload a series of commits to Phabricator as a single review... kind of useless. I've gone ahead and submitted the renaming since I think it's obvious and uncontroversial.

So I think the design for Phabricator is intended to be one review per commit. I vaguely recall someone saying this was also the preferred style for reviews in LLVM in general, in one of the Github PRs versus Phabricator threads. Certainly, I find it easiest to work in that way. I don't know if arcanist has any way of linking things like this (I don't use it), but if you put "Depends on DXXXX" in your patch description, it will create a commit "Stack" tab which shows the commit thread (see D77308 for an example of the Stack tab). You can also do it manually using the "Edit Related Revisions" UI option.

rsmith added a comment.Apr 8 2020, 2:24 PM

So I think the design for Phabricator is intended to be one review per commit. I vaguely recall someone saying this was also the preferred style for reviews in LLVM in general, in one of the Github PRs versus Phabricator threads. Certainly, I find it easiest to work in that way. I don't know if arcanist has any way of linking things like this (I don't use it), but if you put "Depends on DXXXX" in your patch description, it will create a commit "Stack" tab which shows the commit thread (see D77308 for an example of the Stack tab). You can also do it manually using the "Edit Related Revisions" UI option.

I found an article describing the workflow I want, and some folks at Mozilla are working on automating it. But yeah, it seems like doing the phabricator review management yourself manually is the only way at the moment :-( Ah well, and thanks for the information. In any case, I think this is the patch you wanted to review, with the other bits already committed.

jhenderson accepted this revision.Apr 9 2020, 12:24 AM

LGTM. Sorry, should have posted that on the last comment!

This revision is now accepted and ready to land.Apr 9 2020, 12:24 AM
This revision was automatically updated to reflect the committed changes.