This is an archive of the discontinued LLVM Phabricator instance.

[llvm-objdump] Print method name from debug info in disassembly output.
ClosedPublic

Authored by rupprecht on Feb 12 2020, 1:57 PM.

Details

Summary

GNU objdump prints the method name in disassembly output, and upon further investigation this seems to come from debug info, not the symbol table.

Some additional refactoring is necessary to make this work even when the line number is 0/the filename is unknown. The added test case includes a note for this scenario.

See http://llvm.org/PR41341 for more info.

Diff Detail

Event Timeline

rupprecht created this revision.Feb 12 2020, 1:57 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 12 2020, 1:57 PM
dblaikie added inline comments.Feb 13 2020, 4:57 PM
llvm/test/tools/llvm-objdump/X86/source-interleave-function-from-debug.test
5

Clearly there's prior art here, but testing objdump with llc seems a bit heavyweight/circular (no doubt some llc tests use llvm-objdump to verify their result) - I'd expect this to use raw assembly + llvm-mc at most.

not a requirement for change, just a thought

llvm/tools/llvm-objdump/llvm-objdump.cpp
651

Is the intent to print the demangled name? The test doesn't verify demangling is happening (since it uses unmangled names to begin with) & it doesn't look like it is happening, given SymbolizerOpts.Demangle = false; on line 569

658

Might be worth flipping this and using an early return, so the rest of the function can be a little less indented.

rupprecht marked 4 inline comments as done.
  • Use early return
  • Add name mangling test coverage
rupprecht marked an inline comment as not done.Feb 14 2020, 10:44 AM
rupprecht added inline comments.
llvm/test/tools/llvm-objdump/X86/source-interleave-function-from-debug.test
5

Clearly there's prior art here, but testing objdump with llc seems a bit heavyweight/circular (no doubt some llc tests use llvm-objdump to verify their result)

Not entirely circular: it's only the llc tests that depend on llvm-objdump and llvm-objdump tests that depend on llc. The tools themselves don't have a circular dep.

I'd expect this to use raw assembly + llvm-mc at most.

The reason I went w/ IR is for the relatively straightforward debug annotations, e.g. run "clang++ -S -emit-llvm" on a toy cc file and manipulate it as needed. That doesn't seem to be feasible w/ raw assembly: the dwarf information gets encoded as raw bytes. I think this is a lot more readable, unless there's some other way to represent debug info in assembly?

llvm/tools/llvm-objdump/llvm-objdump.cpp
651

For GNU compatibility, it should actually print the mangled string. I wouldn't mind if we changed it & also suggested this change for GNU binutils though, it seems like an oversight that GNU objdump -ldC does not demangle here.

Anyway, added a test case to capture current behavior.

rupprecht marked an inline comment as done.Feb 14 2020, 12:18 PM
rupprecht added inline comments.
llvm/tools/llvm-objdump/llvm-objdump.cpp
651

Proposed a patch to GNU objdump: https://sourceware.org/ml/binutils/2020-02/msg00369.html

I'll keep this patch consistent with the outcome of that (possibly post-commit).

  • Revert back to UseSymbolTable = true to allow printing the name in binaries without debug info
nhaehnle removed a subscriber: nhaehnle.Feb 16 2020, 3:45 AM

@rupprecht, could you post a before/after + GNU output example please, as I'm struggling purely from the tests to understand what change has actually been made.

llvm/tools/llvm-objdump/llvm-objdump.cpp
609

Where'd this blank line go?

rupprecht marked an inline comment as done.
  • Add back blank line accidentally removed
  • Add source of IR

Sure, here's an example you can play around with:

$ echo 'extern "C" int foo() { return 5; }; namespace xyz { int bar() { return 10; } }' > /tmp/src.cc; clang++ -g -c /tmp/src.cc -o /tmp/obj.o
$ objdump -ldC /tmp/obj.o  # GNU objdump
...
Disassembly of section .text:

0000000000000000 <foo>:
foo():
/tmp/src.cc:1
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   b8 05 00 00 00          mov    $0x5,%eax
   9:   5d                      pop    %rbp
   a:   c3                      retq   
   b:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)

0000000000000010 <xyz::bar()>:
_ZN3xyz3barEv():  # <-- Note, see https://sourceware.org/ml/binutils/2020-02/msg00369.html to demangle this too
/tmp/src.cc:1
  10:   55                      push   %rbp
  11:   48 89 e5                mov    %rsp,%rbp
  14:   b8 0a 00 00 00          mov    $0xa,%eax
  19:   5d                      pop    %rbp
  1a:   c3                      retq   

$ llvm-objdump -ldC /tmp/obj.o  # llvm-objdump, w/o this patch
...
0000000000000000 foo:
; /tmp/src.cc:1
       0: 55                            pushq   %rbp
       1: 48 89 e5                      movq    %rsp, %rbp
       4: b8 05 00 00 00                movl    $5, %eax
       9: 5d                            popq    %rbp
       a: c3                            retq
       b: 0f 1f 44 00 00                nopl    (%rax,%rax)

0000000000000010 xyz::bar():
      10: 55                            pushq   %rbp
      11: 48 89 e5                      movq    %rsp, %rbp
      14: b8 0a 00 00 00                movl    $10, %eax
      19: 5d                            popq    %rbp
      1a: c3                            retq

$ llvm-objdump -ldC /tmp/obj.o  # llvm-objdump, w/ this patch
...
0000000000000000 foo:
; foo():  # <-- this is new
; /tmp/src.cc:1
       0: 55                            pushq   %rbp
       1: 48 89 e5                      movq    %rsp, %rbp
       4: b8 05 00 00 00                movl    $5, %eax
       9: 5d                            popq    %rbp
       a: c3                            retq
; /tmp/src.cc:1 # <-- just noticed this unexpected diff, I'll see what's going on here....
       b: 0f 1f 44 00 00                nopl    (%rax,%rax)

0000000000000010 xyz::bar():
; _ZN3xyz3barEv(): # <-- this is new
; /tmp/src.cc:1 # <-- this is new and correct (although wasn't the intention of this patch to fix)
      10: 55                            pushq   %rbp
      11: 48 89 e5                      movq    %rsp, %rbp
      14: b8 0a 00 00 00                movl    $10, %eax
      19: 5d                            popq    %rbp
      1a: c3                            retq

http://llvm.org/PR41341 also has some examples. It seems superfluous with the trivial examples, but helps in situations where the object file has debugging information but no (or limited) symbols.

Note this patch doesn't address the syntactic difference of the leading ; delimiter, which is still a difference between llvm-objdump and GNU objdump.

rupprecht updated this revision to Diff 245257.Feb 18 2020, 1:56 PM
  • Fix printing the same line info when only the function name changes (from something to <invalid>)
rupprecht updated this revision to Diff 245554.Feb 19 2020, 4:34 PM
  • Add demangling to match updated GNU objdump behavior
rupprecht marked an inline comment as done.Feb 19 2020, 4:35 PM
rupprecht added inline comments.
llvm/tools/llvm-objdump/llvm-objdump.cpp
651

Final reply for this comment: the GNU objdump patch was accepted & has landed in trunk, so I updated the patch to also demangle here.

Running out of time today.. Will try to follow the logic tomorrow.

MaskRay added inline comments.Feb 20 2020, 9:41 PM
llvm/test/tools/llvm-objdump/X86/source-interleave-function-from-debug.test
5

I prefer an IR test for this case. The result is known to be stable and cannot affect people who work on CodeGen (test/CodeGen).

"frame-pointer"="none" can make the emitted instructions even simpler.

22

With "frame-pointer"="none", there will be even more NOPs. Consider CHECK-COUNT-10: nop or just omit NOPs.

llvm/tools/llvm-objdump/llvm-objdump.cpp
655

Please reformat.

rupprecht updated this revision to Diff 245997.Feb 21 2020, 2:35 PM
rupprecht marked 4 inline comments as done.
  • clang-format
  • Simplify test case
MaskRay accepted this revision.Feb 21 2020, 2:45 PM
MaskRay added inline comments.
llvm/test/tools/llvm-objdump/X86/source-interleave-function-from-debug.test
63

Add -O on the clang command line.

"frame-pointer"="none" is the default behavior of -O* on Linux x86-64. (Yeah, the frame pointer decision is OS/Arch dependent...A mess)

This revision is now accepted and ready to land.Feb 21 2020, 2:45 PM
This revision was automatically updated to reflect the committed changes.

FYI, I usually run:

arcfilter () {
        git log -1 --pretty=%B | awk '/Reviewers:|Subscribers:/{p=1} /Reviewed By:|Differential Revision:/{p=0} !p && !/^Summary:/' | git commit --amend -F -
}

to remove unneeded metadata tags from the git description before committing. Not sure about the most appropriate way. (I don't know enough about Phabricator... Maybe there is some arc magic doing this.)

Hopefully fixed by d17123b2577e610b2a19de1f530cecea353c8c7a

(I feel so sad about the canonical path separator on Windows... So many times a DWARF test accidentally failed on Windows. I asked whether we can use / everywhere but the answer was no.)