Page MenuHomePhabricator

[llvm-objdump] Display locations of variables alongside disassembly
ClosedPublic

Authored by ostannard on Nov 26 2019, 8:44 AM.

Details

Summary

This adds the -debug-vars option to llvm-objdump, which prints locations (currently just registers, memory and more complex expressions will be added later) of source-level variables alongside the disassembly based on DWARF info. A vertical line is printed for each live-range, with a label at the top giving the variable name and location, and the position and length of the line indicating the program counter range in which it is valid.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
echristo added inline comments.Feb 19 2020, 11:49 AM
llvm/include/llvm/DebugInfo/DWARF/DWARFExpression.h
149

printCompact maybe?

llvm/include/llvm/Support/FormattedStream.h
110 ↗(On Diff #244371)

Comment what ComputePosition is doing here?

llvm/lib/DebugInfo/DWARF/DWARFExpression.cpp
391

If you wouldn't mind commenting each of the cases with what's going on in them it'd be appreciated.

llvm/tools/llvm-objdump/llvm-objdump.cpp
612–614

Since this is going to get constructed with subsequent boolean arguments can we move them to enums/something? int, true, false, true isn't real obvious what it's constructing :)

729

Not a huge fan of boolean arguments. Is there a different factoring here that'll help?

jhenderson added inline comments.Feb 20 2020, 1:06 AM
llvm/docs/CommandGuide/llvm-objdump.rst
136–139

FWIW I'd have probably gone with --debug-vars=<encoding> ?

I like this idea. Saves the need for two switches, and makes things nice and explicit about what encoding to use.

Needs a rebase after D74507. --debug-vars seems to work with an ET_REL, but does it work with an ET_DYN/ET_DYN?

MaskRay added inline comments.Feb 23 2020, 5:04 PM
llvm/tools/llvm-objdump/llvm-objdump.cpp
578

//

Internal classes should not use Doxygen comments.

674

Ideally unsigned TabStop = NoShowRawInsn ? 16 : 40 in printInst should be extracted as a function.

Is --no-leading-addr taken into account?

721

addCompileUnit

Looks like this function can be merged with addFunction().

729

No need to use /// . These classes are internal.

855

Change OS << " " to print one space unconditionally can simplify code above.

880

Change OS << " " to OS << ' ' unconditionally can avoid " " above.

1880

Does this harm performance when --debug-vars is not used?

I once caused a regression (fixed by D64969).

ostannard marked 17 inline comments as done.Feb 26 2020, 10:49 AM

does it work with an ET_DYN/ET_DYN?

Not currently, I'll investigate and fix that as a separate patch.

llvm/tools/llvm-objdump/llvm-objdump.cpp
578

@aprantl previously asked for these to be changed to doxygen. I don't see anything explicit about this in the coding standard, but apparently IDEs can make use of these.

612–614

This is only ever constructed with these all set to false, so I can remove them from the constructor, and use the member names explicitly when modifying them.

674

--no-leading-addr isn't taken into account, because it doesn't affect the column the instruction starts at.

721

I don't see how, given that it calls addFunction in two places.

729

It's not easy to factor out, because CheckedVarIdxs is defined in the first half then used in the second. However, I can rename it to better describe what it does.

1880

If I've understood that discussion properly, the performance problem was caused by creating a formatted_raw_ostream for each instruction, because it flushes the underlying stream in the constructor. Here, i'm creating one per disassembled function, so the overhead will be much lower.

ostannard marked 5 inline comments as done.
  • Rebase
  • Style fixes
jhenderson added inline comments.Feb 27 2020, 2:20 AM
llvm/tools/llvm-objdump/llvm-objdump.cpp
352

I don't have a strong opinion on this, but perhaps DVDisabled etc would be more in keeping with LLVM style?

622–624

Could you avoid the constructor entirely and initialise the values inline?

ostannard updated this revision to Diff 246902.Feb 27 2020, 2:57 AM
ostannard marked 2 inline comments as done.
  • DV_Disabled -> DVDisabled
  • Remove unneeded constructor

Latest changes look okay to me, but please wait for @echristo and @MaskRay to be happy.

MaskRay added inline comments.Feb 28 2020, 1:25 PM
llvm/include/llvm/Support/FormattedStream.h
108 ↗(On Diff #246902)

The function name self documents. The comment may be unnecessary.

llvm/lib/DebugInfo/DWARF/DWARFExpression.cpp
386

There is certainly prior art. But if changing all SmallString<20> to SmallString<16> may decrease the code size?

llvm/lib/Support/FormattedStream.cpp
29 ↗(On Diff #246902)

Ptr < End will be safer in -DLLVM_ENABLE_ASSERTIONS=off builds in case of a malformed UTF-8 code sequence.

33 ↗(On Diff #246902)

The following might be slight simpler.

auto C = static_cast<unsigned char>(*Ptr);
if (C < 0x80) MultiByte = false;
else if (C >= 0b11110000) Ptr += 3;
else if (C >= 0b11100000) Ptr += 2;
else Ptr++;
45 ↗(On Diff #246902)

This is incorrect due to East Asian Width.... http://www.unicode.org/reports/tr11/tr11-36.html

% cat a.cc
int main() {
  int 喵🙂 = 0;
  int 喵喵🙂 = 15;
  int 喵🙂喵🙂 = 24;
  喵🙂 = 28;
  喵喵🙂 = 32;
}
% clang -c a.cc
% llvm-objdump -S --no-show-raw-insn --debug-vars a.o

a.o:    file format elf64-x86-64


Disassembly of section .text:

0000000000000000 main:
; int main() {                                          ┠─ 喵🙂 = <unknown op DW_OP_fbreg (145)>
                                                         ┠─ 喵喵🙂 = <unknown op DW_OP_fbreg (145)> 
                                                          ┠─ 喵🙂喵🙂 = <unknown op DW_OP_fbreg (145)>
       0:       pushq   %rbp                                   
       1:       movq    %rsp, %rbp                             
       4:       xorl    %eax, %eax                             
;   int 喵🙂 = 0;                                              
       6:       movl    $0, -4(%rbp)                           
;   int 喵喵🙂 = 15;                                            
       d:       movl    $15, -8(%rbp)                          
;   int 喵🙂喵🙂 = 24;                                           
      14:       movl    $24, -12(%rbp)                         
;   喵🙂 = 28;                                                 
      1b:       movl    $28, -4(%rbp)                          
;   喵喵🙂 = 32;                                                
      22:       movl    $32, -8(%rbp)                          
; }                                                          
      29:       popq    %rbp                                   
      2a:       retq                                      
ostannard marked 6 inline comments as done.Mar 2 2020, 3:44 AM
ostannard added inline comments.
llvm/lib/Support/FormattedStream.cpp
45 ↗(On Diff #246902)

There's some code in the Support library which does this, which also simplifies this function a bit. It handles east asian characters correctly (at least the ones in your example), but doesn't get the correct widths for emojis. Clang seems to handle emojis by printing a hex value which it knows the length of:

$ /work/llvm/build/bin/clang -fsyntax-only -c width.c 
width.c:1:15: error: use of undeclared identifier 'a'
int 喵<U+1F642> = a;
                  ^
1 error generated.

Fixing that is going to be a lot of work, and isn't related to this patch, so I'll just make it work for characters supported by the existing unicode support library for now.

ostannard updated this revision to Diff 247588.Mar 2 2020, 3:46 AM
ostannard marked an inline comment as done.
  • Align columns correctly after east asian wide characters (though not emojis)
  • Other style changes
jhenderson added inline comments.Mar 4 2020, 1:30 AM
llvm/test/tools/llvm-objdump/ARM/debug-vars-wide-chars.s
6–8

'##' for comments.

11

I think it will, but will this work on Windows (non-UTF8 environment)?

ostannard marked 3 inline comments as done.Mar 4 2020, 5:13 AM
ostannard added inline comments.
llvm/test/tools/llvm-objdump/ARM/debug-vars-dwarf4.s
92

The SRC_COMPDIR sed script isn't used for the LINE-NUMS test (if it was we'd then need to regex out the whole path), so I've put this regex back.

llvm/test/tools/llvm-objdump/ARM/debug-vars-wide-chars.s
11

It turns out that this doesn't work, at least in my windows 10 VM, so I'll disable this test for windows.

ostannard updated this revision to Diff 248152.Mar 4 2020, 5:14 AM
  • Disable wide-chars test on windows
  • Fix the line-numbers test, which was broken on windows (different dir separator)
jhenderson added inline comments.Mar 11 2020, 1:57 AM
llvm/test/tools/llvm-objdump/ARM/debug-vars-wide-chars.s
9

Nit: windows -> Windows

It might be wise to change "han" -> "'喵'" as I don't know what the "han" character is.

13

Nit: east asian -> East Asian

(I think)

ostannard marked 2 inline comments as done.Mar 11 2020, 3:11 AM
ostannard added inline comments.
llvm/test/tools/llvm-objdump/ARM/debug-vars-wide-chars.s
9

I was using "han character" meaning a Chinese, Japanese or Korean character, but it seems like "Chinese character" is a more common term, so I'll switch to that. I'd rather not use the character itself in a comment explaining that it might not render correctly on some systems!

ostannard updated this revision to Diff 249572.Mar 11 2020, 3:12 AM
jhenderson accepted this revision.Mar 13 2020, 3:06 AM

Latest changes LGTM, but please wait for others to be happy too.

This revision is now accepted and ready to land.Mar 13 2020, 3:06 AM
echristo accepted this revision.Mar 13 2020, 8:58 AM

Couple of random comments, but feel free to fix and commit. I think anything else can be handled post-review.

Thanks!

-eric

llvm/lib/DebugInfo/DWARF/DWARFExpression.cpp
393

You assume MRI isn't null so probably just make it a reference.

443

Since MRI is going to be a reference downstream probably want to make it a reference here.

MaskRay added inline comments.Mar 13 2020, 9:18 AM
llvm/test/tools/llvm-objdump/ARM/debug-vars-dwarf4-sections.s
14

After D75713, 00000000 <foo>:

You can now delete Disassembly of section .text.foo: and CHECK-EMPTY:

MaskRay accepted this revision.EditedMar 13 2020, 9:24 AM

In the git description, it may be worth mentioning that ET_EXEC/ET_DYN are not supported yet.

llvm/lib/Support/FormattedStream.cpp
42 ↗(On Diff #249572)

Delete excess braces.

ostannard marked 3 inline comments as done.Mar 16 2020, 3:39 AM
This revision was automatically updated to reflect the committed changes.

I forgot to restrict the new tests to only run when the relevant disassembler is built, should be fixed by rG161f70eae6cd and rG2878c6693875.

The tests are also failing on some windows bots with unicode errors, so I've disabled them on windows with rGf4cb9c919e28. These tests were passing locally in a Windows 10 VM, I'm not sure what is different about the buildbots.

I believe this breaks llvm/test/MC/ARM/lsl-zero.s on Windows.

The failure started here: http://lab.llvm.org:8011/builders/clang-x64-windows-msvc/builds/14810 This build has many changes, but this is the only arm-related one as far as I can tell.

It's still happening on trunk: http://lab.llvm.org:8011/builders/clang-x64-windows-msvc/builds/14816/steps/stage%201%20check/logs/stdio

Please take a look, and if it takes a while to investigate, please revert while you look.

My reply by email doesn't seem to have made it into phab, so repeating here (sorry for spamming everyone who did get the email):

I'm not sure how this patch could have caused that failure. This isn't really an arm-related patch, that's just the architecture I happened to write most of the tests using.

Do you have access to that buildbot to get the actual output from the llvm-mc command?

Reverted (together with a surprisingly long list of related changes) for now in 9e48422035813b79088dfa4637e31cd836332a54.

When you reland, please leave some time between commits, so that if one should break something, it's clearer which commit is the problem. Most bots cycle time-based, not commit-based, so things that land in quick succession tend to end up in the same build, which makes things confusing.

Sorry, I didn't see that email. The revert did get that bot green, so it was due to this change.

I don't have easy access to that bot, no.

(And sorry, I started the revert seconds before you must've replied, so I didn't see your reply until after I had committed.)

hoyFB added a subscriber: hoyFB.Mar 16 2020, 12:20 PM
hoyFB added inline comments.
llvm/include/llvm/Support/FormattedStream.h
108 ↗(On Diff #250510)

Hello, looks like this change breaks the clang test: clang/test/Analysis/checker-plugins.c by changing the printing format

from:

example.MyChecker:ExampleOption                   (bool) This is an example checker opt. (default: false)

to:

example.MyChecker:ExampleOption                   (bool) This is an example checker opt. (default:
                                                                             false)

and the test fails at

// CHECK-CHECKER-OPTION-HELP: example.MyChecker:ExampleOption  (bool) This is an
// CHECK-CHECKER-OPTION-HELP-SAME: example checker opt. (default: false)

Can you please take a look? Thanks!

benlangmuir added inline comments.
llvm/lib/Support/FormattedStream.cpp
38 ↗(On Diff #250510)

While it is probablye reasonable to expect only well-formed UTF-8 during the top-level call to write, this will fail for a buffered raw_ostream if you write a multi-byte sequence that happens to crosses the buffer end point. raw_ostream's buffering is byte-level, and won't prevent splitting a multibyte character. We saw this assertion failure in practice.

@thakis, thanks for reverting, sorry I wasn't more proactive about that yesterday.

@hoyFB, it looks like I've accidentally fixed a bug where the help text wasn't being wrapped when the stdout isn't a TTY. I don't see anything in the code to suggest that this is intended behaviour, so I'll update the test.

@benlangmuir, thanks for the pointer, it looks like we'll need to save the first few bytes of truncated UTF-8 sequences when this happens so that we can calculate the display width once we have the rest of the bytes.

I'll split the formatted_raw_ostream changes out into a separate patch, and add some unit tests for these edge cases. I've still got no idea what the lsl-zero.s failure is about, but splitting the patch up should at least narrow it down.

See D76291 for that split-put patch.

MaskRay reopened this revision.EditedJul 8 2020, 3:35 PM

I think this is still in a reverted state (9e48422035813b79088dfa4637e31cd836332a54 ). --debug-vars is useful and I'd like to see it come back. Adding @dblaikie for some opinions.

This revision is now accepted and ready to land.Jul 8 2020, 3:35 PM

This was blocked on D76291 for a while, but that is now committed, so I'll re-land this today.

This revision was automatically updated to reflect the committed changes.