Page MenuHomePhabricator

[llvm-objdump] Display locations of variables alongside disassembly

Authored by ostannard on Nov 26 2019, 8:44 AM.



This adds the -debug-vars option to llvm-objdump, which prints locations (currently just registers, memory and more complex expressions will be added later) of source-level variables alongside the disassembly based on DWARF info. A vertical line is printed for each live-range, with a label at the top giving the variable name and location, and the position and length of the line indicating the program counter range in which it is valid.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
echristo added inline comments.Feb 19 2020, 11:49 AM

printCompact maybe?

110 ↗(On Diff #244371)

Comment what ComputePosition is doing here?


If you wouldn't mind commenting each of the cases with what's going on in them it'd be appreciated.


Since this is going to get constructed with subsequent boolean arguments can we move them to enums/something? int, true, false, true isn't real obvious what it's constructing :)


Not a huge fan of boolean arguments. Is there a different factoring here that'll help?

jhenderson added inline comments.Feb 20 2020, 1:06 AM

FWIW I'd have probably gone with --debug-vars=<encoding> ?

I like this idea. Saves the need for two switches, and makes things nice and explicit about what encoding to use.

Needs a rebase after D74507. --debug-vars seems to work with an ET_REL, but does it work with an ET_DYN/ET_DYN?

MaskRay added inline comments.Feb 23 2020, 5:04 PM


Internal classes should not use Doxygen comments.


Ideally unsigned TabStop = NoShowRawInsn ? 16 : 40 in printInst should be extracted as a function.

Is --no-leading-addr taken into account?



Looks like this function can be merged with addFunction().


No need to use /// . These classes are internal.


Change OS << " " to print one space unconditionally can simplify code above.


Change OS << " " to OS << ' ' unconditionally can avoid " " above.


Does this harm performance when --debug-vars is not used?

I once caused a regression (fixed by D64969).

ostannard marked 17 inline comments as done.Feb 26 2020, 10:49 AM

does it work with an ET_DYN/ET_DYN?

Not currently, I'll investigate and fix that as a separate patch.


@aprantl previously asked for these to be changed to doxygen. I don't see anything explicit about this in the coding standard, but apparently IDEs can make use of these.


This is only ever constructed with these all set to false, so I can remove them from the constructor, and use the member names explicitly when modifying them.


--no-leading-addr isn't taken into account, because it doesn't affect the column the instruction starts at.


I don't see how, given that it calls addFunction in two places.


It's not easy to factor out, because CheckedVarIdxs is defined in the first half then used in the second. However, I can rename it to better describe what it does.


If I've understood that discussion properly, the performance problem was caused by creating a formatted_raw_ostream for each instruction, because it flushes the underlying stream in the constructor. Here, i'm creating one per disassembled function, so the overhead will be much lower.

ostannard marked 5 inline comments as done.
  • Rebase
  • Style fixes
jhenderson added inline comments.Feb 27 2020, 2:20 AM

I don't have a strong opinion on this, but perhaps DVDisabled etc would be more in keeping with LLVM style?


Could you avoid the constructor entirely and initialise the values inline?

ostannard updated this revision to Diff 246902.Feb 27 2020, 2:57 AM
ostannard marked 2 inline comments as done.
  • DV_Disabled -> DVDisabled
  • Remove unneeded constructor

Latest changes look okay to me, but please wait for @echristo and @MaskRay to be happy.

MaskRay added inline comments.Feb 28 2020, 1:25 PM
108 ↗(On Diff #246902)

The function name self documents. The comment may be unnecessary.


There is certainly prior art. But if changing all SmallString<20> to SmallString<16> may decrease the code size?

29 ↗(On Diff #246902)

Ptr < End will be safer in -DLLVM_ENABLE_ASSERTIONS=off builds in case of a malformed UTF-8 code sequence.

33 ↗(On Diff #246902)

The following might be slight simpler.

auto C = static_cast<unsigned char>(*Ptr);
if (C < 0x80) MultiByte = false;
else if (C >= 0b11110000) Ptr += 3;
else if (C >= 0b11100000) Ptr += 2;
else Ptr++;
45 ↗(On Diff #246902)

This is incorrect due to East Asian Width....

% cat
int main() {
  int 喵🙂 = 0;
  int 喵喵🙂 = 15;
  int 喵🙂喵🙂 = 24;
  喵🙂 = 28;
  喵喵🙂 = 32;
% clang -c
% llvm-objdump -S --no-show-raw-insn --debug-vars a.o

a.o:    file format elf64-x86-64

Disassembly of section .text:

0000000000000000 main:
; int main() {                                          ┠─ 喵🙂 = <unknown op DW_OP_fbreg (145)>
                                                         ┠─ 喵喵🙂 = <unknown op DW_OP_fbreg (145)> 
                                                          ┠─ 喵🙂喵🙂 = <unknown op DW_OP_fbreg (145)>
       0:       pushq   %rbp                                   
       1:       movq    %rsp, %rbp                             
       4:       xorl    %eax, %eax                             
;   int 喵🙂 = 0;                                              
       6:       movl    $0, -4(%rbp)                           
;   int 喵喵🙂 = 15;                                            
       d:       movl    $15, -8(%rbp)                          
;   int 喵🙂喵🙂 = 24;                                           
      14:       movl    $24, -12(%rbp)                         
;   喵🙂 = 28;                                                 
      1b:       movl    $28, -4(%rbp)                          
;   喵喵🙂 = 32;                                                
      22:       movl    $32, -8(%rbp)                          
; }                                                          
      29:       popq    %rbp                                   
      2a:       retq                                      
ostannard marked 6 inline comments as done.Mar 2 2020, 3:44 AM
ostannard added inline comments.
45 ↗(On Diff #246902)

There's some code in the Support library which does this, which also simplifies this function a bit. It handles east asian characters correctly (at least the ones in your example), but doesn't get the correct widths for emojis. Clang seems to handle emojis by printing a hex value which it knows the length of:

$ /work/llvm/build/bin/clang -fsyntax-only -c width.c 
width.c:1:15: error: use of undeclared identifier 'a'
int 喵<U+1F642> = a;
1 error generated.

Fixing that is going to be a lot of work, and isn't related to this patch, so I'll just make it work for characters supported by the existing unicode support library for now.

ostannard updated this revision to Diff 247588.Mar 2 2020, 3:46 AM
ostannard marked an inline comment as done.
  • Align columns correctly after east asian wide characters (though not emojis)
  • Other style changes
jhenderson added inline comments.Mar 4 2020, 1:30 AM

'##' for comments.


I think it will, but will this work on Windows (non-UTF8 environment)?

ostannard marked 3 inline comments as done.Mar 4 2020, 5:13 AM
ostannard added inline comments.

The SRC_COMPDIR sed script isn't used for the LINE-NUMS test (if it was we'd then need to regex out the whole path), so I've put this regex back.


It turns out that this doesn't work, at least in my windows 10 VM, so I'll disable this test for windows.

ostannard updated this revision to Diff 248152.Mar 4 2020, 5:14 AM
  • Disable wide-chars test on windows
  • Fix the line-numbers test, which was broken on windows (different dir separator)
jhenderson added inline comments.Mar 11 2020, 1:57 AM

Nit: windows -> Windows

It might be wise to change "han" -> "'喵'" as I don't know what the "han" character is.


Nit: east asian -> East Asian

(I think)

ostannard marked 2 inline comments as done.Mar 11 2020, 3:11 AM
ostannard added inline comments.

I was using "han character" meaning a Chinese, Japanese or Korean character, but it seems like "Chinese character" is a more common term, so I'll switch to that. I'd rather not use the character itself in a comment explaining that it might not render correctly on some systems!

ostannard updated this revision to Diff 249572.Mar 11 2020, 3:12 AM
jhenderson accepted this revision.Mar 13 2020, 3:06 AM

Latest changes LGTM, but please wait for others to be happy too.

This revision is now accepted and ready to land.Mar 13 2020, 3:06 AM
echristo accepted this revision.Mar 13 2020, 8:58 AM

Couple of random comments, but feel free to fix and commit. I think anything else can be handled post-review.




You assume MRI isn't null so probably just make it a reference.


Since MRI is going to be a reference downstream probably want to make it a reference here.

MaskRay added inline comments.Mar 13 2020, 9:18 AM

After D75713, 00000000 <foo>:

You can now delete Disassembly of section and CHECK-EMPTY:

MaskRay accepted this revision.EditedMar 13 2020, 9:24 AM

In the git description, it may be worth mentioning that ET_EXEC/ET_DYN are not supported yet.

42 ↗(On Diff #249572)

Delete excess braces.

ostannard marked 3 inline comments as done.Mar 16 2020, 3:39 AM
This revision was automatically updated to reflect the committed changes.

I forgot to restrict the new tests to only run when the relevant disassembler is built, should be fixed by rG161f70eae6cd and rG2878c6693875.

The tests are also failing on some windows bots with unicode errors, so I've disabled them on windows with rGf4cb9c919e28. These tests were passing locally in a Windows 10 VM, I'm not sure what is different about the buildbots.

I believe this breaks llvm/test/MC/ARM/lsl-zero.s on Windows.

The failure started here: This build has many changes, but this is the only arm-related one as far as I can tell.

It's still happening on trunk:

Please take a look, and if it takes a while to investigate, please revert while you look.

My reply by email doesn't seem to have made it into phab, so repeating here (sorry for spamming everyone who did get the email):

I'm not sure how this patch could have caused that failure. This isn't really an arm-related patch, that's just the architecture I happened to write most of the tests using.

Do you have access to that buildbot to get the actual output from the llvm-mc command?

Reverted (together with a surprisingly long list of related changes) for now in 9e48422035813b79088dfa4637e31cd836332a54.

When you reland, please leave some time between commits, so that if one should break something, it's clearer which commit is the problem. Most bots cycle time-based, not commit-based, so things that land in quick succession tend to end up in the same build, which makes things confusing.

Sorry, I didn't see that email. The revert did get that bot green, so it was due to this change.

I don't have easy access to that bot, no.

(And sorry, I started the revert seconds before you must've replied, so I didn't see your reply until after I had committed.)

hoyFB added a subscriber: hoyFB.Mar 16 2020, 12:20 PM
hoyFB added inline comments.
108 ↗(On Diff #250510)

Hello, looks like this change breaks the clang test: clang/test/Analysis/checker-plugins.c by changing the printing format


example.MyChecker:ExampleOption                   (bool) This is an example checker opt. (default: false)


example.MyChecker:ExampleOption                   (bool) This is an example checker opt. (default:

and the test fails at

// CHECK-CHECKER-OPTION-HELP: example.MyChecker:ExampleOption  (bool) This is an
// CHECK-CHECKER-OPTION-HELP-SAME: example checker opt. (default: false)

Can you please take a look? Thanks!

benlangmuir added inline comments.
38 ↗(On Diff #250510)

While it is probablye reasonable to expect only well-formed UTF-8 during the top-level call to write, this will fail for a buffered raw_ostream if you write a multi-byte sequence that happens to crosses the buffer end point. raw_ostream's buffering is byte-level, and won't prevent splitting a multibyte character. We saw this assertion failure in practice.

@thakis, thanks for reverting, sorry I wasn't more proactive about that yesterday.

@hoyFB, it looks like I've accidentally fixed a bug where the help text wasn't being wrapped when the stdout isn't a TTY. I don't see anything in the code to suggest that this is intended behaviour, so I'll update the test.

@benlangmuir, thanks for the pointer, it looks like we'll need to save the first few bytes of truncated UTF-8 sequences when this happens so that we can calculate the display width once we have the rest of the bytes.

I'll split the formatted_raw_ostream changes out into a separate patch, and add some unit tests for these edge cases. I've still got no idea what the lsl-zero.s failure is about, but splitting the patch up should at least narrow it down.

See D76291 for that split-put patch.

MaskRay reopened this revision.EditedJul 8 2020, 3:35 PM

I think this is still in a reverted state (9e48422035813b79088dfa4637e31cd836332a54 ). --debug-vars is useful and I'd like to see it come back. Adding @dblaikie for some opinions.

This revision is now accepted and ready to land.Jul 8 2020, 3:35 PM

This was blocked on D76291 for a while, but that is now committed, so I'll re-land this today.

This revision was automatically updated to reflect the committed changes.