This adds the -debug-vars option to llvm-objdump, which prints locations (currently just registers, memory and more complex expressions will be added later) of source-level variables alongside the disassembly based on DWARF info. A vertical line is printed for each live-range, with a label at the top giving the variable name and location, and the position and length of the line indicating the program counter range in which it is valid.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/include/llvm/DebugInfo/DWARF/DWARFExpression.h | ||
---|---|---|
144 | printCompact maybe? | |
llvm/include/llvm/Support/FormattedStream.h | ||
110 | Comment what ComputePosition is doing here? | |
llvm/lib/DebugInfo/DWARF/DWARFExpression.cpp | ||
356 | If you wouldn't mind commenting each of the cases with what's going on in them it'd be appreciated. | |
llvm/tools/llvm-objdump/llvm-objdump.cpp | ||
600–602 | Since this is going to get constructed with subsequent boolean arguments can we move them to enums/something? int, true, false, true isn't real obvious what it's constructing :) | |
717 | Not a huge fan of boolean arguments. Is there a different factoring here that'll help? |
llvm/docs/CommandGuide/llvm-objdump.rst | ||
---|---|---|
132–135 |
I like this idea. Saves the need for two switches, and makes things nice and explicit about what encoding to use. |
Needs a rebase after D74507. --debug-vars seems to work with an ET_REL, but does it work with an ET_DYN/ET_DYN?
llvm/tools/llvm-objdump/llvm-objdump.cpp | ||
---|---|---|
566 | // Internal classes should not use Doxygen comments. | |
662 | Ideally unsigned TabStop = NoShowRawInsn ? 16 : 40 in printInst should be extracted as a function. Is --no-leading-addr taken into account? | |
709 | addCompileUnit Looks like this function can be merged with addFunction(). | |
717 | No need to use /// . These classes are internal. | |
843 | Change OS << " " to print one space unconditionally can simplify code above. | |
868 | Change OS << " " to OS << ' ' unconditionally can avoid " " above. | |
1806 | Does this harm performance when --debug-vars is not used? I once caused a regression (fixed by D64969). |
does it work with an ET_DYN/ET_DYN?
Not currently, I'll investigate and fix that as a separate patch.
llvm/tools/llvm-objdump/llvm-objdump.cpp | ||
---|---|---|
566 | @aprantl previously asked for these to be changed to doxygen. I don't see anything explicit about this in the coding standard, but apparently IDEs can make use of these. | |
600–602 | This is only ever constructed with these all set to false, so I can remove them from the constructor, and use the member names explicitly when modifying them. | |
662 | --no-leading-addr isn't taken into account, because it doesn't affect the column the instruction starts at. | |
709 | I don't see how, given that it calls addFunction in two places. | |
717 | It's not easy to factor out, because CheckedVarIdxs is defined in the first half then used in the second. However, I can rename it to better describe what it does. | |
1806 | If I've understood that discussion properly, the performance problem was caused by creating a formatted_raw_ostream for each instruction, because it flushes the underlying stream in the constructor. Here, i'm creating one per disassembled function, so the overhead will be much lower. |
llvm/include/llvm/Support/FormattedStream.h | ||
---|---|---|
108–112 | The function name self documents. The comment may be unnecessary. | |
llvm/lib/DebugInfo/DWARF/DWARFExpression.cpp | ||
351 | There is certainly prior art. But if changing all SmallString<20> to SmallString<16> may decrease the code size? | |
llvm/lib/Support/FormattedStream.cpp | ||
31–32 | Ptr < End will be safer in -DLLVM_ENABLE_ASSERTIONS=off builds in case of a malformed UTF-8 code sequence. | |
32–50 | This is incorrect due to East Asian Width.... http://www.unicode.org/reports/tr11/tr11-36.html % cat a.cc int main() { int 喵🙂 = 0; int 喵喵🙂 = 15; int 喵🙂喵🙂 = 24; 喵🙂 = 28; 喵喵🙂 = 32; } % clang -c a.cc % llvm-objdump -S --no-show-raw-insn --debug-vars a.o a.o: file format elf64-x86-64 Disassembly of section .text: 0000000000000000 main: ; int main() { ┠─ 喵🙂 = <unknown op DW_OP_fbreg (145)> ┃ ┠─ 喵喵🙂 = <unknown op DW_OP_fbreg (145)> ┃ ┃ ┠─ 喵🙂喵🙂 = <unknown op DW_OP_fbreg (145)> 0: pushq %rbp ┃ ┃ ┃ 1: movq %rsp, %rbp ┃ ┃ ┃ 4: xorl %eax, %eax ┃ ┃ ┃ ; int 喵🙂 = 0; ┃ ┃ ┃ 6: movl $0, -4(%rbp) ┃ ┃ ┃ ; int 喵喵🙂 = 15; ┃ ┃ ┃ d: movl $15, -8(%rbp) ┃ ┃ ┃ ; int 喵🙂喵🙂 = 24; ┃ ┃ ┃ 14: movl $24, -12(%rbp) ┃ ┃ ┃ ; 喵🙂 = 28; ┃ ┃ ┃ 1b: movl $28, -4(%rbp) ┃ ┃ ┃ ; 喵喵🙂 = 32; ┃ ┃ ┃ 22: movl $32, -8(%rbp) ┃ ┃ ┃ ; } ┃ ┃ ┃ 29: popq %rbp ┃ ┃ ┃ 2a: retq ┻ ┻ ┻ | |
35 | The following might be slight simpler. auto C = static_cast<unsigned char>(*Ptr); if (C < 0x80) MultiByte = false; else if (C >= 0b11110000) Ptr += 3; else if (C >= 0b11100000) Ptr += 2; else Ptr++; |
llvm/lib/Support/FormattedStream.cpp | ||
---|---|---|
32–50 | There's some code in the Support library which does this, which also simplifies this function a bit. It handles east asian characters correctly (at least the ones in your example), but doesn't get the correct widths for emojis. Clang seems to handle emojis by printing a hex value which it knows the length of: $ /work/llvm/build/bin/clang -fsyntax-only -c width.c width.c:1:15: error: use of undeclared identifier 'a' int 喵<U+1F642> = a; ^ 1 error generated. Fixing that is going to be a lot of work, and isn't related to this patch, so I'll just make it work for characters supported by the existing unicode support library for now. |
- Align columns correctly after east asian wide characters (though not emojis)
- Other style changes
llvm/test/tools/llvm-objdump/ARM/debug-vars-dwarf4.s | ||
---|---|---|
92 | The SRC_COMPDIR sed script isn't used for the LINE-NUMS test (if it was we'd then need to regex out the whole path), so I've put this regex back. | |
llvm/test/tools/llvm-objdump/ARM/debug-vars-wide-chars.s | ||
11 | It turns out that this doesn't work, at least in my windows 10 VM, so I'll disable this test for windows. |
- Disable wide-chars test on windows
- Fix the line-numbers test, which was broken on windows (different dir separator)
llvm/test/tools/llvm-objdump/ARM/debug-vars-wide-chars.s | ||
---|---|---|
8 | I was using "han character" meaning a Chinese, Japanese or Korean character, but it seems like "Chinese character" is a more common term, so I'll switch to that. I'd rather not use the character itself in a comment explaining that it might not render correctly on some systems! |
Couple of random comments, but feel free to fix and commit. I think anything else can be handled post-review.
Thanks!
-eric
llvm/lib/DebugInfo/DWARF/DWARFExpression.cpp | ||
---|---|---|
358 | You assume MRI isn't null so probably just make it a reference. | |
408 | Since MRI is going to be a reference downstream probably want to make it a reference here. |
In the git description, it may be worth mentioning that ET_EXEC/ET_DYN are not supported yet.
llvm/lib/Support/FormattedStream.cpp | ||
---|---|---|
42 | Delete excess braces. |
I forgot to restrict the new tests to only run when the relevant disassembler is built, should be fixed by rG161f70eae6cd and rG2878c6693875.
The tests are also failing on some windows bots with unicode errors, so I've disabled them on windows with rGf4cb9c919e28. These tests were passing locally in a Windows 10 VM, I'm not sure what is different about the buildbots.
I believe this breaks llvm/test/MC/ARM/lsl-zero.s on Windows.
The failure started here: http://lab.llvm.org:8011/builders/clang-x64-windows-msvc/builds/14810 This build has many changes, but this is the only arm-related one as far as I can tell.
It's still happening on trunk: http://lab.llvm.org:8011/builders/clang-x64-windows-msvc/builds/14816/steps/stage%201%20check/logs/stdio
Please take a look, and if it takes a while to investigate, please revert while you look.
My reply by email doesn't seem to have made it into phab, so repeating here (sorry for spamming everyone who did get the email):
I'm not sure how this patch could have caused that failure. This isn't really an arm-related patch, that's just the architecture I happened to write most of the tests using.
Do you have access to that buildbot to get the actual output from the llvm-mc command?
Reverted (together with a surprisingly long list of related changes) for now in 9e48422035813b79088dfa4637e31cd836332a54.
When you reland, please leave some time between commits, so that if one should break something, it's clearer which commit is the problem. Most bots cycle time-based, not commit-based, so things that land in quick succession tend to end up in the same build, which makes things confusing.
Sorry, I didn't see that email. The revert did get that bot green, so it was due to this change.
I don't have easy access to that bot, no.
(And sorry, I started the revert seconds before you must've replied, so I didn't see your reply until after I had committed.)
llvm/include/llvm/Support/FormattedStream.h | ||
---|---|---|
108–112 | Hello, looks like this change breaks the clang test: clang/test/Analysis/checker-plugins.c by changing the printing format from: example.MyChecker:ExampleOption (bool) This is an example checker opt. (default: false) to: example.MyChecker:ExampleOption (bool) This is an example checker opt. (default: false) and the test fails at // CHECK-CHECKER-OPTION-HELP: example.MyChecker:ExampleOption (bool) This is an // CHECK-CHECKER-OPTION-HELP-SAME: example checker opt. (default: false) Can you please take a look? Thanks! |
llvm/lib/Support/FormattedStream.cpp | ||
---|---|---|
38 | While it is probablye reasonable to expect only well-formed UTF-8 during the top-level call to write, this will fail for a buffered raw_ostream if you write a multi-byte sequence that happens to crosses the buffer end point. raw_ostream's buffering is byte-level, and won't prevent splitting a multibyte character. We saw this assertion failure in practice. |
@thakis, thanks for reverting, sorry I wasn't more proactive about that yesterday.
@hoyFB, it looks like I've accidentally fixed a bug where the help text wasn't being wrapped when the stdout isn't a TTY. I don't see anything in the code to suggest that this is intended behaviour, so I'll update the test.
@benlangmuir, thanks for the pointer, it looks like we'll need to save the first few bytes of truncated UTF-8 sequences when this happens so that we can calculate the display width once we have the rest of the bytes.
I'll split the formatted_raw_ostream changes out into a separate patch, and add some unit tests for these edge cases. I've still got no idea what the lsl-zero.s failure is about, but splitting the patch up should at least narrow it down.
I think this is still in a reverted state (9e48422035813b79088dfa4637e31cd836332a54 ). --debug-vars is useful and I'd like to see it come back. Adding @dblaikie for some opinions.
This was blocked on D76291 for a while, but that is now committed, so I'll re-land this today.
I just tried this out locally, and on an already-wide window, a fairly basic expression wrapped with the default of 50 characters. Any particular reason you've used 50?