This is an archive of the discontinued LLVM Phabricator instance.

llvm-dva - Debug Information Visual Analyzer
Needs Review, Public

Authored by CarlosAlbertoEnciso on Oct 1 2020, 8:06 AM.

Details

Reviewers
None
Group Reviewers
debug-info
Summary
NOTE: The purpose of this revision is to provide the LLVM community with a single patch, so that llvm-dva can be built and tried out right away; it is not intended for review. We are in the process of creating a reviewable series of patches, which we will start uploading after the conference.
Introduction

LLVM supports multiple debug information formats (namely DWARF and CodeView) in different binary formats (e.g. ELF, PDB, Mach-O). Understanding the mappings between source code and debug information can be complex, and it is a problem we have commonly encountered when triaging debug information issues.

The output from tools such as llvm-dwarfdump or llvm-readobj uses a close representation of the internal debug information format. In our experience, understanding that output requires a good knowledge of those formats, which limits who can triage and address such issues quickly. Even for experts, triaging issues can take a lot of time and effort due to the inherent complexity.

llvm-dva

At Sony, we have been developing an LLVM-based debug information analysis tool called llvm-dva (short for LLVM debug information visual analyzer), designed to visualize these mappings. It is based entirely on the existing LLVM libraries for debug info parsing, target support, etc. At this stage we believe that it has proven its worth internally to the point where we would like to propose upstreaming it as part of the mainline LLVM project, alongside existing tools such as llvm-dwarfdump.

llvm-dva is a command line tool that processes the debug info contained in a binary file and produces a debug-information-format-agnostic "Logical View": a high-level semantic representation of the debug info, independent of the low-level format.

The logical view is composed of the traditional programming elements: scopes, types, symbols, and lines. These elements can display additional information, such as variable coverage factor, lexical block level, disassembly code, code ranges, etc.

The range of llvm-dva command line options enables the creation of very rich logical views that include more low-level debug information: disassembly code associated with the debug lines, variable runtime locations and coverage, internal offsets for the elements within the binary file, etc.

With llvm-dva, we aim to address the following points:

  • Which variables are dropped due to optimization?
  • Why can I not stop at a particular line?
  • Which lines are associated with a specific code range?
  • Does the debug information represent the original source?
  • What is the semantic difference between the debug info generated by different toolchain versions?

Event Timeline

CarlosAlbertoEnciso requested review of this revision. Oct 1 2020, 8:06 AM
Orlando added a subscriber: Orlando. Oct 1 2020, 8:12 AM
dstenb added a subscriber: dstenb. Oct 1 2020, 8:19 AM
CarlosAlbertoEnciso edited the summary of this revision. Oct 1 2020, 8:24 AM
CarlosAlbertoEnciso edited the summary of this revision. Oct 1 2020, 8:37 AM
CarlosAlbertoEnciso edited the summary of this revision.

(If you use arc diff to upload a patch (https://llvm.org/docs/GettingStarted.html#sending-patches), reviewers can use arc patch D88661 to apply the patch locally.
This patch does not have a/ b/ prefixes (git format-patch or git diff) and does not apply cleanly)

MaskRay added inline comments. Oct 1 2020, 2:31 PM
llvm/include/llvm/ADT/IntervalTree.h
13

This interval tree is not memory- or performance-efficient (every node has a vector). An augmented binary search tree (e.g. a red-black tree) would be superior. If there is an existing red-black tree implementation, augmenting it would also require less code (https://github.com/radareorg/radare2/pull/8381)
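
As a rough illustration of the augmented-tree idea in the comment above, the sketch below stores one interval per node and keeps, in each node, the largest upper bound found in its subtree, so a point query can skip whole subtrees. It is not the IntervalTree.h from this patch and not an existing LLVM API: the names Interval, AugmentedIntervalTree and containing are hypothetical, and the tree is a plain unbalanced binary search tree; a red-black tree would add rebalancing while maintaining the same per-node field.

```cpp
#include <algorithm>
#include <cstdint>
#include <memory>
#include <vector>

// Closed interval [Low, High] carrying an illustrative payload.
// All names in this sketch are hypothetical.
struct Interval {
  uint64_t Low;
  uint64_t High;
  int Value;
};

class AugmentedIntervalTree {
  struct Node {
    Interval Item;
    uint64_t MaxHigh; // Largest High stored anywhere in this subtree.
    std::unique_ptr<Node> Left, Right;
    explicit Node(const Interval &I) : Item(I), MaxHigh(I.High) {}
  };
  std::unique_ptr<Node> Root;

  // Plain BST insertion keyed on Low; MaxHigh is updated on the way down.
  static void insert(std::unique_ptr<Node> &N, const Interval &I) {
    if (!N) {
      N = std::make_unique<Node>(I);
      return;
    }
    N->MaxHigh = std::max(N->MaxHigh, I.High);
    insert(I.Low < N->Item.Low ? N->Left : N->Right, I);
  }

  static void query(const Node *N, uint64_t Point, std::vector<int> &Out) {
    // Prune: every interval in this subtree ends before Point.
    if (!N || N->MaxHigh < Point)
      return;
    query(N->Left.get(), Point, Out);
    if (N->Item.Low <= Point && Point <= N->Item.High)
      Out.push_back(N->Item.Value);
    // Intervals in the right subtree start at or after this node's Low,
    // so they can only contain Point if this node's Low does not exceed it.
    if (N->Item.Low <= Point)
      query(N->Right.get(), Point, Out);
  }

public:
  void insert(const Interval &I) { insert(Root, I); }

  // Returns the payloads of all intervals that contain Point.
  std::vector<int> containing(uint64_t Point) const {
    std::vector<int> Out;
    query(Root.get(), Point, Out);
    return Out;
  }
};
```

Each node here costs one extra integer rather than a per-node vector, which is the memory argument made in the comment. Without rebalancing the worst-case depth is linear in the number of intervals, so this only shows the augmentation itself.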

This patch includes the a/ b/ prefixes.

(If you use arc diff to upload a patch (https://llvm.org/docs/GettingStarted.html#sending-patches), reviewers can use arc patch D88661 to apply the patch locally.
This patch does not have a/ b/ prefixes (git format-patch or git diff) and does not apply cleanly)

Uploaded a correct patch that includes the a/ b/ prefixes.

MaskRay added a comment. Edited Oct 2 2020, 12:17 PM

For CMake changes, it'd be good to test a -DBUILD_SHARED_LIBS=on configuration. Many unspecified dependencies can be caught this way. It also pushes the author to think about whether certain dependencies are needed.

FAILED: lib/libLLVMDebugInfoLogicalView.so.12git
...
ld.lld: error: undefined symbol: llvm::raw_ostream::write(char const*, unsigned long)
ld.lld: error: undefined symbol: llvm::Twine::str[abi:cxx11]() const

Some documented options appear to be unavailable.

% fllvm-diva --sort=offset /tmp/Debug/bin/clang       
llvm-diva: Unknown command line argument '--sort=offset'.  Try: '/tmp/RelA/bin/llvm-diva --help'
llvm-diva: Did you mean '--report=offset'?

The indentation seems to be too large.

% fllvm-diva --print=scopes /tmp/Debug/bin/clang             

Logical View:
           {File} '/tmp/debug/bin/clang'

             {CompileUnit} 'cc1_main.cpp.dwo'

             {CompileUnit} 'cc1as_main.cpp.dwo'

             {CompileUnit} 'cc1gen_reproducer_main.cpp.dwo'

             {CompileUnit} 'driver.cpp.dwo'

Are 'instructions', 'symbols', and 'types' unavailable?

% fllvm-diva --print=symbols /tmp/Debug/bin/clang

Logical View:
% fllvm-diva --print=types /tmp/Debug/bin/clang                                                                                                               

Logical View:

The output appears to be a bit confusing. If it just prints one line number per line table entry, the output could be more concise. Annotating the source code (like gcov or llvm-cov) may be more readable.

% fllvm-diva --print=lines /tmp/Debug/bin/clang
                                       
Logical View:                          
           {File} '/tmp/debug/bin/clang'
                                       
             {CompileUnit} 'cc1_main.cpp.dwo'
   247         {Line}                                                          
   247         {Line}
     -         {Line}
     -         {Line}
    62         {Line}
    67         {Line}
    67         {Line}
    67         {Line}
    68         {Line}
    70         {Line}
    71         {Line}
    72         {Line}
    73         {Line}
    74         {Line}
    75         {Line}
    76         {Line}
    77         {Line}
    78         {Line}
    79         {Line}
MaskRay added inline comments. Oct 2 2020, 3:12 PM
llvm/include/llvm/DebugInfo/LogicalView/Core/LVBasicDefinitions.h
28

We usually use LLVM_DEBUG for debugging purposes

llvm/include/llvm/DebugInfo/LogicalView/Core/LVOptions.h
112

One // --attribute=all is probably sufficient. It can be easily inferred what other values are accepted.

llvm/lib/DebugInfo/LogicalView/Core/LVSupport.cpp
43
llvm/tools/llvm-diva/Options.h
23 ↗(On Diff #295756)

using namespace is discouraged in header files.

@MaskRay Thanks very much for your valuable comments.

For CMake changes, it'd be good to test a -DBUILD_SHARED_LIBS=on configuration. Many unspecified dependencies can be caught this way. It also pushes the author to think about whether certain dependencies are needed.

FAILED: lib/libLLVMDebugInfoLogicalView.so.12git
...
ld.lld: error: undefined symbol: llvm::raw_ostream::write(char const*, unsigned long)
ld.lld: error: undefined symbol: llvm::Twine::str[abi:cxx11]() const

Updated the CMake configuration files to support -DBUILD_SHARED_LIBS=ON

Some documented options appear to be unavailable.

% fllvm-diva --sort=offset /tmp/Debug/bin/clang       
llvm-diva: Unknown command line argument '--sort=offset'.  Try: '/tmp/RelA/bin/llvm-diva --help'
llvm-diva: Did you mean '--report=offset'?

The option --sort=offset in the documentation example is incorrect. It should read:

llvm-diva --output-sort=offset --attribute=level --print=scopes,symbols,types,lines,instructions test.o

The documentation has been updated.

% fllvm-diva --print=scopes /tmp/Debug/bin/clang             

Logical View:
           {File} '/tmp/debug/bin/clang'

             {CompileUnit} 'cc1_main.cpp.dwo'

             {CompileUnit} 'cc1as_main.cpp.dwo'

             {CompileUnit} 'cc1gen_reproducer_main.cpp.dwo'

             {CompileUnit} 'driver.cpp.dwo'

Are 'instructions', 'symbols', and 'types' unavailable?

% fllvm-diva --print=symbols /tmp/Debug/bin/clang

Logical View:
% fllvm-diva --print=types /tmp/Debug/bin/clang                                                                                                               

Logical View:

It seems we overlooked support for Split DWARF. In the meantime, I would suggest using '-g' until Split DWARF support is added.

Uploaded a correct patch that includes:

  • support for -DBUILD_SHARED_LIBS=ON
  • Correct command line option in example (documentation).
CarlosAlbertoEnciso updated this revision to Diff 296899. Edited Oct 8 2020, 2:36 AM

Uploaded an updated patch:

  • Remove using namespace from header files. Only one is left (to be removed at a later stage).
  • Fix conflict in ProgrammersManual.rst.
CarlosAlbertoEnciso retitled this revision from llvm-diva - Debug Information Visual Analyzer to llvm-dva - Debug Information Visual Analyzer.
CarlosAlbertoEnciso edited the summary of this revision.
CarlosAlbertoEnciso added a subscriber: echristo.

Following suggestions from @echristo regarding the tool name, we decided to rename it to llvm-dva.

This patch mainly addresses:

  • The tool name renaming.
  • Added support for Split DWARF.

@MaskRay: We have added support for Split DWARF. We would appreciate it if you could try llvm-dva again with the binaries you used in your test cases. Thanks.

Uploaded a patch that corrects a merge conflict.

Uploaded a patch that builds with the current TOT.

Is this tool still planned to be upstreamed? Is this patch the most up-to-date version of this tool, or is there another repo? How is this CL related to https://github.com/SNSystems/DIVA?

Herald added a project: Restricted Project. May 11 2022, 4:22 PM

@aheejin Thanks for your interest.

Is this tool still planned to be upstreamed?

The short answer is yes.

Is this patch the most up-to-date version of this tool, or is there another repo?

This patch is not up-to-date. There is no other repo.

We (at Sony) are ready to upload the final patch series for a formal review. The series will be uploaded next week and consists of:

  1. Interval tree
  2. Driver and documentation
  3. Logical elements
  4. Locations and ranges
  5. Select elements
  6. Warning and internal options
  7. Compare elements
  8. ELF Reader
  9. CodeView Reader

How is this CL related to https://github.com/SNSystems/DIVA?

DIVA was the original tool we developed. It is currently not maintained.

llvm-dva is a complete redesign of DIVA.

Once the patch series is uploaded, we will notify the community.

@CarlosAlbertoEnciso That's great news! Thank you for letting me know.

@aheejin We have uploaded the RFC and patches for review.

https://discourse.llvm.org/t/llvm-dev-rfc-llvm-dva-debug-information-visual-analyzer/62570/3

I am happy to answer any questions.

This is exciting!
I understand where it comes from, but I wanted to at least ask if it would make sense to give the tool a more descriptive and discoverable name than llvm-dva such as llvm-debuginfo-analyzer. I understand there's a tradeoff between having a short command and a descriptive name.

@aprantl: thanks for your message.

You have a very good point and your proposed name llvm-debuginfo-analyzer gives more information about what the tool does.

Maybe we can add your suggestion to the current RFC:
https://discourse.llvm.org/t/llvm-dev-rfc-llvm-dva-debug-information-visual-analyzer/62570/4

@aprantl, I have updated the current RFC to include your suggested name.
https://discourse.llvm.org/t/llvm-dev-rfc-llvm-dva-debug-information-visual-analyzer/62570/7?u=carlosalbertoenciso

llvm-debuginfo-analyzer SN Systems (Sony Interactive Entertainment) GitHub repository:
https://github.com/SNSystems/llvm-debuginfo-analyzer