
[llvm-dwarfdump] --show-sources option to show all sources
Needs Revision · Public

Authored by phosek on Sep 14 2020, 5:36 PM.

Details

Summary

This option allows printing all sources used by an object file.

Event Timeline

phosek created this revision. Sep 14 2020, 5:36 PM
Herald added a project: Restricted Project.
phosek requested review of this revision. Sep 14 2020, 5:36 PM

While this information can be extracted out of the existing llvm-dwarfdump output, it requires additional post-processing. This became so common in our project that we have implemented a custom Go-based tool for that purpose, but that has other downsides such as the lack of DWARF5 support. I think that supporting this option directly in llvm-dwarfdump might be generally useful and it doesn't add a lot of complexity. I'm open to suggestions for how to improve the output.

dblaikie accepted this revision. Sep 14 2020, 7:05 PM

Sounds alright - few things could be simplified, etc.

llvm/test/tools/llvm-dwarfdump/X86/sources.s
13–59

Maybe simplify the functions (to something like simple void/do-nothing functions) to reduce the length of the assembly, no need for types.

llvm/tools/llvm-dwarfdump/llvm-dwarfdump.cpp
462

Could use std::move(FullPath) here, if you like, but hardly critical.

465

Could use llvm::sort here
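
Both suggestions can be illustrated outside of LLVM. This is a minimal sketch, assuming the patch collects full paths into a std::vector<std::string>; collectSources and its parameters are hypothetical names, and std::sort stands in for llvm::sort, which accepts the whole range directly (llvm::sort(Sources)).

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: builds "<Dir>/<Name>" paths and returns them sorted.
std::vector<std::string> collectSources(const std::string &Dir,
                                        const std::vector<std::string> &Names) {
  std::vector<std::string> Sources;
  for (const std::string &Name : Names) {
    std::string FullPath = Dir + "/" + Name;
    // FullPath is not used again, so move it into the vector instead of copying.
    Sources.push_back(std::move(FullPath));
  }
  // In LLVM code this line would read: llvm::sort(Sources);
  std::sort(Sources.begin(), Sources.end());
  assert(std::is_sorted(Sources.begin(), Sources.end()));
  return Sources;
}
```

The move only saves one string copy per file, which matches the "hardly critical" remark above.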

This revision is now accepted and ready to land. Sep 14 2020, 7:05 PM
phosek updated this revision to Diff 291833. Sep 15 2020, 2:04 AM
phosek marked 3 inline comments as done.
jhenderson requested changes to this revision. Sep 15 2020, 2:55 AM
jhenderson added a subscriber: Higuoxing.

Please add the new option to the Command Guide documentation.

llvm/test/tools/llvm-dwarfdump/X86/sources.s
8

You can probably dramatically simplify this code by changing to use yaml2obj. I believe that ELF yaml2obj DWARF support is sufficiently powerful now to achieve this. @Higuoxing may be able to provide more information on this, as he did the work recently there. See also llvm/test/tools/yaml2obj/ELF/DWARF/debug-line.yaml for an example input.

llvm/tools/llvm-dwarfdump/llvm-dwarfdump.cpp
455

I think we need testing for multiple CUs. The current test only checks a single one. This might go against the yaml2obj usage suggested above though (@Higuoxing, is there support for multiple tables in .debug_line yet?).

456

Not that it likely is going to matter in any practical situation, but this should probably be uint64_t technically - the FileNames are set via LEB128 values (see e.g. DW_LNS_set_file) and thus technically have no upper bound in size from the file format. I won't fight too hard for this if you don't want to though.
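
To make the width argument concrete, here is a small standalone ULEB128 decoder. It is only a sketch: decodeULEB is a hypothetical name (LLVM ships its own decoder in llvm/Support/LEB128.h), and the error convention of returning 0 with a zero length is an assumption made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical standalone decoder, not the LLVM one.
// Decodes one ULEB128 value from Buf[0..Size) and stores the byte count in *Len.
// On truncated input, or an encoding that does not fit in 64 bits, returns 0
// and sets *Len to 0.
uint64_t decodeULEB(const uint8_t *Buf, size_t Size, size_t *Len) {
  assert(Buf && Len);
  uint64_t Value = 0;
  unsigned Shift = 0;
  for (size_t I = 0; I < Size; ++I) {
    uint64_t Slice = Buf[I] & 0x7f;
    // Reject anything that would shift bits out of a 64-bit value.
    if (Shift >= 64 || (Slice << Shift) >> Shift != Slice) {
      *Len = 0;
      return 0;
    }
    Value |= Slice << Shift;
    if ((Buf[I] & 0x80) == 0) { // high bit clear: last byte of the value
      *Len = I + 1;
      return Value;
    }
    Shift += 7;
  }
  *Len = 0; // ran out of bytes before the final byte
  return 0;
}
```

A five-byte encoding such as 0x80 0x80 0x80 0x80 0x10 already decodes to 2^32, one more than a uint32_t can hold, which is why a 64-bit index is the technically safe choice.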

467
711

Not related to this patch, or even something you should do yourself. More idle musing - as llvm-dwarfdump starts gaining more of these options, it feels like it should be able to do multiple at once (e.g. allow llvm-dwarfdump --show-sources --show-section-sizes).

This revision now requires changes to proceed. Sep 15 2020, 2:55 AM
Higuoxing added inline comments. Sep 15 2020, 11:36 PM
llvm/tools/llvm-dwarfdump/llvm-dwarfdump.cpp
455

is there support for multiple tables in .debug_line yet?

Yes, yaml2obj supports emitting multiple line tables. I'm able to help craft these test cases.


It looks like LT isn't checked. If a compilation unit doesn't have an associated line table, llvm-dwarfdump --show-sources will crash.

const auto *LT = DICtx.getLineTableForUnit(CU.get()); // Can be a null pointer.
for (uint32_t I = 1; I <= LT->Prologue.FileNames.size(); ++I) {
                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  ...
}
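
The guard being asked for can be sketched with stand-in types. The LineTable struct below only mimics the shape used in the snippet above and is not the LLVM class; countFiles is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for the LLVM type; only the members needed for the example.
struct LineTable {
  struct PrologueT {
    std::vector<std::string> FileNames;
  } Prologue;
};

// Returns the number of file entries, treating a missing line table as empty.
size_t countFiles(const LineTable *LT) {
  if (!LT) // a unit without a line table yields a null pointer
    return 0;
  return LT->Prologue.FileNames.size();
}
```

In the real loop the equivalent would be to skip the unit when the returned pointer is null before touching Prologue.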

We can reproduce it using the following test case.

$ yaml2obj %s | llvm-dwarfdump --show-sources -
--- !ELF
FileHeader:
  Class:   ELFCLASS64
  Data:    ELFDATA2LSB
  Type:    ET_EXEC
  Machine: EM_X86_64
DWARF:
  debug_info:
    - Version: 4
jhenderson added inline comments. Sep 16 2020, 1:31 AM
llvm/tools/llvm-dwarfdump/llvm-dwarfdump.cpp
455

Nice catch! In fact, do we really need to use the CUs at all for this? Could we not just iterate over all line tables? That would allow this to work when there is no .debug_info data too (which the DWARF spec implies is permitted).

probinson added inline comments.
llvm/tools/llvm-dwarfdump/llvm-dwarfdump.cpp
455

I don't know how carefully the spec says it is permitted, but certainly I've heard committee members talk about stripping everything but .debug_line (and with v5, .debug_line_str) from an object file.

In DWARF v4, technically the primary source file & compilation dir could be omitted from the line table, although in practice I think that never happens. In v5 the primary source file & dir are supposed to be explicit in the line table, so I think ignoring .debug_info ought to be okay in general.

phosek marked 2 inline comments as done. Oct 9 2020, 12:57 AM
phosek added inline comments.
llvm/tools/llvm-dwarfdump/llvm-dwarfdump.cpp
455

@jhenderson is there an API to iterate over all line tables? I searched through LLVM but haven't found anything.

jhenderson added inline comments. Oct 9 2020, 1:28 AM
llvm/tools/llvm-dwarfdump/llvm-dwarfdump.cpp
455

I thought there was, but having taken a look, I don't know of an interface that allows you to simply iterate over all line tables without parsing all of them.

Certainly you can iterate over all the line tables by parsing them in order by using the SectionParser class of DWARFDebugLine.h. I'm not sure if that's exactly the right way forward here though, since I suspect by this point the DWARFContext may have already done (some of) the parsing (I haven't dug into the code to confirm either way).

There's also getOrParseLineTable, which takes an offset, Context and DWARFDataExtractor and gives you back the line table at that offset, which may or may not have already been parsed (it will return the cached version if it has been). You'd need to then use the length field within the line table to identify the next offset to use. Maybe a new function could sit on top of that to give you the ability to iterate over them, and only parse the ones that haven't been already? Alternatively, you could modify the SectionParser class to cache the parsed line tables so that it doesn't matter if you try to reparse them later.
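
The "use the length field to find the next offset" step can be sketched on a raw byte buffer, independent of the LLVM classes. This assumes a little-endian section; lineTableOffsets and readLE are hypothetical helpers, and nothing beyond each table's initial length field is parsed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reads a little-endian integer of Width bytes starting at Off.
static uint64_t readLE(const std::vector<uint8_t> &D, size_t Off, size_t Width) {
  uint64_t V = 0;
  for (size_t I = 0; I < Width; ++I)
    V |= static_cast<uint64_t>(D[Off + I]) << (8 * I);
  return V;
}

// Hypothetical helper: returns the start offset of each line table in a raw
// .debug_line buffer, stepping from one table to the next via unit_length.
std::vector<uint64_t> lineTableOffsets(const std::vector<uint8_t> &D) {
  std::vector<uint64_t> Offsets;
  uint64_t Off = 0;
  while (Off + 4 <= D.size()) {
    uint64_t Len = readLE(D, Off, 4);
    uint64_t LengthFieldSize = 4;
    if (Len == 0xffffffffu) { // DWARF64: a 64-bit length follows the escape value
      if (Off + 12 > D.size())
        break;
      Len = readLE(D, Off + 4, 8);
      LengthFieldSize = 12;
    } else if (Len >= 0xfffffff0u) { // reserved values: stop rather than guess
      break;
    }
    if (Len > D.size() - Off - LengthFieldSize) // table runs past the section end
      break;
    Offsets.push_back(Off);
    Off += LengthFieldSize + Len;
  }
  return Offsets;
}
```

Each returned offset could then be handed to something like getOrParseLineTable, so that only tables not already cached get parsed.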

Higuoxing added inline comments. Oct 13 2020, 1:30 AM
llvm/tools/llvm-dwarfdump/llvm-dwarfdump.cpp
455

I think you are able to iterate over line tables via the following code snippets.

DWARFDataExtractor LineData(DICtx.getDWARFObj(),
                            DICtx.getDWARFObj().getLineSection(),
                            DICtx.isLittleEndian(), 0);
DWARFDebugLine::SectionParser Parser(LineData, DICtx,
                                     DICtx.normal_units());
while (!Parser.done()) {
  DWARFDebugLine::LineTable LT = Parser.parseNext(
    RecoverableErrorHandler,
    UnrecoverableErrorHandler);
  // Dump file names with paths.
  ...
}
456

I'm not sure if the for-loop should start from 0. The DWARFv5 spec says:

In DWARF Version 5, the current compilation file name is explicitly present and has index 0.
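
The difference in numbering can be sketched as follows. fileIndices is a hypothetical helper that only enumerates which indices are valid for a table with a given number of entries; it is not how the patch or LLVM represents file tables.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: lists the valid file indices for a line table with
// NumFiles entries. DWARF v5 numbers files from 0; earlier versions from 1.
std::vector<uint64_t> fileIndices(uint16_t Version, uint64_t NumFiles) {
  const uint64_t First = Version >= 5 ? 0 : 1;
  std::vector<uint64_t> Indices;
  for (uint64_t I = 0; I < NumFiles; ++I)
    Indices.push_back(First + I);
  return Indices;
}
```

A loop hard-coded to run from 1 to size() therefore skips entry 0 and reads one past the end on a v5 table.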

phosek added inline comments. Oct 16 2020, 12:56 AM
llvm/tools/llvm-dwarfdump/llvm-dwarfdump.cpp
455

Thanks, I tried that, which made me realize that without the CU we don't have the comp_dir. Is that something we care about?

Higuoxing added inline comments. Oct 16 2020, 4:06 AM
llvm/tools/llvm-dwarfdump/llvm-dwarfdump.cpp
455

I have no idea. Perhaps @jhenderson and @dblaikie can help us?

jhenderson added inline comments. Oct 19 2020, 2:25 AM
llvm/tools/llvm-dwarfdump/llvm-dwarfdump.cpp
455

Ah, that's a good point. I think having the compilation directory is useful, but perhaps not a deal breaker. In other words, if it would be clean to do, I'd think the behaviour could be:

  1. If only .debug_line is present, print just the names, making some reasonable assumption about the compilation dir (e.g. the working directory/empty string/"." etc).
  2. If both are present, use the one specified in the CU.

I'm very much open to other thoughts though. I feel like this option could be useful without .debug_info being present, but I don't know how much of a common case that actually is.
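
The two-case behaviour above could look roughly like this. It is a sketch under assumptions: resolveSource is a hypothetical helper, "." is picked as the fallback from the options listed, and only POSIX-style absolute paths are recognised.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper. CompDir is null when no CU (and therefore no
// compilation directory) is available for the line table.
std::string resolveSource(const std::string &File, const std::string *CompDir) {
  if (!File.empty() && File.front() == '/') // already absolute (POSIX-style only)
    return File;
  // Case 2: use the CU's compilation dir. Case 1: fall back to ".".
  const std::string Base = CompDir ? *CompDir : std::string(".");
  return Base + "/" + File;
}
```

Whether the fallback should be ".", the empty string, or the working directory is exactly the open question in the comment; the sketch just picks one.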

dblaikie added inline comments. Oct 19 2020, 11:34 PM
llvm/tools/llvm-dwarfdump/llvm-dwarfdump.cpp
455

The question is how to iterate over line tables (rather than over CUs and retrieving their line tables)? But you don't want all the parsed data, just the file and line tables?

Yeah, looks like the nearest tool available is DWARFDebugLine::SectionParser but, as noted it does seem to do all the parsing up-front. I wouldn't be averse to/would generally encourage refactorings that make APIs like this lazier - parses maybe just a bit of the line table header, then returns - then you can query it for files, directories, and line table entries as desired. Those queries can fail, of course (since parsing hasn't been done up-front), so the query APIs should reflect that possibility.

Such refactoring can/should be done separately, with new test cases added - perhaps using unit tests, where, say, a line table with a valid directory list exists, but with invalid data after that - by lazy parsing, it should be possible to query just the directory table without ever reaching the invalid data/getting any errors. Similarly - the ability to minimal-parse one line table and jump to the next one immediately - skipping over some invalidity in the first line table without errors because it's never queried in detail, etc.

I made some fixes along these lines to loclist and rnglist parsing in the last week or so for a variety of reasons, for instance.
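
The lazy-parsing idea and the unit test described above can be sketched on a simplified input. This is not the DWARFDebugLine API: LazyDirTable is a hypothetical class, and the input is assumed to be a v4-style include_directories list (a run of NUL-terminated strings ended by an empty string) with arbitrary bytes after it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical lazy view over a v4-style directory list. Nothing is parsed
// at construction time; the query itself reports failure.
class LazyDirTable {
  std::vector<uint8_t> Data;

public:
  explicit LazyDirTable(std::vector<uint8_t> Bytes) : Data(std::move(Bytes)) {}

  // Parses only the directory list into Out. Returns false if the list is not
  // terminated before the end of the buffer. Bytes after the terminator are
  // never examined, so invalid data there cannot cause an error.
  bool directories(std::vector<std::string> &Out) const {
    std::vector<std::string> Dirs;
    size_t I = 0;
    while (I < Data.size()) {
      if (Data[I] == 0) { // empty string: end of the list
        Out = std::move(Dirs);
        return true;
      }
      std::string Dir;
      while (I < Data.size() && Data[I] != 0)
        Dir.push_back(static_cast<char>(Data[I++]));
      if (I == Data.size()) // ran out of data before the string's NUL
        return false;
      ++I; // skip the NUL
      Dirs.push_back(std::move(Dir));
    }
    return false; // missing list terminator
  }
};
```

A table whose directory list is valid but followed by garbage still answers the directory query, while a truncated list makes the query fail rather than the constructor.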