Page MenuHomePhabricator

[llvm-dwarfdump] Add the --show-sections-sizes option
Needs ReviewPublic

Authored by djtodoro on Fri, Feb 7, 3:08 AM.

Details

Summary

Add an option to llvm-dwarfdump to calculate the bytes within .debug* sections.

Diff Detail

Event Timeline

djtodoro created this revision.Fri, Feb 7, 3:08 AM
Herald added a project: Restricted Project. · View Herald Transcript

I found it useful when testing some large patches, since the llvm-size does not calculate it, I usually do it manually or by some scripting.

An example of the output:

llvm-dwarfdump --show-sections-sizes test.o
----------------------------------------------------
test.o: file format ELF64-x86-64
----------------------------------------------------
  SECTION 			  SIZE
------------ 		  -----------------------
 .debug_info 		  132  (19.67%)
 .debug_str 		  178  (26.53%)
 .debug_abbrev 		  120  (17.88%)
 .debug_line 		  74  (11.03%)
 .debug_ranges 		  0  (0.00%)
 .debug_loc 		  89  (13.26%)

 .debug_*: 		  593  (88.38%)
----------------------------------------------------
 TOTAL FILE SIZE: 671
----------------------------------------------------

What is the purpose of this new option. Why would something like llvm-size or llvm-readobj --section-headers not be sufficient?

Please make sure to add any new switches to the documentation contained at llvm/docs/CommandGuide/llvm-dwarfdump.rst.

This seems to only calculate the sizes of a fairly limited selection of DWARF sections. Why not just do it for all .debug_* sections?

llvm/tools/llvm-dwarfdump/SectionsSizes.cpp
1 ↗(On Diff #243119)

I'd rename this file to "SectionSizes.cpp"

19 ↗(On Diff #243119)

SectionsSizes -> SectionSizes

21 ↗(On Diff #243119)

Nit: missing space after comma.

148 ↗(On Diff #243119)

This will just print "error: ", which is not useful. Have calculateSizes and this function return an llvm::Error and use that to print a more useful error message much further up the stack by using the llvm-dwarfdump.cpp error function.

llvm/tools/llvm-dwarfdump/llvm-dwarfdump.cpp
213

of the .debug_* -> of all .debug_*

I found it useful when testing some large patches, since the llvm-size does not calculate it, I usually do it manually or by some scripting.

An example of the output:

llvm-dwarfdump --show-sections-sizes test.o
----------------------------------------------------
test.o: file format ELF64-x86-64
----------------------------------------------------
  SECTION 			  SIZE
------------ 		  -----------------------
 .debug_info 		  132  (19.67%)
 .debug_str 		  178  (26.53%)
 .debug_abbrev 		  120  (17.88%)
 .debug_line 		  74  (11.03%)
 .debug_ranges 		  0  (0.00%)
 .debug_loc 		  89  (13.26%)

 .debug_*: 		  593  (88.38%)
----------------------------------------------------
 TOTAL FILE SIZE: 671
----------------------------------------------------

Right, sorry, you posted this as I wrote up my comments!

@jhenderson Thanks for the quick feedback!

I found this pretty-printing output very useful to me, and adding such thing within the llvm-dwarfdump made sense to me. Therefore, I feel like sharing this and see if this can be useful to others.

djtodoro added a comment.EditedFri, Feb 7, 4:19 AM

This seems to only calculate the sizes of a fairly limited selection of DWARF sections. Why not just do it for all .debug_* sections?

I'll add them all.

djtodoro updated this revision to Diff 243183.Fri, Feb 7, 8:58 AM

-Addressing comments
-Adding the D74208 part

jdoerfert resigned from this revision.Sat, Feb 8, 11:02 AM
jhenderson added inline comments.Mon, Feb 10, 2:15 AM
llvm/docs/CommandGuide/llvm-dwarfdump.rst
117

Options are documented in alphabetical order, so this should be above --statistics.

llvm/test/tools/llvm-dwarfdump/X86/show_section_sizes.s
2

As this test doesn't do any analysis it would make more sense to use yaml2obj to create sections of an appropriate size instead of llvm-mc on a bunch of hard-to-read assembly. You can see examples in something like "llvm\test\tools\llvm-readobj\ELF\sections.test".

3

No need for the '#' character on empty lines (here and at the bottom of the checks).

llvm/tools/llvm-dwarfdump/SectionSizes.cpp
145

I think this can be made much more general, robust, and future-proof by doing what llvm-objcopy does and simply checking that name starts with ".debug" and other related strings (which you may or may not wish to include). See isDebugSection in ELFObjcopy.cpp.

162

I'd report a warning or error here, rather than just throwing away the error.

168–169

Why are you limiting this totalling up to a limited subset of sections? I'd expect this code to either a) return the total size of all sections, or the total object size. Note that not all sections are one of the four categories your code mentions, nor does this include other file data like an ELF header or the section header table itself.

Also, do you really want to include BSS size in this total? That takes up no disk footprint usually, only memory (which debug sections do not take up).

175

Rather than this big long block, how about a simpler approach as shown by the following pseudo-code:

// std::map rather than std::unordered_map, so that the names appear alphabetically
std::map<StringRef, uin64_t> DebugSectionSizes;
for(Section S : Sections) {
  StringRef Name = S.getName();
  if (!isDebugSection(Name))
    continue;
  StringRef Key;
  if (Name.startswith(".debug"))
    Key = Name.substr(6);
  // other cases for other possible prefixes, e.g. .zdebug
  DebugSectionSizes[Key] += S.size();
}
uint64_t TotalDebugSize = 0;
// Print some header data
for (auto KV : DebugSectionSizes) {
  StringRef Name = KV.first;
  StringRef Size = KV.second;
  TotalDebugSize += Size;
  // Print .debug_ + Name, Size
}
// Print total sizes

This will allow it to be simpler, and future-proof against new debug sections. The only possible downside would be that it would only print the sizes of sections that exist. If this is an issue, pre-populate the map with appropriate sections, i.e. something like:

std::unordered_map<StringRef, uin64_t> DebugSectionSizes = {
  {"_info", 0}, {"_types", 0}, // etc
};

The problem with that idea is that it assumes you know all debug section names that both currently exist and could exist in the future. I don't think this is viable, so I think populating the table dynamically would be better.

djtodoro marked 3 inline comments as done.EditedMon, Feb 10, 3:30 AM

@jhenderson Thanks a lot for your comments! I'll try to address them all, and remove the WIP from the title.

llvm/tools/llvm-dwarfdump/SectionSizes.cpp
145

Thanks!

168–169

Why are you limiting this totalling up to a limited subset of sections? I'd expect this code to either a) return the total size of all sections, or the total object size. Note that not all sections are one of the four categories your code mentions, nor does this include other file data like an ELF header or the section header table itself.

I saw that llvm-size considers as 'total', but I agree this should be revisited. Thanks!

Also, do you really want to include BSS size in this total? That takes up no disk footprint usually, only memory (which debug sections do not take up).

I see.. Anyhow, better option might be to using the "total object size".

175

I put it as WIP, and did not have much time to refactor the code, but I will do that and try to remove the WIP marker.
Thanks a lot for the suggestion!

aprantl added inline comments.Mon, Feb 10, 2:57 PM
llvm/docs/CommandGuide/llvm-dwarfdump.rst
117

classic dwarfdump on macOS used to have a similar functionality under the name of

--file-stats[=size]
        Show file composition statistics for any input files. Each file's
        contents are analyzed and broken down into byte counts for the
        following categories: symbol table, string table, text and code,
        DWARF debug information, STABS debug information, and other. When
        multiple files are specified, byte count totals for each category
        will be displayed at the end of the table.  Specify the optional
        'size' argument to show all byte count results using unit suffixes:
        Byte, Kilobyte, Megabyte, Gigabyte, Terabyte and Petabyte. This
        option can be useful in tracking the size and makeup of mach-o
        binary files, and also allows easy comparison between DWARF and
        STABS built binaries and related object files.

that we never ported to llvm-dwarfdump. We *could* reimplement the same interface for familiarity, but I don't have a particularly strong opinion about it.

jhenderson added inline comments.Tue, Feb 11, 1:05 AM
llvm/docs/CommandGuide/llvm-dwarfdump.rst
117

I'm neutral on whether the switch name should be changed to match, but I don't think we need to go into the granularity of the classic dwarfdump - things like symbol table, string table, text etc sizes are best retrieved from tools like llvm-readobj and llvm-size. I think we should keep this to debug section sizes (with total file size information also included for ease of comparison). The suffixes idea is not bad though - perhaps worth a future follow-up.

Byte, Kilobyte, Megabyte, Gigabyte, Terabyte and Petabyte

Why no Exabyte? 😆

djtodoro marked an inline comment as done.Tue, Feb 11, 2:22 AM
djtodoro added inline comments.
llvm/docs/CommandGuide/llvm-dwarfdump.rst
117

I'm neutral on whether the switch name should be changed to match, but I don't think we need to go into the granularity of the classic dwarfdump - things like symbol table, string table, text etc sizes are best retrieved from tools like llvm-readobj and llvm-size. I think we should keep this to debug section sizes (with total file size information also included for ease of comparison). The suffixes idea is not bad though - perhaps worth a future follow-up.

+1
I would keep it only for the DWARF debug information.

When we implement this (initially), we can play with additional metrics, formats, etc.

djtodoro updated this revision to Diff 243823.Tue, Feb 11, 5:05 AM
djtodoro retitled this revision from [WIP][llvm-dwarfdump] Add the --show-sections-sizes option to [llvm-dwarfdump] Add the --show-sections-sizes option.

-Remove the WIP marker
-Improve the code
-Refactor the code
-Addressing comments

djtodoro edited the summary of this revision. (Show Details)Tue, Feb 11, 5:07 AM
djtodoro marked an inline comment as done.Tue, Feb 11, 5:09 AM
djtodoro added inline comments.
llvm/test/tools/llvm-dwarfdump/X86/show_section_sizes.s
2

I'll add such test as well.

djtodoro updated this revision to Diff 243830.Tue, Feb 11, 5:30 AM
  • Taking care about padding within SectionSizes
djtodoro updated this revision to Diff 243838.Tue, Feb 11, 5:59 AM

-Adding the yaml2obj test

jhenderson added inline comments.Wed, Feb 12, 2:13 AM
llvm/test/tools/llvm-dwarfdump/X86/section_sizes.test
1

I like to encourage people to use '##' for comments in tests these days, as they stand out better from the main lit and FileCheck directives.

Aside from that, I'd suggest some improvements to this test:

  1. Check the whole output, and also use FileCheck's --match-full-lines and --strict-whitespace to show that the formatting works.
  2. Have multiple of one of the sections, to show that the size is totalled up. It is possible to have multiple such sections (e.g. due to multiple .debug_types sections).
  3. Use different sizes for the sections, so that it clearly shows that the size is calculated correctly.
  4. Add a section that starts with ".debug" but doesn't have a "standard" dwarf name (e.g. .debugfoo or .debug_bar), and that it is still printed.
20

No need for the AddressAlign parameter here and below.

llvm/test/tools/llvm-dwarfdump/X86/show_section_sizes.s
2

I think you can get rid of this test entirely, in favour of the YAML-based test. There's no need for duplicate coverage.

llvm/tools/llvm-dwarfdump/SectionSizes.cpp
24

Perhaps: "Map of .debug section names and their sizes across all such-named sections."

32

I'd replace "TheSecs" with "Section" in the function name.

52

As mentioned before, the total size of all sections is not the same as the total file size. I'd recommend just doing Obj.getData().size() here.

68

errorToErrorCode is not a good way of reporting an llvm::Error as it loses a lot of context that might otherwise be reported. Also, perhaps this should only be a warning and then skipping that section, rather than an error which terminates the loop. After all, it might be possible to find the names of other sections. If you did that, this function could return void.

I'd recommend you pull this out into a function, and probably follow what reportWarning does in llvm-readobj.cpp.

101–105

I'm not sure this loop gives us anything. The number of CUs seems irrelevant to the task at hand.

llvm/tools/llvm-dwarfdump/llvm-dwarfdump.cpp
650

It looks like you handle each file individually, which is fine, but it probably makes sense to include the file name somewhere in your output too, to clearly distinguish them. Also, as you can have multiple inputs, you should have a test for that (possibly it can be part of the existing test).