Add an option to llvm-dwarfdump to calculate the bytes within .debug* sections.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
| llvm/tools/llvm-dwarfdump/SectionSizes.cpp | ||
|---|---|---|
| 34 | WithColor.h has recently gained defaultWarningHandler which allows printing of errors nicely, without the extra effort here. Also, there is a createFileError method that takes an Error. Consequently, you can write this function simply as: WithColor::defaultWarningHandler(createFileError(Filename, std::move(Err)); You could even consider doing away with this function then, by doing this inline, but it's up to you on that one. I'm happy either way. | |
| 76 | Here and throughout your code, use const Twine &. From the Twine.h comments: "Twines should only be used accepted as const references in arguments," | |
| 98 | collectSecsSizesForObjectFile -> collectObjectSectionSizes | |
| 106 | Is the file format really important? I personally don't think it gives us anything. | |
| llvm/tools/llvm-dwarfdump/llvm-dwarfdump.cpp | ||
| 287 | This seems unnecessary. The only difference is removing the DWARFContext argument, right? I'd just pass in the DWARFContext and not use it to keep the interface identical. | |
@jhenderson Thanks for the review!
| llvm/test/tools/llvm-dwarfdump/X86/section_sizes.test | ||
|---|---|---|
| 1 ↗ | (On Diff #243838) | I am getting the YAML error when trying to add duplicated section here: repeated section/fill name: ... at YAML section/fill number ... | 
| llvm/tools/llvm-dwarfdump/SectionSizes.cpp | ||
| 76 | Addressed with D75727, since we have more functions using it. | |
| 106 | Hm... OK. I see the llvm-dwarfdump prints the file format when outputs the --stats, but we can avoid that here. | |
| llvm/tools/llvm-dwarfdump/llvm-dwarfdump.cpp | ||
| 287 | Agree, it seems reasonable. | |
| llvm/test/tools/llvm-dwarfdump/X86/section_sizes.test | ||
|---|---|---|
| 1 ↗ | (On Diff #243838) | There's a special syntax for creating unique sections with the same name. See case 1 in https://github.com/llvm/llvm-project/blob/master/llvm/test/tools/yaml2obj/ELF/duplicate-section-names.yaml. | 
| 16 ↗ | (On Diff #248118) | Actually, on second thoughts "Total Size" (without the colon) is probably better. I think it reads cleaner. I'd also space it the same as the section columns above, so that it all lines up. | 
| 6 ↗ | (On Diff #248658) | Tip: Add spaces between the '#' and 'CHECK' so that the colons line up: #      CHECK:file: {{.*}}
# CHECK-NEXT:---------------------------------------------------- | 
| 10 ↗ | (On Diff #248658) | I think the section name column needs to be dynamically sized. The length of a section name isn't really limited, and indeed, there are some quite long DWARF section names out there already (e.g. .debug_cu_index or .debug_info.dwo), which might cause the formatting here to go to pot. Please add a section with a long name (e.g. change .debug_foo to .debug_arbitrary_long_name) to illustrate. | 
| llvm/tools/llvm-dwarfdump/SectionSizes.cpp | ||
| 90 | const Twine & | |
| 106 | If a use-case arises for it later, we can add it then. | |
Sorry for taking so long, I have three more questions:
What does the output look like for fat Mach-O binaries? There should be some in the dsymutil test inputs, if you have trouble creating one.
What does the output look like for a static .a archive?
I'm wondering if we should include the *data* collected here in the JSON output of --statistics if we don't already.
| llvm/tools/llvm-dwarfdump/SectionSizes.cpp | ||
|---|---|---|
| 80 | nit: Since these are one liners. Can remove these braces from both if-else block? | |
| llvm/tools/llvm-dwarfdump/SectionSizes.cpp | ||
|---|---|---|
| 35 | Why 13 here? I'd think it would only need to have a minimum of the size of "SECTION" + 2 or so. | |
| 49 | The dashes under "SECTION" should be increased in number based on the column width. | |
| 98 | collectObjectSectionSizes is new code, so it should use the right signature from day 1 rather than waiting until a later change. In fact, D75727 can probably be committed first. | |
| llvm/tools/llvm-dwarfdump/llvm-dwarfdump.cpp | ||
| 421 | I just want to be clear: is this how clang-format formats this comment? | |
| 649–652 | A thought: is there any reason we can't allow --statistics and --show-section-sizes in the same run? | |
I have added the tests for this.
I'm wondering if we should include the *data* collected here in the JSON output of --statistics if we don't already.
We do not include, but I can split the methods from SectionSizes.cpp into a header file, and allow the Statistics part to using this as well. I think this fields are useful for that. WDYT?
| llvm/tools/llvm-dwarfdump/SectionSizes.cpp | ||
|---|---|---|
| 35 | Sure. The magic value was used within some other LLVM tool when printing the section name, that is why I put it like this. | |
| 80 | Sure. | |
| llvm/tools/llvm-dwarfdump/llvm-dwarfdump.cpp | ||
| 421 | Yes. | |
| 649–652 | The option should act as "pretty printing", I think we should not allow mixing the output with --statistics JSON output. But, we can include these numbers within --statistics output as well. | |
We do not include, but I can split the methods from SectionSizes.cpp into a header file, and allow the Statistics part to using this as well. I think this fields are useful for that. WDYT?
Yes, I think the size data makes a lot of sense for --statistics.
I think we're almost there!
| llvm/tools/llvm-dwarfdump/SectionSizes.cpp | ||
|---|---|---|
| 15 | compute this as a constexpr? | |
| 56 | Shouldn't this be in libObject instead? | |
| llvm/tools/llvm-dwarfdump/SectionSizes.h | ||
| 2 | -*- C++ -*- this is a header file. | |
| 23 | Is the StringRef the section name? These are so small that we could use an llvm::StringMap instead. | |
| llvm/test/tools/llvm-dwarfdump/X86/archive_section_sizes.test | ||
|---|---|---|
| 47 ↗ | (On Diff #249587) | It would be cute to also print the cumulative summary of all files in the archive, but this is not something that needs to be done to get this patch landed. | 
| llvm/test/tools/llvm-dwarfdump/X86/archive_section_sizes.test | ||
|---|---|---|
| 1 ↗ | (On Diff #249587) | As this test is specifically for mach fat archives, perhaps worth renaming the test "fat_archive_section_sizes.test" or possibly "macho_archive_section_sizes.test". | 
| llvm/tools/llvm-dwarfdump/SectionSizes.cpp | ||
| 15 | length -> width Your comment suggests that the size should include the "+ 2", and the initial MaxWidth does so, but then you don't include it in the loop, and the rest of the code adds the 2 on, so I think there's an inconsistency here. | |
| 26 | I'd go with "MaxNameWidth". | |
| 30 | I would suggest that the start of "SECTION" lines up with the start of the section names, to match the sizes column. | |
| 57–61 | I think the naming here needs to be format specific unfortunately, as "__debug" is not a debug section for ELF. Perhaps not worth worrying about though...? | |
| llvm/tools/llvm-dwarfdump/SectionSizes.h | ||
| 36 | the found debug sections | |
| llvm/tools/llvm-dwarfdump/llvm-dwarfdump.cpp | ||
| 649–652 | Okay, sounds reasonable. Since one of these ifs already uses braces, I recommend all within this bit do for consistency. | |
I tested this output on bigger files, i.e. gdb:
llvm-dwarfdump --show-section-sizes gdb ---------------------------------------------------- file: gdb ---------------------------------------------------- SECTION SIZE -------------- -------- .debug_info 74863947 (63.71%) .debug_loc 15324074 (13.04%) .debug_ranges 2709232 (2.31%) .debug_aranges 66432 (0.06%) .debug_abbrev 2103393 (1.79%) .debug_line 4517257 (3.84%) .debug_str 8194356 (6.97%) Total Size: 107778691 (91.72%) Total File Size: 117510032 ----------------------------------------------------
| llvm/tools/llvm-dwarfdump/SectionSizes.cpp | ||
|---|---|---|
| 15 | I have change the function a bit, so I think we do not need the "constexpr" anymore. | |
| 15 | 
 Sorry, I am not sure I follow :) Maybe my comment is not clear enough? The "+2" refers only for the minimum width. | |
| 56 | Since the ELF does not consider the '__debug*' as debug sections, I would use this locally. | |
| llvm/tools/llvm-dwarfdump/SectionSizes.cpp | ||
|---|---|---|
| 57–61 | I agree, that is why I would keep this function local to llvm-dwarfdump. | |
| llvm/tools/llvm-dwarfdump/SectionSizes.cpp | ||
|---|---|---|
| 56 | Wait, so now I'm confused :-) __debug* is the Mach-O section name for the ELF .debug* sections (because of different naming rules and constraints). I don't think you would ever have a __debug* section in a n ELF object. So this should be implementable as a virtual method in ObjectFile, right? | |
| llvm/tools/llvm-dwarfdump/SectionSizes.cpp | ||
|---|---|---|
| 56 | Oh right, we do not need to implement it within ElfObjectFile, that makes sense :) | |
| llvm/include/llvm/Object/ObjectFile.h | ||
|---|---|---|
| 126 ↗ | (On Diff #250140) | is a debug | 
| 278 ↗ | (On Diff #250140) | Perhaps this interface extension can be pulled into a separate review. You could then modify llvm-objcopy in that same patch to make use of it. | 
| llvm/lib/Object/ObjectFile.cpp | ||
| 94 ↗ | (On Diff #250140) | I think this function needs to be implemented separately for ELF/Mach-O etc. For ELF objects, we don't want __debug counting as a debug section, whilst for Mach-O we don't want .debug. And for COFF etc, we don't want either probably! It should be easy to make this a virtual function overridden by the object file types. | 
| llvm/test/tools/llvm-dwarfdump/X86/macho_archive_section_sizes.test | ||
| 7 ↗ | (On Diff #250140) | I'd expect the dashes to go to the closing parenthesis for the percentage. | 
| 15–16 ↗ | (On Diff #250140) | For consistency, it probably makes sense to have these formatted the same as the section names, and have the Section column width take account of this too. I don't feel strongly about this though, so if you prefer it this way, that's fine. | 
| llvm/test/tools/llvm-dwarfdump/X86/section_sizes.test | ||
| 1 ↗ | (On Diff #250140) | In another test file possibly, you should test the behaviour where there are no debug sections. Maybe also where all sections are empty? Not sure if that's really important though. | 
| llvm/tools/llvm-dwarfdump/SectionSizes.cpp | ||
| 17 | I don't think you need to pass in SectionNameColPrintName for this purpose. Probably simpler would be to factor out a simple local string constant const StringRef = "Section". I still don't understand the +2. If all the section names are shorter than "Section", then the column will be two bytes wider than necessary. However, this +2 is not included if one or more of the section names is longer than "Section" + 2. If you want to make the column 2 wider than the names, that's fine (but why?), but you should add the +2 before returning, rather than in the initial value. | |
| 28 | Let's make this consistent with my suggestions above, and use the column title as the minimum width. | |
| 31 | You need to allow for the percentage part of the size too, I think. | |
| 40 | Typo in variable name. | |
| 113–114 | Clarify what "it's" is here (i.e. use "If the input file is an archive") Also, I'd move the TODO below the function call to print. | |
| llvm/tools/llvm-dwarfdump/SectionSizes.h | ||
| 18–19 | I don't think it's good form to put using namespace in a header file. Wrap the whole thing inside namespace llvm etc. | |
| 21 | No need for the llvm:: | |
| 35 | Maybe worth calling this getNameColumnWidth and updating the comment accordingly. | |
| 39 | Similar coment to the above. How about simply getSizeColumnWidth? | |
@jhenderson Thanks for the review again!
| llvm/include/llvm/Object/ObjectFile.h | ||
|---|---|---|
| 278 ↗ | (On Diff #250140) | Done with the D76276. But I have not refactor the llvm-objcopy part, yet. That could be a separate patch. | 
| llvm/test/tools/llvm-dwarfdump/X86/macho_archive_section_sizes.test | ||
|---|---|---|
| 15–16 ↗ | (On Diff #250140) | I would keep as is, and we can modify it if starts looking ugly. :) | 
| llvm/test/tools/llvm-dwarfdump/X86/section_sizes.test | ||
| 1 ↗ | (On Diff #250140) | Sure, that makes sense! | 
| llvm/tools/llvm-dwarfdump/SectionSizes.cpp | ||
| 17 | 
 OK, I do not have a strong opinion on this, and I will agree on that. Thanks. | |
| 31 | Please take a look into the output now, if it still looks it should be included, I will include the dashes for the percentage part as well. | |
| llvm/tools/llvm-dwarfdump/SectionSizes.h | ||
| 39 | It looks better, thanks! | |
I think this is starting to look good now. I found two more opportunities for improving the output... sorry about finding these piecemeal.
| llvm/test/tools/llvm-dwarfdump/X86/macho_archive_section_sizes.test | ||
|---|---|---|
| 6 ↗ | (On Diff #251016) | What's the unit of SIZE? Should we print that? | 
| 24 ↗ | (On Diff #251016) | Should we right-align all the numbers? 56 0 | 
| llvm/test/tools/llvm-dwarfdump/X86/section_sizes_no_debug_sections.test | ||
|---|---|---|
| 7 | Rather than having a special message here, I'd probably just print the Total Size and Total File Size lines as before (which will be 0 and the object file size). | |
| 17–22 | Remove the extra spaces between the keys and values. | |
| llvm/tools/llvm-dwarfdump/SectionSizes.cpp | ||
| 16 | No need for the namespace here, since you are using namespace llvm above. | |
| 32 | Typo: SectionSizeTitle | |
| llvm/tools/llvm-dwarfdump/SectionSizes.h | ||
| 33–38 | It looks like these two functions are no longer defined after you renamed them in the source file? If they aren't used outside the source file, just make them static functions in the .cpp. Same goes for everything in here - only include in the header what needs to be part of the public interface. | |
Forgot to mention: given this is intended to be common across other file formats, and there is a need to test the isDebugSection functions, I'd recommend adding tests for COFF and/or Mach-O as well as the ELF one. They don't need to be comprehensive necessarily.
@aprantl @jhenderson Thanks a lot for the reviews! Now we have tests for ELF(section_sizes.test), COFF(coff-x86_64.yaml) and MACH-O(macho_archive_section_sizes.test).
| llvm/test/tools/llvm-dwarfdump/X86/macho_archive_section_sizes.test | ||
|---|---|---|
| 6 ↗ | (On Diff #251016) | bytes, and we can hard code them now, but when we add support for different units, that could be dynamically printed. | 
| 24 ↗ | (On Diff #251016) | Good idea, thanks! | 
| llvm/test/tools/llvm-dwarfdump/X86/section_sizes_no_debug_sections.test | ||
| 7 | Sounds good, thanks! | |
| llvm/tools/llvm-dwarfdump/SectionSizes.cpp | ||
| 32 | Oh, I uploaded a wrong diff :( | |
Looks like a great start! (From my side)
| llvm/test/tools/llvm-dwarfdump/X86/macho_archive_section_sizes.test | ||
|---|---|---|
| 9 ↗ | (On Diff #251575) | long-term for the human readable output we probably want a function that takes byte value and prints it the way ls -l -h formats number, if we don't already have one in LLVM. | 
| llvm/test/tools/llvm-dwarfdump/X86/macho_archive_section_sizes.test | ||
|---|---|---|
| 1 ↗ | (On Diff #251575) | You should avoid using a precanned binary if possible, especially one from a completely different tool's testing. yaml2obj has Mach-O support, so I'd recommend using that (although I don't know how to create fat archives). The new file should have section names in the ELF style too, to show that they are not included. | 
| 6 ↗ | (On Diff #251575) | I might suggest (bytes) explicitly, for clarity, but I don't mind, if there's some pre-existing style this is following elsewhere. | 
| llvm/test/tools/llvm-dwarfdump/X86/section_sizes.test | ||
| 46 ↗ | (On Diff #251575) | You might want to add a section with a Mach-O naming style to show that it is NOT included. | 
| llvm/test/tools/llvm-dwarfdump/coff-x86_64.yaml | ||
| 2 ↗ | (On Diff #251575) | Rather than folding it in here, I'd keep this test in a separate test file, or even just add it to section_sizes.test (you might need to copy the YAML across too of course). Same comment then goes as the other tests re. section names. | 
| llvm/tools/llvm-dwarfdump/SectionSizes.cpp | ||
| 30 | To avoid rotting once support for other units is added, it might make more sense to say "the size of the column title". | |
| 81–82 | You still don't need the namespace llvm since you have a using namespace llvm at the top of this file. | |
Thanks!
| llvm/test/tools/llvm-dwarfdump/X86/macho_archive_section_sizes.test | ||
|---|---|---|
| 1 ↗ | (On Diff #251575) | 
 There are cases, even in llvm-dwarfdump, using the files from Input/, but if you think so, I can delete it. 
 I've added a new test case for this. | 
| 6 ↗ | (On Diff #251575) | I see it is being used as is. Also, I put an explanation in the 'help' for the option. | 
| 9 ↗ | (On Diff #251575) | I agree. | 
| llvm/tools/llvm-dwarfdump/SectionSizes.cpp | ||
| 81–82 | This is needed to avoid an ambiguous call to the calculateSectionSizes(). | |
| llvm/test/tools/llvm-dwarfdump/X86/macho_archive_section_sizes.test | ||
|---|---|---|
| 1 ↗ | (On Diff #251575) | Using assembly etc from the Inputs folder local to the current tool is fine, but where possible, we try to avoid pre-canned binaries these days, as they are opaque and hard to maintain. Referencing anotehr tool's Inputs folder is particularly bad as it creates an unexpected dependency between the two sets of tests. | 
| llvm/test/tools/llvm-dwarfdump/X86/section_sizes_coff.test | ||
| 2 | Add a comment like the other test. | |
| 27 | If I'm not mistaken, these don't need to be valid data. Does COFF yaml2obj provide the "Size:" tag as an option for setting the section size? If not, I'd just use zeroes to write the data. Same goes below. | |
| 35 | "This is the debug section following the Mach-O naming style, so this is not included in the report." -> "This is a debug section following the Mach-O naming style, and is used to show that it such sections are not included in the report." | |
| llvm/test/tools/llvm-dwarfdump/X86/section_sizes_elf.test | ||
| 2 | Add "for ELF objects" to the end here. | |
| 50 | Same comment as above. | |
| llvm/test/tools/llvm-dwarfdump/X86/section_sizes_mach_o.test | ||
| 1 ↗ | (On Diff #252041) | Rename this test to section_sizes_macho.test for consistency with typical naming styles elsewhere. Please add a top-level comment too. Any particular reason you're creating an intermediate object here and not in the COFF case? | 
| 37 ↗ | (On Diff #252041) | Same basic comment as in other tests. | 
| llvm/tools/llvm-dwarfdump/SectionSizes.cpp | ||
| 30 | Sorry, I guess this wasn't quite clear enough. I meant to replace the "size of "SIZE (b)"" with "size of the column title" here. So the full sentence is "The minimum column width should be the size of the column title." | |
| 81–82 | What ambiguous call? There is only one function called calculateSectionSizes() defined here... | |
| llvm/test/tools/llvm-dwarfdump/X86/macho_archive_section_sizes.test | ||
|---|---|---|
| 1 ↗ | (On Diff #251575) | OK, I'll remove the test. | 
| llvm/tools/llvm-dwarfdump/SectionSizes.cpp | ||
| 81–82 | Without even llvm:: here I get: SectionSizes.cpp:109:3: error: call to 'calculateSectionSizes' is ambiguous maybe I am oversighting something very obvious here? | |
Code all looks fine now, just a few minor test suggestions.
What happened to "macho_archive_section_sizes.test"? I thought that was asked for for testing fat archives? On that note, it might be worth a test for llvm-ar produced archives too.
| llvm/test/tools/llvm-dwarfdump/X86/section_sizes_coff.test | ||
|---|---|---|
| 27 | Perhaps worth having one of these with a non-zero size? | |
| 37 | Split this over two lines. Also, there was a typo in my original suggestion: "show that it such" -> "show that such" | |
| llvm/test/tools/llvm-dwarfdump/X86/section_sizes_elf.test | ||
| 50 | Ditto. | |
| llvm/test/tools/llvm-dwarfdump/X86/section_sizes_macho.test | ||
| 40 | Same comment as elsewhere. | |
| llvm/tools/llvm-dwarfdump/SectionSizes.cpp | ||
| 81–82 | Never mind, I'm just being a bit dumb, I think. | |
What happened to "macho_archive_section_sizes.test"? I thought that was asked for for testing fat archives? On that note, it might be worth a test for llvm-ar produced archives too.
I'll add one. Thanks.
| llvm/test/tools/llvm-dwarfdump/X86/section_sizes_coff.test | ||
|---|---|---|
| 27 | Makes sense to me. | |
| 35 | Oh, my bad, I should have checked it. | |
Hi @aprantl,
Do you think a regular llvm-ar archive test is sufficient for this testing, or do you think a mach-o fat archive is important to test independently? I don't see a massive need for the latter personally.
| llvm/test/tools/llvm-dwarfdump/X86/section_sizes_archive.test | ||
|---|---|---|
| 14 | Please include the actual file path here and below. You may need to use FileCheck's -D to achieve this, e.g. FileCheck %s -DFILE1=%t.1.o -DFILE2=%t.2.o --match-full-lines --strict-whitespace. | |
| 24–25 | You should probably make this a continuous series of CHECK-NEXT/CHECK-EMPTY, to show what the formatting between blocks looks like. | |
| 29–30 | Change one of these two sections to use the same name as one from the first object, to show that the sizes are independent. | |
I think that @aprantl wanted a test for Mach-O and archives, back then. I've added a test for mach-o and archive, in one shot (by using pre-existing fat mach-o archive from dsymutil). Then, I've added different tests for individual targets (elf, mach-o and coff) in order to confirm we use the right isDebugSection() and by not including a debug section from different name styling (we weren't able to check that with existing mach-o test, since we had the pre-existing fat mach-o archive from dsymutil test). Furthermore, we decided not to use pre-existing Inputs/ from different tools, so I've added the llvm/test/tools/llvm-dwarfdump/X86/section_sizes_archive.test, by testing archives. I don't think we would test something new by adding another fat archive test.
What makes the dsymutil test special is that you can have fat (=multi-architecture) Mach-O binaries inside a static archive and it is important to test that this case is handled correctly, because you have two layers of multi-file enumeration going on. Since we already have it checked in as a binary I thought reusing it would be preferable over checking in yet another opaque binary. If we can generate one one the fly, that's fine, too.
I'm not at all familiar with Mach-O, so how do fat archives get generated (i.e. with what tool?)? Is there an LLVM equivalent we can use here?
There is no such thing as a fat archive, just regular archives that contain fat object files. The tool to create fat binaries is called lipo and there seems to be an llvm variant of it.
Thanks! I'm sure I heard them referred to fat archives before...! @djtodoro, could you have a look at using llvm-lipo to create a simple test case for fat binaries in archives?
LGTM, with one last request.
| llvm/test/tools/llvm-dwarfdump/X86/section_sizes_coff.test | ||
|---|---|---|
| 8 | Some of the tests are missing this first line. Please fix them to all check it. | |
| llvm/test/tools/llvm-dwarfdump/X86/section_sizes_archive.test | ||
|---|---|---|
| 15 | CHECK-NEXT here and in all tests you just updated. | |
Options are documented in alphabetical order, so this should be above --statistics.