This patch is the start of what will become a detailed analysis of a DWARF file. It adds the infrastructure to analyze a DWARF file and pretty-print the DWARF section sizes, reporting each section's size along with its percentage of the total DWARF size and of the file size:
$ llvm-dwarfdump --analyze analyze-dwarf-section-sizes.test.tmp.o
File: "analyze-dwarf-section-sizes.test.tmp.o"
Size: 9765 (9.54K)

DWARF section sizes:

   SIZE % DWARF  % FILE SECTION NAME
------- ------- ------- -------------------------------------
    372  23.65%   3.81% __debug_info
    207  13.16%   2.12% __debug_line
    182  11.57%   1.86% __debug_str
    171  10.87%   1.75% __apple_types
    165  10.49%   1.69% __debug_abbrev
    116   7.37%   1.19% __apple_names
     96   6.10%   0.98% __debug_aranges
     83   5.28%   0.85% __debug_pubnames
     81   5.15%   0.83% __debug_pubtypes
     64   4.07%   0.66% __apple_namespac
     36   2.29%   0.37% __apple_objc
======= ======= ======= =====================================
  1.54K 100.00%  16.11% Total DWARF Size
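To make the percentage columns concrete: the section sizes above sum to 1573 bytes (1.54K), so the 372-byte __debug_info section is 372/1573 ≈ 23.65% of the DWARF data and 372/9765 ≈ 3.81% of the 9765-byte file, and the DWARF as a whole is 1573/9765 ≈ 16.11% of the file.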
There is also a --json option that will print out JSON:
$ llvm-dwarfdump --analyze analyze-dwarf-section-sizes.test.tmp.o --json
{
  "path": "/Users/gclayton/Documents/src/llvm/analyze/llvm/build/test/tools/llvm-dwarfdump/Output/analyze-dwarf-section-sizes.test.tmp.o",
  "sections": [
    { "name": "__debug_info", "size": 372 },
    { "name": "__debug_line", "size": 207 },
    { "name": "__debug_str", "size": 182 },
    { "name": "__apple_types", "size": 171 },
    { "name": "__debug_abbrev", "size": 165 },
    { "name": "__apple_names", "size": 116 },
    { "name": "__debug_aranges", "size": 96 },
    { "name": "__debug_pubnames", "size": 83 },
    { "name": "__debug_pubtypes", "size": 81 },
    { "name": "__apple_namespac", "size": 64 },
    { "name": "__apple_objc", "size": 36 }
  ],
  "size": 9765
}
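Not part of the patch, but as a rough sketch of how downstream tooling could consume this output with LLVM's JSON support (llvm/Support/JSON.h), assuming only the "sections", "name", and "size" fields shown above; the summarize helper is an illustrative name, not code from the patch:

#include "llvm/Support/Error.h"
#include "llvm/Support/JSON.h"
#include "llvm/Support/raw_ostream.h"

// Hypothetical consumer of the --json output shown above.
static void summarize(llvm::StringRef Text) {
  llvm::Expected<llvm::json::Value> Root = llvm::json::parse(Text);
  if (!Root) {
    llvm::errs() << llvm::toString(Root.takeError()) << "\n";
    return;
  }
  const llvm::json::Object *Obj = Root->getAsObject();
  if (!Obj)
    return;
  if (const llvm::json::Array *Sections = Obj->getArray("sections")) {
    for (const llvm::json::Value &S : *Sections) {
      const llvm::json::Object *Sec = S.getAsObject();
      if (!Sec)
        continue;
      auto Name = Sec->getString("name");  // e.g. "__debug_info"
      auto Size = Sec->getInteger("size"); // byte count
      if (Name && Size)
        llvm::outs() << *Name << ": " << *Size << "\n";
    }
  }
}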
I wanted to keep this patch small, so this is all it does for now. Subsequent patches will add debug info size per source directory, type duplication data (how many times each type is duplicated and how many bytes the duplicates occupy), and inline information (counts of all inlined functions and the code size of the inlined code).
Seems like there is potential for a floating-point error here, where a size that is an exact power of 1024 (e.g. 1.000K, 1.000M, 1.000G, 1.000T, etc.) or very close to an exact multiple gets reported with the wrong unit (e.g. terabytes instead of gigabytes) due to the conditional.
What about something like:
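(The suggested snippet isn't reproduced in this excerpt. As a purely hypothetical sketch of the idea, selecting the unit by comparing the raw byte count against exact integer thresholds, and only converting to double for the final formatting, avoids the rounding problem; printHumanSize and the unit table are illustrative names, not code from the patch.)

#include <cstdint>
#include <cstdio>

// Hypothetical helper: pick the unit with exact integer comparisons so that
// a size of exactly 1024^N always lands on unit N (1.00K, 1.00M, 1.00G, ...).
static void printHumanSize(uint64_t Bytes) {
  static const char *Units[] = {"", "K", "M", "G", "T", "P"};
  unsigned Unit = 0;
  uint64_t Divisor = 1;
  // Advance to the next unit only while the value is at least 1024 of the
  // current one.
  while (Unit + 1 < sizeof(Units) / sizeof(Units[0]) &&
         Bytes >= Divisor * 1024) {
    Divisor *= 1024;
    ++Unit;
  }
  if (Unit == 0)
    std::printf("%llu\n", (unsigned long long)Bytes);
  else
    std::printf("%.2f%s\n", (double)Bytes / (double)Divisor, Units[Unit]);
}

int main() {
  printHumanSize(9765);                         // 9.54K
  printHumanSize(1024ull * 1024 * 1024);        // exactly 1.00G, not 1024.00M
  printHumanSize(1024ull * 1024 * 1024 * 1024); // exactly 1.00T
  return 0;
}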
I'm not sure if I got the math right. Also not sure if it's a problem in practice.