DWARF5 has support for DW_FORM_addrx, which can be useful for space savings, but it needs a .debug_addr section to be used. dsymutil does not have the ability to emit a debug_addr section currently. This patch adds support for that
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Could you compare size of .debug_info, .debug_addr sections for clang binary with and without this patch, please?
Of course!
The size of a clang.dSYM debug_info section built with -gdwarf-5 without this patch is 930,375,162 Bytes or 930 MB
The size of a clang.dSYM debug_info + debug_addr section built with -gdwarf-5 with this patch is 914,086,636 Bytes or 914 MB
I would also like to note, that at the moment I am not optimizing for the addrx values, i.e. no work is done to sort the addrx values by frequency to optimize the size of the addrx values in the debug info section based on frequency, and we still get a modest 1.8% decrease in size
Thank you! yep, 1.8% is a good result.
How about changing DW_RLE_start_length/DW_LLE_start_length into DW_RLE_startx_length/DW_LLE_startx_length for rnglists/loclists? Or this is for another patch?
llvm/include/llvm/DWARFLinker/DWARFLinker.h | ||
---|---|---|
100 | add a Doxygen comment, please. | |
642 | llvm coding standard suggests to use SmallVector instead std::vector. https://llvm.org/docs/CodingStandards.html#c-standard-library | |
llvm/lib/DWARFLinker/DWARFLinker.cpp | ||
1393 | probably it would make sense to hide this code under method of DebugAddrPool? uint64_t DebugAddrPool::getAddrIndex ( uint64_t Addr ) { DenseMap<uint64_t, uint64_t>::iterator It = AddrPool.AddrIndexMap.find(Addr); if (It == AddrPool.AddrIndexMap.end()) { AddrPool.Addrs.push_back(*Addr); It = AddrPool.AddrIndexMap.insert(std::make_pair(Addr, AddrPool.Addrs.size()).first; } return It.second; } | |
1400 | assuming DebugAddrPool::getAddrIndex exist: DIEInteger(AddrAttribute->getForm() == dwarf::DW_FORM_addr ? *Addr : DebugAddrPool::getAddrIndex(*Addr)); | |
2579 | probably, the dwarf::DW_AT_addr_base should be updated here with the proper offset to .debug_addr section. | |
2725 | It looks like AddrIndexMap&Addrs may be local for compile unit. i.e., not necessary to make them global. | |
llvm/lib/DWARFLinker/DWARFStreamer.cpp | ||
278 | it seems AddrSectionSize should be updated here. |
How about changing DW_RLE_start_length/DW_LLE_start_length into DW_RLE_startx_length/DW_LLE_startx_length for rnglists/loclists? Or this is for another patch?
This patch just adds the functionality to emit the debug_addr section, I will work on future patches to use the indirections in the other sections
llvm::ArrayRef<uint64_t> is more common than const std::vector<uint64_t> & in the codebase.
Addressed review comments:
- Added a Doxygen comment for void emitAddrs
- Replaced std::vector usage with SmallVector
- Created function DebugAddrPool:: getAddrIndex to keep track of Addresses in a Compile Unit
- Removed global AddrIndexMap and Addrs
llvm/include/llvm/DWARFLinker/DWARFLinker.h | ||
---|---|---|
642 | I was just being consistent because DIECloner::DIECloner() takes a std::vector<std::unique_ptr<CompileUnit>> &CompileUnits, maybe that should be changed to? | |
llvm/lib/DWARFLinker/DWARFLinker.cpp | ||
2579 | Correct me if I am wrong, but there doesn't seem to be a way to modify the value of an attribute in the die after it has already been set right? |
llvm/include/llvm/DWARFLinker/DWARFLinker.h | ||
---|---|---|
138 | nit: move this definition close to emitDwarfDebugAddrFooter so that methods emitting DebugAddr were defined together, please. | |
642 | yes, I think it(replacing std::vector with SmallVector for CompileUnits) would be good. But with separate cleanup patch. | |
832 | add Doxigen comment please. | |
llvm/lib/DWARFLinker/DWARFLinker.cpp | ||
1396 | probably, to not check for AttrSpec.Form == dwarf::DW_FORM_addr twice we can use early return pattern? if (AttrSpec.Form == dwarf::DW_FORM_addr) { Die.addValue(DIEAlloc, AttrSpec.Attr, AttrSpec.Form, DIEInteger(*Addr)); return Unit.getOrigUnit().getAddressByteSize(); } Die.addValue(DIEAlloc, AttrSpec.Attr, AttrSpec.Form, DIEInteger(AddrPool.getAddrIndex(*Addr))); return AttrSize; Another thing, as we generate a new .debug_addr section, its size may be changed and original strictly sized forms (f.e. DW_FORM_addrx1) may become insufficient. On the other side, as we usually only remove addresses everything should be fine. It is probably worth adding an assertion here checking that AddrPool.getAddrIndex(*Addr) fits into the original form. f.e. assert ((AttrSpec.Form == dwarf::DW_FORM_addrx1 && AddrPool.getAddrIndex(*Addr) <= UINT8_MAX) || other cases | |
1659 | as it matches with default we can just remove this case. | |
2571 | No need to do clear() as AddrPool is local for DIECloner. | |
2579 | The cloned value of DW_AT_addr_base may be incorrect as we generate new .debug_addr, so it should be updated. It may be changed inside DWARFLinker::emitDebugAddrSection() the same way as DW_AT_stmt_list is updated inside DIECloner::generateLineTableForUnit(): static void patchStmtList(DIE &Die, DIEInteger Offset) { for (auto &V : Die.values()) if (V.getAttribute() == dwarf::DW_AT_stmt_list) { V = DIEValue(V.getAttribute(), V.getForm(), Offset); return; } llvm_unreachable("Didn't find DW_AT_stmt_list in cloned DIE!"); } |
llvm/include/llvm/DWARFLinker/DWARFLinker.h | ||
---|---|---|
101 | nit: move this definition close to emitDwarfDebugAddrFooter so that methods emitting DebugAddr were defined together, please. probably name them using the same pattern: emitDwarfDebugAddrsHeader |
llvm/lib/DWARFLinker/DWARFLinker.cpp | ||
---|---|---|
1996 | one more nit: probably it would be better to make DWARFLinker::emitDebugAddrSection to be method of DIECloner as everything seems to be DIECloner local. also, moving check for DWARF version inside emitDebugAddrSection will make DIECloner::cloneAllCompileUnits look simpler: if (DwarfVersion < 5) return; |
Patched the addr_base value, moved functions for emitting Addr section header, footer, and the section to be together, and used the same naming scheme as the other sections, and resolved nitpicks
yep, thank you for the update.
Would you mind adding test for DW_AT_addr_base, please ? i.e. test having f.e. 3 compile units and testing that each DW_AT_addr_base attribute points to the proper part of .debug_addr section. Also, it looks like all(or probably most of) existing dwarf5 tests need adding checks for DW_AT_addr_base.
llvm/include/llvm/DWARFLinker/DWARFLinker.h | ||
---|---|---|
149 | ||
643 | This field is not used. we can safely remove it. | |
647 | We do not need to use this method it can be removed. | |
707 | No need to pass AddrSecSize as a parameter. We already have it inside. | |
llvm/include/llvm/DWARFLinker/DWARFStreamer.h | ||
113 | ||
llvm/lib/DWARFLinker/DWARFLinker.cpp | ||
1402 | it looks like this constant is not correct: UINT16_MAX + UINT8_MAX The proper one would be: (UINT32_MAX >> 8) | |
2027 | Original version of patch had a check for "-u" option. I think we still need it here. Also no need to remember and store AddrSecSize: if (LLVM_UNLIKELY(Options.Update)) return; if (DwarfVersion < 5) return; MCSymbol *EndLabel = Emitter->emitDwarfDebugAddrsHeader(Unit); patchAddrBase(*Unit.getOutputUnitDIE(), DIEInteger(Emitter->getDebugAddrSectionSize())); Emitter->emitDwarfDebugAddrs(Addrs, Unit.getOrigUnit().getAddressByteSize()); Emitter->emitDwarfDebugAddrsFooter(Unit, EndLabel); | |
2548 | AddrSectionSize could be removed. | |
2596 | No need to clear AddrPool as it is local to DIECloner. | |
llvm/lib/DWARFLinker/DWARFStreamer.cpp | ||
612 | No need to check the version again. It is already checked in DIECloner::emitDebugAddrSection(). | |
641 | ||
655 | No need to check the version again. It is already checked in DIECloner::emitDebugAddrSection(). |
- Changed assert from UINT16_MAX + UINT8_MAX to UINT32_MAX >> 8
- Fixed some doxygen comments
- Removed Index and clear from DebugAddrPool
- Removed AddrSecSize parameter
- Added check for -u option in emitDebugAddrSection
- Removed AddrPool.clear() from cloneAllCompileUnits
- Removed dwarf version checks where unnecessary
- Added a test for DW_AT_addr_base and updated dwarf5 tests to check for that attribute as well wherever appropriate
Thanks for changes,
there are failures during pre-integration testing. They need to be fixed before integration of this patch. I think the problem could probably be fixed by adding check for empty addresses into the DIECloner::emitDebugAddrSection():
if (Addrs.empty()) return;
llvm/include/llvm/DWARFLinker/DWARFLinker.h | ||
---|---|---|
707 | No need to pass Addrs as AddrPool is a member of DIECloner. | |
llvm/test/tools/dsymutil/ARM/dwarf5-addr_base.test | ||
2 | ||
50 | add a check for "-u" case, please. |
llvm/lib/DWARFLinker/DWARFLinker.cpp | ||
---|---|---|
1393–1405 | Let's extract AddrPool.getAddrIndex(*Addr) and assign it to a variable that you can reuse in the assert. Also, is the assert actually checking internal consistency or catching invalid DWARF? If it's the latter we should emit an error/warning instead. | |
2011 | Similar question as for the assert: is this checking internal consistency (i.e. that we have created a DW_AT_addr_base in the cloned DIE) or can this trip for invalid input? |
llvm/lib/DWARFLinker/DWARFLinker.cpp | ||
---|---|---|
1393–1405 | The purpose for the assert is to check internal consistency - newly generated addresses set should fit within the used form. We use original form for the newly generated addresses set as we assume that we do not add addresses and then original form should be enough. The assert proves this assumption. | |
2011 | This is also should be check for internal consistency: We should not miss DW_AT_addr_base when we generate .debug_addr. The patchAddrBase is called from DIECloner::emitDebugAddrSection, thus we check for DW_AT_addr_base exactly when we are going to generate .debug_addr. The only problem with current implementation is this check will fail when original DWARF does not have .debug_addr but we call emitDebugAddrSection for generation of empty .debug_addr. The check for empty addresses set needs to be added into emitDebugAddrSection. For the case of invalid DWARF(DW_AT_addr_base missed in input file) - we should not generate any addresses as we could not correctly read original addresses. DWARFLinker should report warning here and do not generate any address: DIECloner::cloneAddressAttribute std::optional<uint64_t> Addr = AddrAttribute->getAsAddress(); if (!Addr) { Linker.reportWarning("Cann't read address attribute value.", ObjFile); Addr = 0; } Though, this place looks a bit incorrect. Instead of using zero Addr, the attribute should probably be skipped. |
llvm/test/tools/dsymutil/ARM/dwarf5-addr_base.test | ||
---|---|---|
50 | I am not sure what should happen in a -u case Should the DW_AT_addr_base be patched in a -u case? It doesn't get patched now |
llvm/lib/DWARFLinker/DWARFLinker.cpp | ||
---|---|---|
2011 | DW_AT_addr_base is of form DW_FORM_sec_offset, so it will call DWARFLinker::DIECloner::cloneScalarAttribute(, are you suggesting that I add a warning there if there is no addr_base found? |
llvm/include/llvm/DWARFLinker/DWARFLinker.h | ||
---|---|---|
704–705 | ||
llvm/lib/DWARFLinker/DWARFLinker.cpp | ||
2011 | No, I do not suggest to add warning into DWARFLinker::DIECloner::cloneScalarAttribute which would report if DW_AT_addr_base is invalid. My suggestion is to - not create address attributes if input DW_AT_addr_base is invalid. If DW_AT_addr_base is invalid then it is not possible to read input addresses correctly. Instead of using zero or dummy value for the address attribute, it would be better to skip it .For that purpose we have AddrAttribute->getAsAddress() to be optional. If the DW_AT_addr_base is invalid it should return std::nullopt. So we can skip address attribute and report a warning inside DIECloner::cloneAddressAttribute(). Another thing is that it would be better to not report_fatal_error here. This is the place where an output DWARF is emitted. It looks too late to validate and report errors for input DWARF. At this point we need to have llvm_unreachable() indicating that generated DWARF is inconsistent. In short:
std::optional<uint64_t> Addr = AddrAttribute->getAsAddress(); if (!Addr) { Linker.reportWarning("Cann't read address attribute value.", ObjFile); return 0; }
| |
llvm/test/tools/dsymutil/ARM/dwarf5-addr_base.test | ||
50 | For -u case we need to check that generated DWARF is exactly the same as input DWARF. That DW_AT_addr_base is presented with correct value, that indexed addresses are used, that .debug_addr contains proper values. |
llvm/lib/DWARFLinker/DWARFLinker.cpp | ||
---|---|---|
2011 |
one thing to clarify - if all address attributes are skipped because DW_AT_addr_base is missed then we do not need to generate .debug_addr and then it is not a problem that DW_AT_addr_base is missed. |
llvm/lib/DWARFLinker/DWARFLinker.cpp | ||
---|---|---|
1398 | This is the first time I'm reading through the patch, so I might be missing something: Since the address pool is growing as we are processing multiple object files, shouldn't we dynamically decide on the FORM based on the value of AddrIndex? Or is this happening elsewhere already? | |
llvm/lib/DWARFLinker/DWARFStreamer.cpp | ||
619 | Nit: Emit | |
624 | Emit version. |
Fixed some comments
instead of emitting the addrx form present in the compile unit, rewrite the addrx form to use addrx1, addrx2, addrx3, addrx4, or addrx based on number of addresses
llvm/include/llvm/DWARFLinker/DWARFLinker.h | ||
---|---|---|
653 | Doing things that way we create many unnecessary abbreviations. f.e. for index 255 there would be used this abbreviation: DW_TAG_subprogram DW_AT_low_pc [DW_FORM_addrx1] For index 256 there would be created new abbreviation: DW_TAG_subprogram DW_AT_low_pc [DW_FORM_addrx2] I cann't tell whether that solution takes less space. It is necessary to measure impact of many abbreviations to the DWARF size. From one side DIE records having DW_FORM_addrx1 would take less bytes. From another side many abbreviation codes will be increased and then DIEs would take more space. The solution from the previous version of this patch looks simpler and is OK. It is safe to just copy source form. DWARFLinker does not add new addresses for compile unit, so original form should be enough to keep address index. Just copy original form and check by assertion that everything is correct. |
llvm/lib/DWARFLinker/DWARFLinker.cpp | ||
---|---|---|
1398 | the generated .debug_addr is local for compile unit. The number of addresses usually should not be increased for single compile unit. So we can use original form which was enough to keep source addresses. |
llvm/include/llvm/DWARFLinker/DWARFLinker.h | ||
---|---|---|
653 | Or, as another alternative, we can always replace sized forms with DW_FORM_addrx. DW_FORM_addrx would always be enough. In many cases it would be shorter then DW_FORM_addrx3 or DW_FORM_addrx4. This is simple solution which does not increase number of abbreviations. |
llvm/lib/DWARFLinker/DWARFLinker.cpp | ||
---|---|---|
1398 | Ah I see. So, sorting the addresses by frequency and merging them into a global address pool would be a future optimization. |
llvm/lib/DWARFLinker/DWARFLinker.cpp | ||
---|---|---|
1398 | yep. I assume that the idea of that optimization is to save space. So that most used addresses has smaller index and it would be possible to use smaller fixed sized form. It would be interesting to measure how that optimization affects abbreviations. It should increase numbers of abbreviations and size of some DIEs(since every DIE has abbreviation number in the beggining) |
Looks like @avl is correct:
Using just addrx, the size of the debug_info + the debug_addr + debug_abbrev sections is 903,287,425 Bytes or ~903 MB
Using addrx1, addrx2, addrx3, addrx4 the size of the debug_info + debug_addr + debug_abbrev sections is 905,953,864 MB or ~906 MB
I will make it so that dsymutil only uses addrx
Thank you for the update! I think it is mostly done. Please, check two small things in inline comments.
llvm/lib/DWARFLinker/DWARFLinker.cpp | ||
---|---|---|
1398 | It would be incorrect to return size of original attribute here. unsigned NewAttrSize = Die.addValue(DIEAlloc, static_cast<dwarf::Attribute>(AttrSpec.Attr), dwarf::Form::DW_FORM_addrx, DIEInteger(AddrIndex))->sizeOf(OrigUnit.getFormParams()); return NewAttrSize; | |
llvm/test/tools/llvm-dwarfutil/ELF/X86/dwarf5-addresses.test | ||
44 | should not it be DW_FORM_addrx here and in following cases? |
We have to clear the AddrPool after one compile unit is processed. If we consider the following test case with the attached files:
when using the following command:
dwarfdump -a -v test.o| less
The output looks like:
.debug_info contents:
0x00000000: Compile Unit: length = 0x0000004d, format = DWARF32, version = 0x0005, unit_type = DW_UT_compile, abbr_offset = 0x0000, addr_size = 0x08 (next unit at 0x00000051)
0x0000000c: DW_TAG_compile_unit [1] *
DW_AT_producer [DW_FORM_string] ("by_hand") DW_AT_language [DW_FORM_data2] (DW_LANG_C_plus_plus) DW_AT_name [DW_FORM_string] ("GoodCU") DW_AT_low_pc [DW_FORM_addrx] (indexed (00000000) address = 0x0000000000001130) DW_AT_high_pc [DW_FORM_data8] (0x0000000000000010) DW_AT_addr_base [DW_FORM_sec_offset] (0x00000008)
0x0000002b: DW_TAG_subprogram [2] * (0x0000000c)
DW_AT_name [DW_FORM_string] ("foo") DW_AT_low_pc [DW_FORM_addrx] (indexed (00000000) address = 0x0000000000001130) DW_AT_high_pc [DW_FORM_data8] (0x0000000000000010) DW_AT_type [DW_FORM_ref4] (cu + 0x003e => {0x0000003e} "int")
0x0000003d: NULL
0x0000003e: DW_TAG_base_type [3] (0x0000000c)
DW_AT_name [DW_FORM_string] ("int")
0x00000043: DW_TAG_variable [4] (0x0000000c)
DW_AT_name [DW_FORM_string] ("var1") DW_AT_type [DW_FORM_ref4] (cu + 0x003e => {0x0000003e} "int") DW_AT_location [DW_FORM_exprloc] (DW_OP_addrx 0x0)
0x00000050: NULL
0x00000051: Compile Unit: length = 0x0000004d, format = DWARF32, version = 0x0005, unit_type = DW_UT_compile, abbr_offset = 0x0031, addr_size = 0x08 (next unit at 0x000000a2)
0x0000005d: DW_TAG_compile_unit [1] *
DW_AT_producer [DW_FORM_string] ("by_hand") DW_AT_language [DW_FORM_data2] (DW_LANG_C_plus_plus) DW_AT_name [DW_FORM_string] ("BadCU") DW_AT_low_pc [DW_FORM_addrx] (indexed (00000000) address = 0x0000000000000010) DW_AT_high_pc [DW_FORM_data8] (0x0000000000000010) DW_AT_addr_base [DW_FORM_sec_offset] (0x00000018)
0x0000007b: DW_TAG_subprogram [2] * (0x0000005d)
DW_AT_name [DW_FORM_string] ("foo1") DW_AT_low_pc [DW_FORM_addrx] (indexed (00000000) address = 0x0000000000000010) DW_AT_high_pc [DW_FORM_data8] (0x0000000000000010) DW_AT_type [DW_FORM_ref4] (cu + 0x003e => {0x0000008f} "int")
0x0000008e: NULL
0x0000008f: DW_TAG_base_type [3] (0x0000005d)
DW_AT_name [DW_FORM_string] ("int")
0x00000094: DW_TAG_variable [4] (0x0000005d)
DW_AT_name [DW_FORM_string] ("var1") DW_AT_type [DW_FORM_ref4] (cu + 0x003e => {0x0000008f} "int") DW_AT_location [DW_FORM_exprloc] (DW_OP_addrx 0x0)
0x000000a1: NULL
.debug_addr contents:
0x00000000: Address table header: length = 0x0000000c, format = DWARF32, version = 0x0005, addr_size = 0x08, seg_size = 0x00
Addrs: [
0x0000000000001130
]
0x00000010: Address table header: length = 0x0000000c, format = DWARF32, version = 0x0005, addr_size = 0x08, seg_size = 0x00
Addrs: [
0x0000000000000010
Notice how the debug_addr section only has one address for each compile unit
If we use the following testcase:
llvm-project/build_ninja/bin/dsymutil -y test.map -o test.o.dSYM
dwarfdump -a -v test.o.dSYM | less
and dsymutil does not clear the AddrPool
.debug_info contents:
0x00000000: Compile Unit: length = 0x0000004b, format = DWARF32, version = 0x0005, unit_type = DW_UT_compile, abbr_offset = 0x0000, addr_size = 0x08 (next unit at 0x0000004f)
0x0000000c: DW_TAG_compile_unit [1] *
DW_AT_producer [DW_FORM_strp] ( .debug_str[0x00000001] = "by_hand") DW_AT_language [DW_FORM_data2] (DW_LANG_C_plus_plus) DW_AT_name [DW_FORM_strp] ( .debug_str[0x00000009] = "GoodCU") DW_AT_low_pc [DW_FORM_addrx] (indexed (00000000) address = 0x0000000000012260) DW_AT_high_pc [DW_FORM_data8] (0x0000000000000010) DW_AT_addr_base [DW_FORM_sec_offset] (0x00000008)
0x00000024: DW_TAG_subprogram [2] (0x0000000c)
DW_AT_name [DW_FORM_strp] ( .debug_str[0x00000010] = "foo") DW_AT_low_pc [DW_FORM_addrx] (indexed (00000000) address = 0x0000000000012260) DW_AT_high_pc [DW_FORM_data8] (0x0000000000000010) DW_AT_type [DW_FORM_ref_addr] (0x0000000000000036 "int")
0x00000036: DW_TAG_base_type [3] (0x0000000c)
DW_AT_name [DW_FORM_strp] ( .debug_str[0x00000014] = "int")
0x0000003b: DW_TAG_variable [4] (0x0000000c)
DW_AT_name [DW_FORM_strp] ( .debug_str[0x00000018] = "var1") DW_AT_type [DW_FORM_ref_addr] (0x0000000000000036 "int") DW_AT_location [DW_FORM_exprloc] (DW_OP_addr 0x12260)
0x0000004e: NULL
0x0000004f: Compile Unit: length = 0x0000004b, format = DWARF32, version = 0x0005, unit_type = DW_UT_compile, abbr_offset = 0x0000, addr_size = 0x08 (next unit at 0x0000009e)
0x0000005b: DW_TAG_compile_unit [1] *
DW_AT_producer [DW_FORM_strp] ( .debug_str[0x00000001] = "by_hand") DW_AT_language [DW_FORM_data2] (DW_LANG_C_plus_plus) DW_AT_name [DW_FORM_strp] ( .debug_str[0x0000001d] = "BadCU") DW_AT_low_pc [DW_FORM_addrx] (indexed (00000001) address = 0x0000000000010010) DW_AT_high_pc [DW_FORM_data8] (0x0000000000000010) DW_AT_addr_base [DW_FORM_sec_offset] (0x00000018)
0x00000073: DW_TAG_subprogram [2] (0x0000005b)
DW_AT_name [DW_FORM_strp] ( .debug_str[0x00000023] = "foo1") DW_AT_low_pc [DW_FORM_addrx] (indexed (00000001) address = 0x0000000000010010) DW_AT_high_pc [DW_FORM_data8] (0x0000000000000010) DW_AT_type [DW_FORM_ref_addr] (0x0000000000000085 "int")
0x00000085: DW_TAG_base_type [3] (0x0000005b)
DW_AT_name [DW_FORM_strp] ( .debug_str[0x00000014] = "int")
0x0000008a: DW_TAG_variable [4] (0x0000005b)
DW_AT_name [DW_FORM_strp] ( .debug_str[0x00000018] = "var1") DW_AT_type [DW_FORM_ref_addr] (0x0000000000000085 "int") DW_AT_location [DW_FORM_exprloc] (DW_OP_addr 0x10010)
0x0000009d: NULL
.debug_addr contents:
0x00000000: Address table header: length = 0x0000000c, format = DWARF32, version = 0x0005, addr_size = 0x08, seg_size = 0x00
Addrs: [
0x0000000000012260
]
0x00000010: Address table header: length = 0x00000014, format = DWARF32, version = 0x0005, addr_size = 0x08, seg_size = 0x00
Addrs: [
0x0000000000012260
0x0000000000010010
]
Notice how the debug_addr section for the second compile unit, contains the address in the first compile unit as well.
But if we clear the AddrPool
llvm-project/build_ninja/bin/dsymutil -y test.map -o test.o.dSYM
dwarfdump -a -v test.o.dSYM | less
.debug_info contents:
0x00000000: Compile Unit: length = 0x0000004b, format = DWARF32, version = 0x0005, unit_type = DW_UT_compile, abbr_offset = 0x0000, addr_size = 0x08 (next unit at 0x0000004f)
0x0000000c: DW_TAG_compile_unit [1] *
DW_AT_producer [DW_FORM_strp] ( .debug_str[0x00000001] = "by_hand") DW_AT_language [DW_FORM_data2] (DW_LANG_C_plus_plus) DW_AT_name [DW_FORM_strp] ( .debug_str[0x00000009] = "GoodCU") DW_AT_low_pc [DW_FORM_addrx] (indexed (00000000) address = 0x0000000000012260) DW_AT_high_pc [DW_FORM_data8] (0x0000000000000010) DW_AT_addr_base [DW_FORM_sec_offset] (0x00000008)
0x00000024: DW_TAG_subprogram [2] (0x0000000c)
DW_AT_name [DW_FORM_strp] ( .debug_str[0x00000010] = "foo") DW_AT_low_pc [DW_FORM_addrx] (indexed (00000000) address = 0x0000000000012260) DW_AT_high_pc [DW_FORM_data8] (0x0000000000000010) DW_AT_type [DW_FORM_ref_addr] (0x0000000000000036 "int")
0x00000036: DW_TAG_base_type [3] (0x0000000c)
DW_AT_name [DW_FORM_strp] ( .debug_str[0x00000014] = "int")
0x0000003b: DW_TAG_variable [4] (0x0000000c)
DW_AT_name [DW_FORM_strp] ( .debug_str[0x00000018] = "var1") DW_AT_type [DW_FORM_ref_addr] (0x0000000000000036 "int") DW_AT_location [DW_FORM_exprloc] (DW_OP_addr 0x12260)
0x0000004e: NULL
0x0000004f: Compile Unit: length = 0x0000004b, format = DWARF32, version = 0x0005, unit_type = DW_UT_compile, abbr_offset = 0x0000, addr_size = 0x08 (next unit at 0x0000009e)
0x0000005b: DW_TAG_compile_unit [1] *
DW_AT_producer [DW_FORM_strp] ( .debug_str[0x00000001] = "by_hand") DW_AT_language [DW_FORM_data2] (DW_LANG_C_plus_plus) DW_AT_name [DW_FORM_strp] ( .debug_str[0x0000001d] = "BadCU") DW_AT_low_pc [DW_FORM_addrx] (indexed (00000000) address = 0x0000000000010010) DW_AT_high_pc [DW_FORM_data8] (0x0000000000000010) DW_AT_addr_base [DW_FORM_sec_offset] (0x00000018)
0x00000073: DW_TAG_subprogram [2] (0x0000005b)
DW_AT_name [DW_FORM_strp] ( .debug_str[0x00000023] = "foo1") DW_AT_low_pc [DW_FORM_addrx] (indexed (00000000) address = 0x0000000000010010) DW_AT_high_pc [DW_FORM_data8] (0x0000000000000010) DW_AT_type [DW_FORM_ref_addr] (0x0000000000000085 "int")
0x00000085: DW_TAG_base_type [3] (0x0000005b)
DW_AT_name [DW_FORM_strp] ( .debug_str[0x00000014] = "int")
0x0000008a: DW_TAG_variable [4] (0x0000005b)
DW_AT_name [DW_FORM_strp] ( .debug_str[0x00000018] = "var1") DW_AT_type [DW_FORM_ref_addr] (0x0000000000000085 "int") DW_AT_location [DW_FORM_exprloc] (DW_OP_addr 0x10010)
0x0000009d: NULL
.debug_addr contents:
0x00000000: Address table header: length = 0x0000000c, format = DWARF32, version = 0x0005, addr_size = 0x08, seg_size = 0x00
Addrs: [
0x0000000000012260
]
0x00000010: Address table header: length = 0x0000000c, format = DWARF32, version = 0x0005, addr_size = 0x08, seg_size = 0x00
Addrs: [
0x0000000000010010
]
Notice how the debug_addr section matches the one in the object file, and doesn't contain a copy of the address from the first compile unit
Right. We need to clean(), sorry. I missed the fact that DIECloner enumerates several compile units.
Right. We need to clean(), sorry. I missed the fact that DIECloner enumerates several compile units.
Does the rest of the patch look okay?
It looks like the comment for DWARFLinker.cpp::1398 is not addressed:
It would be incorrect to return size of original attribute here.
Instead we need to calculate size of DW_FORM_addrx:
unsigned NewAttrSize = Die.addValue(DIEAlloc, static_cast<dwarf::Attribute>(AttrSpec.Attr), dwarf::Form::DW_FORM_addrx, DIEInteger(AddrIndex))->sizeOf(OrigUnit.getFormParams()); return NewAttrSize;
and the comment for llvm/test/tools/llvm-dwarfutil/ELF/X86/dwarf5-addresses.test::44
should not it be DW_FORM_addrx here and in following cases?
LGTM. Thank you for spending time on it! Please, also fix a problem with LLVM.tools/dsymutil/ARM::dwarf5-addr_base.test shown while pre-integration testing.
@avl I think those pre-integration errors are happening because phabricator doesn't have the binaries
phabricator has these files: Inputs/DWARF5-addr_base/
1.o
2.o
3.o
but they are of zero size. it looks like patch was created incorrectly or these files are broken.
add a Doxygen comment, please.