This is an archive of the discontinued LLVM Phabricator instance.

Emit a .debug_addr section with dsymutil
ClosedPublic

Authored by rastogishubham on Jul 6 2023, 11:18 AM.

Details

Summary

DWARF5 has support for DW_FORM_addrx, which can be useful for space savings, but it needs a .debug_addr section to be used. dsymutil does not have the ability to emit a debug_addr section currently. This patch adds support for that

Diff Detail

Event Timeline

rastogishubham created this revision.Jul 6 2023, 11:18 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 6 2023, 11:18 AM
rastogishubham requested review of this revision.Jul 6 2023, 11:18 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 6 2023, 11:18 AM
avl added a comment.Jul 6 2023, 2:00 PM

Could you compare size of .debug_info, .debug_addr sections for clang binary with and without this patch, please?

rastogishubham added a comment.EditedJul 6 2023, 2:48 PM

Could you compare size of .debug_info, .debug_addr sections for clang binary with and without this patch, please?

Of course!

The size of a clang.dSYM debug_info section built with -gdwarf-5 without this patch is 930,375,162 Bytes or 930 MB

The size of a clang.dSYM debug_info + debug_addr section built with -gdwarf-5 with this patch is 914,086,636 Bytes or 914 MB

I would also like to note, that at the moment I am not optimizing for the addrx values, i.e. no work is done to sort the addrx values by frequency to optimize the size of the addrx values in the debug info section based on frequency, and we still get a modest 1.8% decrease in size

avl added a comment.Jul 7 2023, 5:37 AM

Could you compare size of .debug_info, .debug_addr sections for clang binary with and without this patch, please?

Of course!

The size of a clang.dSYM debug_info section built with -gdwarf-5 without this patch is 930,375,162 Bytes or 930 MB

The size of a clang.dSYM debug_info + debug_addr section built with -gdwarf-5 with this patch is 914,086,636 Bytes or 914 MB

I would also like to note, that at the moment I am not optimizing for the addrx values, i.e. no work is done to sort the addrx values by frequency to optimize the size of the addrx values in the debug info section based on frequency, and we still get a modest 1.8% decrease in size

Thank you! yep, 1.8% is a good result.

How about changing DW_RLE_start_length/DW_LLE_start_length into DW_RLE_startx_length/DW_LLE_startx_length for rnglists/loclists? Or this is for another patch?

llvm/include/llvm/DWARFLinker/DWARFLinker.h
100

add a Doxygen comment, please.

642

llvm coding standard suggests to use SmallVector instead std::vector. https://llvm.org/docs/CodingStandards.html#c-standard-library

llvm/lib/DWARFLinker/DWARFLinker.cpp
1391

assuming DebugAddrPool::getAddrIndex exist:

DIEInteger(AddrAttribute->getForm() == dwarf::DW_FORM_addr
               ? *Addr 
               :  DebugAddrPool::getAddrIndex(*Addr));
1392

probably it would make sense to hide this code under method of DebugAddrPool?

uint64_t DebugAddrPool::getAddrIndex ( uint64_t Addr ) {
   DenseMap<uint64_t, uint64_t>::iterator It = AddrPool.AddrIndexMap.find(Addr);

    if (It == AddrPool.AddrIndexMap.end()) {
      AddrPool.Addrs.push_back(*Addr);
      It = AddrPool.AddrIndexMap.insert(std::make_pair(Addr, AddrPool.Addrs.size()).first;
    }

    return It.second;
}
2570

probably, the dwarf::DW_AT_addr_base should be updated here with the proper offset to .debug_addr section.

2716

It looks like AddrIndexMap&Addrs may be local for compile unit. i.e., not necessary to make them global.

llvm/lib/DWARFLinker/DWARFStreamer.cpp
278

it seems AddrSectionSize should be updated here.

@avl

How about changing DW_RLE_start_length/DW_LLE_start_length into DW_RLE_startx_length/DW_LLE_startx_length for rnglists/loclists? Or this is for another patch?

This patch just adds the functionality to emit the debug_addr section, I will work on future patches to use the indirections in the other sections

llvm::ArrayRef<uint64_t> is more common than const std::vector<uint64_t> & in the codebase.

rastogishubham marked 6 inline comments as done.

Addressed review comments:

  1. Added a Doxygen comment for void emitAddrs
  1. Replaced std::vector usage with SmallVector
  1. Created function DebugAddrPool:: getAddrIndex to keep track of Addresses in a Compile Unit
  1. Removed global AddrIndexMap and Addrs
llvm/include/llvm/DWARFLinker/DWARFLinker.h
642

I was just being consistent because DIECloner::DIECloner() takes a std::vector<std::unique_ptr<CompileUnit>> &CompileUnits, maybe that should be changed to?

llvm/lib/DWARFLinker/DWARFLinker.cpp
2570

Correct me if I am wrong, but there doesn't seem to be a way to modify the value of an attribute in the die after it has already been set right?

avl added inline comments.Jul 10 2023, 4:14 AM
llvm/include/llvm/DWARFLinker/DWARFLinker.h
142

nit: move this definition close to emitDwarfDebugAddrFooter so that methods emitting DebugAddr were defined together, please.

642

yes, I think it(replacing std::vector with SmallVector for CompileUnits) would be good. But with separate cleanup patch.

831

add Doxigen comment please.

llvm/lib/DWARFLinker/DWARFLinker.cpp
1396

probably, to not check for AttrSpec.Form == dwarf::DW_FORM_addr twice we can use early return pattern?

if (AttrSpec.Form == dwarf::DW_FORM_addr) {
  Die.addValue(DIEAlloc, AttrSpec.Attr, AttrSpec.Form, DIEInteger(*Addr));
  return Unit.getOrigUnit().getAddressByteSize();
}

Die.addValue(DIEAlloc, AttrSpec.Attr, AttrSpec.Form, DIEInteger(AddrPool.getAddrIndex(*Addr)));
return AttrSize;

Another thing, as we generate a new .debug_addr section, its size may be changed and original strictly sized forms (f.e. DW_FORM_addrx1) may become insufficient. On the other side, as we usually only remove addresses everything should be fine. It is probably worth adding an assertion here checking that AddrPool.getAddrIndex(*Addr) fits into the original form. f.e.

assert ((AttrSpec.Form == dwarf::DW_FORM_addrx1 && AddrPool.getAddrIndex(*Addr) <= UINT8_MAX)
            || other cases
1659

as it matches with default we can just remove this case.

2570

The cloned value of DW_AT_addr_base may be incorrect as we generate new .debug_addr, so it should be updated. It may be changed inside DWARFLinker::emitDebugAddrSection() the same way as DW_AT_stmt_list is updated inside DIECloner::generateLineTableForUnit():

static void patchStmtList(DIE &Die, DIEInteger Offset) {
  for (auto &V : Die.values())
    if (V.getAttribute() == dwarf::DW_AT_stmt_list) {
      V = DIEValue(V.getAttribute(), V.getForm(), Offset);
      return;
    }

  llvm_unreachable("Didn't find DW_AT_stmt_list in cloned DIE!");
}
2571

No need to do clear() as AddrPool is local for DIECloner.

avl added inline comments.Jul 10 2023, 4:14 AM
llvm/include/llvm/DWARFLinker/DWARFLinker.h
101

nit: move this definition close to emitDwarfDebugAddrFooter so that methods emitting DebugAddr were defined together, please.

probably name them using the same pattern:

emitDwarfDebugAddrsHeader
emitDwarfDebugAddrs
emitDwarfDebugAddrsFooter

avl added inline comments.Jul 10 2023, 5:44 AM
llvm/lib/DWARFLinker/DWARFLinker.cpp
1996

one more nit: probably it would be better to make DWARFLinker::emitDebugAddrSection to be method of DIECloner as everything seems to be DIECloner local.

also, moving check for DWARF version inside emitDebugAddrSection will make DIECloner::cloneAllCompileUnits look simpler:

if (DwarfVersion < 5)
  return;
rastogishubham marked 8 inline comments as done.

Patched the addr_base value, moved functions for emitting Addr section header, footer, and the section to be together, and used the same naming scheme as the other sections, and resolved nitpicks

Added assert to make sure the index value fits within the give addrx form

avl added a comment.Jul 11 2023, 4:04 PM

yep, thank you for the update.

Would you mind adding test for DW_AT_addr_base, please ? i.e. test having f.e. 3 compile units and testing that each DW_AT_addr_base attribute points to the proper part of .debug_addr section. Also, it looks like all(or probably most of) existing dwarf5 tests need adding checks for DW_AT_addr_base.

llvm/include/llvm/DWARFLinker/DWARFLinker.h
156
643

This field is not used. we can safely remove it.

647

We do not need to use this method it can be removed.

710

No need to pass AddrSecSize as a parameter. We already have it inside.

llvm/include/llvm/DWARFLinker/DWARFStreamer.h
116
llvm/lib/DWARFLinker/DWARFLinker.cpp
1396

it looks like this constant is not correct:

UINT16_MAX + UINT8_MAX

The proper one would be:

(UINT32_MAX >> 8)

2017

Original version of patch had a check for "-u" option. I think we still need it here. Also no need to remember and store AddrSecSize:

if (LLVM_UNLIKELY(Options.Update))
  return;

if (DwarfVersion < 5)
  return;

MCSymbol *EndLabel = Emitter->emitDwarfDebugAddrsHeader(Unit);

patchAddrBase(*Unit.getOutputUnitDIE(), DIEInteger(Emitter->getDebugAddrSectionSize()));

Emitter->emitDwarfDebugAddrs(Addrs, Unit.getOrigUnit().getAddressByteSize());
Emitter->emitDwarfDebugAddrsFooter(Unit, EndLabel);
2524

AddrSectionSize could be removed.

2571

No need to clear AddrPool as it is local to DIECloner.

llvm/lib/DWARFLinker/DWARFStreamer.cpp
667

No need to check the version again. It is already checked in DIECloner::emitDebugAddrSection().

696
710

No need to check the version again. It is already checked in DIECloner::emitDebugAddrSection().

rastogishubham marked 13 inline comments as done.
  1. Changed assert from UINT16_MAX + UINT8_MAX to UINT32_MAX >> 8
  1. Fixed some doxygen comments
  1. Removed Index and clear from DebugAddrPool
  1. Removed AddrSecSize parameter
  1. Added check for -u option in emitDebugAddrSection
  1. Removed AddrPool.clear() from cloneAllCompileUnits
  1. Removed dwarf version checks where unnecessary
  1. Added a test for DW_AT_addr_base and updated dwarf5 tests to check for that attribute as well wherever appropriate
avl added a comment.EditedJul 13 2023, 4:13 AM

Thanks for changes,

there are failures during pre-integration testing. They need to be fixed before integration of this patch. I think the problem could probably be fixed by adding check for empty addresses into the DIECloner::emitDebugAddrSection():

if (Addrs.empty())
  return;
llvm/include/llvm/DWARFLinker/DWARFLinker.h
710

No need to pass Addrs as AddrPool is a member of DIECloner.

llvm/test/tools/dsymutil/ARM/dwarf5-addr_base.test
1 ↗(On Diff #539654)
49 ↗(On Diff #539654)

add a check for "-u" case, please.

JDevlieghere added inline comments.Jul 13 2023, 11:51 AM
llvm/lib/DWARFLinker/DWARFLinker.cpp
1387–1396

Let's extract AddrPool.getAddrIndex(*Addr) and assign it to a variable that you can reuse in the assert. Also, is the assert actually checking internal consistency or catching invalid DWARF? If it's the latter we should emit an error/warning instead.

2001

Similar question as for the assert: is this checking internal consistency (i.e. that we have created a DW_AT_addr_base in the cloned DIE) or can this trip for invalid input?

avl added inline comments.Jul 14 2023, 3:24 AM
llvm/lib/DWARFLinker/DWARFLinker.cpp
1387–1396

The purpose for the assert is to check internal consistency - newly generated addresses set should fit within the used form. We use original form for the newly generated addresses set as we assume that we do not add addresses and then original form should be enough. The assert proves this assumption.

2001

This is also should be check for internal consistency: We should not miss DW_AT_addr_base when we generate .debug_addr. The patchAddrBase is called from DIECloner::emitDebugAddrSection, thus we check for DW_AT_addr_base exactly when we are going to generate .debug_addr. The only problem with current implementation is this check will fail when original DWARF does not have .debug_addr but we call emitDebugAddrSection for generation of empty .debug_addr. The check for empty addresses set needs to be added into emitDebugAddrSection.

For the case of invalid DWARF(DW_AT_addr_base missed in input file) - we should not generate any addresses as we could not correctly read original addresses. DWARFLinker should report warning here and do not generate any address:

DIECloner::cloneAddressAttribute

std::optional<uint64_t> Addr = AddrAttribute->getAsAddress();
if (!Addr) {
  Linker.reportWarning("Cann't read address attribute value.", ObjFile);
  Addr = 0;
}

Though, this place looks a bit incorrect. Instead of using zero Addr, the attribute should probably be skipped.

rastogishubham marked 4 inline comments as done.

Addressed review comments

llvm/lib/DWARFLinker/DWARFLinker.cpp
1387–1396

It is checking internal consistency.

2001

This one can trip for invalid input, if the input dwarf doesn't have an addr base, then the cloned die would not either.

rastogishubham added inline comments.Jul 14 2023, 6:28 AM
llvm/test/tools/dsymutil/ARM/dwarf5-addr_base.test
49 ↗(On Diff #539654)

I am not sure what should happen in a -u case

Should the DW_AT_addr_base be patched in a -u case? It doesn't get patched now

rastogishubham added inline comments.Jul 14 2023, 6:38 AM
llvm/lib/DWARFLinker/DWARFLinker.cpp
2001

DW_AT_addr_base is of form DW_FORM_sec_offset, so it will call DWARFLinker::DIECloner::cloneScalarAttribute(, are you suggesting that I add a warning there if there is no addr_base found?

avl added inline comments.Jul 14 2023, 9:32 AM
llvm/include/llvm/DWARFLinker/DWARFLinker.h
707–708
llvm/lib/DWARFLinker/DWARFLinker.cpp
2001

No, I do not suggest to add warning into DWARFLinker::DIECloner::cloneScalarAttribute which would report if DW_AT_addr_base is invalid.

My suggestion is to - not create address attributes if input DW_AT_addr_base is invalid. If DW_AT_addr_base is invalid then it is not possible to read input addresses correctly. Instead of using zero or dummy value for the address attribute, it would be better to skip it .For that purpose we have AddrAttribute->getAsAddress() to be optional. If the DW_AT_addr_base is invalid it should return std::nullopt. So we can skip address attribute and report a warning inside DIECloner::cloneAddressAttribute().

Another thing is that it would be better to not report_fatal_error here. This is the place where an output DWARF is emitted. It looks too late to validate and report errors for input DWARF. At this point we need to have llvm_unreachable() indicating that generated DWARF is inconsistent.

In short:

  1. I think we need to change(with separate patch) the code in DIECloner::cloneAddressAttribute into this:
std::optional<uint64_t> Addr = AddrAttribute->getAsAddress();
if (!Addr) {
  Linker.reportWarning("Cann't read address attribute value.", ObjFile);
  return 0;
}
  • 2. use llvm_unreachable() here to indicate that required DW_AT_addr_base is not generated.
llvm/test/tools/dsymutil/ARM/dwarf5-addr_base.test
49 ↗(On Diff #539654)

For -u case we need to check that generated DWARF is exactly the same as input DWARF. That DW_AT_addr_base is presented with correct value, that indexed addresses are used, that .debug_addr contains proper values.

avl added inline comments.Jul 14 2023, 9:54 AM
llvm/lib/DWARFLinker/DWARFLinker.cpp
2001

My suggestion is to - not create address attributes if input DW_AT_addr_base is invalid.

one thing to clarify - if all address attributes are skipped because DW_AT_addr_base is missed then we do not need to generate .debug_addr and then it is not a problem that DW_AT_addr_base is missed.

aprantl added inline comments.Jul 18 2023, 3:56 PM
llvm/lib/DWARFLinker/DWARFLinker.cpp
1390

This is the first time I'm reading through the patch, so I might be missing something: Since the address pool is growing as we are processing multiple object files, shouldn't we dynamically decide on the FORM based on the value of AddrIndex? Or is this happening elsewhere already?

llvm/lib/DWARFLinker/DWARFStreamer.cpp
674

Nit: Emit

679

Emit version.

rastogishubham marked 7 inline comments as done.

Fixed some comments

instead of emitting the addrx form present in the compile unit, rewrite the addrx form to use addrx1, addrx2, addrx3, addrx4, or addrx based on number of addresses

avl added inline comments.Jul 19 2023, 11:42 AM
llvm/include/llvm/DWARFLinker/DWARFLinker.h
653

Doing things that way we create many unnecessary abbreviations. f.e. for index 255 there would be used this abbreviation:

DW_TAG_subprogram 
  DW_AT_low_pc [DW_FORM_addrx1]

For index 256 there would be created new abbreviation:

DW_TAG_subprogram 
  DW_AT_low_pc [DW_FORM_addrx2]

I cann't tell whether that solution takes less space. It is necessary to measure impact of many abbreviations to the DWARF size. From one side DIE records having DW_FORM_addrx1 would take less bytes. From another side many abbreviation codes will be increased and then DIEs would take more space.

The solution from the previous version of this patch looks simpler and is OK. It is safe to just copy source form. DWARFLinker does not add new addresses for compile unit, so original form should be enough to keep address index. Just copy original form and check by assertion that everything is correct.

avl added inline comments.Jul 19 2023, 11:49 AM
llvm/lib/DWARFLinker/DWARFLinker.cpp
1390

the generated .debug_addr is local for compile unit. The number of addresses usually should not be increased for single compile unit. So we can use original form which was enough to keep source addresses.

avl added inline comments.Jul 19 2023, 12:09 PM
llvm/include/llvm/DWARFLinker/DWARFLinker.h
653

Or, as another alternative, we can always replace sized forms with DW_FORM_addrx. DW_FORM_addrx would always be enough. In many cases it would be shorter then DW_FORM_addrx3 or DW_FORM_addrx4. This is simple solution which does not increase number of abbreviations.

aprantl added inline comments.Jul 19 2023, 12:34 PM
llvm/lib/DWARFLinker/DWARFLinker.cpp
1390

Ah I see. So, sorting the addresses by frequency and merging them into a global address pool would be a future optimization.

avl added inline comments.Jul 19 2023, 1:42 PM
llvm/lib/DWARFLinker/DWARFLinker.cpp
1390

yep. I assume that the idea of that optimization is to save space. So that most used addresses has smaller index and it would be possible to use smaller fixed sized form.

It would be interesting to measure how that optimization affects abbreviations. It should increase numbers of abbreviations and size of some DIEs(since every DIE has abbreviation number in the beggining)

Looks like @avl is correct:
Using just addrx, the size of the debug_info + the debug_addr + debug_abbrev sections is 903,287,425 Bytes or ~903 MB
Using addrx1, addrx2, addrx3, addrx4 the size of the debug_info + debug_addr + debug_abbrev sections is 905,953,864 MB or ~906 MB

I will make it so that dsymutil only uses addrx

Always emit the form DW_FORM_addrx in the debug_info section when using dsymutil

Added new line at the end of dwarf5-addr_base.test

avl added a comment.Jul 20 2023, 2:50 PM

Thank you for the update! I think it is mostly done. Please, check two small things in inline comments.

llvm/lib/DWARFLinker/DWARFLinker.cpp
1390

It would be incorrect to return size of original attribute here.
Instead we need to calculate size of DW_FORM_addrx:

unsigned NewAttrSize = Die.addValue(DIEAlloc, static_cast<dwarf::Attribute>(AttrSpec.Attr),
               dwarf::Form::DW_FORM_addrx, DIEInteger(AddrIndex))->sizeOf(OrigUnit.getFormParams());

return NewAttrSize;
llvm/test/tools/llvm-dwarfutil/ELF/X86/dwarf5-addresses.test
44

should not it be DW_FORM_addrx here and in following cases?

@avl

We have to clear the AddrPool after one compile unit is processed. If we consider the following test case with the attached files:

when using the following command:
dwarfdump -a -v test.o| less

The output looks like:

.debug_info contents:
0x00000000: Compile Unit: length = 0x0000004d, format = DWARF32, version = 0x0005, unit_type = DW_UT_compile, abbr_offset = 0x0000, addr_size = 0x08 (next unit at 0x00000051)

0x0000000c: DW_TAG_compile_unit [1] *

DW_AT_producer [DW_FORM_string]   ("by_hand")
DW_AT_language [DW_FORM_data2]    (DW_LANG_C_plus_plus)
DW_AT_name [DW_FORM_string]       ("GoodCU")
DW_AT_low_pc [DW_FORM_addrx]      (indexed (00000000) address = 0x0000000000001130)
DW_AT_high_pc [DW_FORM_data8]     (0x0000000000000010)
DW_AT_addr_base [DW_FORM_sec_offset]      (0x00000008)

0x0000002b: DW_TAG_subprogram [2] * (0x0000000c)

DW_AT_name [DW_FORM_string]     ("foo")
DW_AT_low_pc [DW_FORM_addrx]    (indexed (00000000) address = 0x0000000000001130)
DW_AT_high_pc [DW_FORM_data8]   (0x0000000000000010)
DW_AT_type [DW_FORM_ref4]       (cu + 0x003e => {0x0000003e} "int")

0x0000003d: NULL

0x0000003e: DW_TAG_base_type [3] (0x0000000c)

DW_AT_name [DW_FORM_string]     ("int")

0x00000043: DW_TAG_variable [4] (0x0000000c)

DW_AT_name [DW_FORM_string]     ("var1")
DW_AT_type [DW_FORM_ref4]       (cu + 0x003e => {0x0000003e} "int")
DW_AT_location [DW_FORM_exprloc]        (DW_OP_addrx 0x0)

0x00000050: NULL
0x00000051: Compile Unit: length = 0x0000004d, format = DWARF32, version = 0x0005, unit_type = DW_UT_compile, abbr_offset = 0x0031, addr_size = 0x08 (next unit at 0x000000a2)

0x0000005d: DW_TAG_compile_unit [1] *

DW_AT_producer [DW_FORM_string]   ("by_hand")
DW_AT_language [DW_FORM_data2]    (DW_LANG_C_plus_plus)
DW_AT_name [DW_FORM_string]       ("BadCU")
DW_AT_low_pc [DW_FORM_addrx]      (indexed (00000000) address = 0x0000000000000010)
DW_AT_high_pc [DW_FORM_data8]     (0x0000000000000010)
DW_AT_addr_base [DW_FORM_sec_offset]      (0x00000018)

0x0000007b: DW_TAG_subprogram [2] * (0x0000005d)

DW_AT_name [DW_FORM_string]     ("foo1")
DW_AT_low_pc [DW_FORM_addrx]    (indexed (00000000) address = 0x0000000000000010)
DW_AT_high_pc [DW_FORM_data8]   (0x0000000000000010)
DW_AT_type [DW_FORM_ref4]       (cu + 0x003e => {0x0000008f} "int")

0x0000008e: NULL

0x0000008f: DW_TAG_base_type [3] (0x0000005d)

DW_AT_name [DW_FORM_string]     ("int")

0x00000094: DW_TAG_variable [4] (0x0000005d)

DW_AT_name [DW_FORM_string]     ("var1")
DW_AT_type [DW_FORM_ref4]       (cu + 0x003e => {0x0000008f} "int")
DW_AT_location [DW_FORM_exprloc]        (DW_OP_addrx 0x0)

0x000000a1: NULL

.debug_addr contents:
0x00000000: Address table header: length = 0x0000000c, format = DWARF32, version = 0x0005, addr_size = 0x08, seg_size = 0x00
Addrs: [
0x0000000000001130
]
0x00000010: Address table header: length = 0x0000000c, format = DWARF32, version = 0x0005, addr_size = 0x08, seg_size = 0x00
Addrs: [
0x0000000000000010

Notice how the debug_addr section only has one address for each compile unit

If we use the following testcase:
llvm-project/build_ninja/bin/dsymutil -y test.map -o test.o.dSYM
dwarfdump -a -v test.o.dSYM | less
and dsymutil does not clear the AddrPool

.debug_info contents:
0x00000000: Compile Unit: length = 0x0000004b, format = DWARF32, version = 0x0005, unit_type = DW_UT_compile, abbr_offset = 0x0000, addr_size = 0x08 (next unit at 0x0000004f)

0x0000000c: DW_TAG_compile_unit [1] *

DW_AT_producer [DW_FORM_strp]     ( .debug_str[0x00000001] = "by_hand")
DW_AT_language [DW_FORM_data2]    (DW_LANG_C_plus_plus)
DW_AT_name [DW_FORM_strp] ( .debug_str[0x00000009] = "GoodCU")
DW_AT_low_pc [DW_FORM_addrx]      (indexed (00000000) address = 0x0000000000012260)
DW_AT_high_pc [DW_FORM_data8]     (0x0000000000000010)
DW_AT_addr_base [DW_FORM_sec_offset]      (0x00000008)

0x00000024: DW_TAG_subprogram [2] (0x0000000c)

DW_AT_name [DW_FORM_strp]       ( .debug_str[0x00000010] = "foo")
DW_AT_low_pc [DW_FORM_addrx]    (indexed (00000000) address = 0x0000000000012260)
DW_AT_high_pc [DW_FORM_data8]   (0x0000000000000010)
DW_AT_type [DW_FORM_ref_addr]   (0x0000000000000036 "int")

0x00000036: DW_TAG_base_type [3] (0x0000000c)

DW_AT_name [DW_FORM_strp]       ( .debug_str[0x00000014] = "int")

0x0000003b: DW_TAG_variable [4] (0x0000000c)

DW_AT_name [DW_FORM_strp]       ( .debug_str[0x00000018] = "var1")
DW_AT_type [DW_FORM_ref_addr]   (0x0000000000000036 "int")
DW_AT_location [DW_FORM_exprloc]        (DW_OP_addr 0x12260)

0x0000004e: NULL
0x0000004f: Compile Unit: length = 0x0000004b, format = DWARF32, version = 0x0005, unit_type = DW_UT_compile, abbr_offset = 0x0000, addr_size = 0x08 (next unit at 0x0000009e)

0x0000005b: DW_TAG_compile_unit [1] *

DW_AT_producer [DW_FORM_strp]     ( .debug_str[0x00000001] = "by_hand")
DW_AT_language [DW_FORM_data2]    (DW_LANG_C_plus_plus)
DW_AT_name [DW_FORM_strp] ( .debug_str[0x0000001d] = "BadCU")
DW_AT_low_pc [DW_FORM_addrx]      (indexed (00000001) address = 0x0000000000010010)
DW_AT_high_pc [DW_FORM_data8]     (0x0000000000000010)
DW_AT_addr_base [DW_FORM_sec_offset]      (0x00000018)

0x00000073: DW_TAG_subprogram [2] (0x0000005b)

DW_AT_name [DW_FORM_strp]       ( .debug_str[0x00000023] = "foo1")
DW_AT_low_pc [DW_FORM_addrx]    (indexed (00000001) address = 0x0000000000010010)
DW_AT_high_pc [DW_FORM_data8]   (0x0000000000000010)
DW_AT_type [DW_FORM_ref_addr]   (0x0000000000000085 "int")

0x00000085: DW_TAG_base_type [3] (0x0000005b)

DW_AT_name [DW_FORM_strp]       ( .debug_str[0x00000014] = "int")

0x0000008a: DW_TAG_variable [4] (0x0000005b)

DW_AT_name [DW_FORM_strp]       ( .debug_str[0x00000018] = "var1")
DW_AT_type [DW_FORM_ref_addr]   (0x0000000000000085 "int")
DW_AT_location [DW_FORM_exprloc]        (DW_OP_addr 0x10010)

0x0000009d: NULL

.debug_addr contents:
0x00000000: Address table header: length = 0x0000000c, format = DWARF32, version = 0x0005, addr_size = 0x08, seg_size = 0x00
Addrs: [
0x0000000000012260
]
0x00000010: Address table header: length = 0x00000014, format = DWARF32, version = 0x0005, addr_size = 0x08, seg_size = 0x00
Addrs: [
0x0000000000012260
0x0000000000010010
]

Notice how the debug_addr section for the second compile unit, contains the address in the first compile unit as well.

But if we clear the AddrPool

llvm-project/build_ninja/bin/dsymutil -y test.map -o test.o.dSYM
dwarfdump -a -v test.o.dSYM | less

.debug_info contents:
0x00000000: Compile Unit: length = 0x0000004b, format = DWARF32, version = 0x0005, unit_type = DW_UT_compile, abbr_offset = 0x0000, addr_size = 0x08 (next unit at 0x0000004f)

0x0000000c: DW_TAG_compile_unit [1] *

DW_AT_producer [DW_FORM_strp]     ( .debug_str[0x00000001] = "by_hand")
DW_AT_language [DW_FORM_data2]    (DW_LANG_C_plus_plus)
DW_AT_name [DW_FORM_strp] ( .debug_str[0x00000009] = "GoodCU")
DW_AT_low_pc [DW_FORM_addrx]      (indexed (00000000) address = 0x0000000000012260)
DW_AT_high_pc [DW_FORM_data8]     (0x0000000000000010)
DW_AT_addr_base [DW_FORM_sec_offset]      (0x00000008)

0x00000024: DW_TAG_subprogram [2] (0x0000000c)

DW_AT_name [DW_FORM_strp]       ( .debug_str[0x00000010] = "foo")
DW_AT_low_pc [DW_FORM_addrx]    (indexed (00000000) address = 0x0000000000012260)
DW_AT_high_pc [DW_FORM_data8]   (0x0000000000000010)
DW_AT_type [DW_FORM_ref_addr]   (0x0000000000000036 "int")

0x00000036: DW_TAG_base_type [3] (0x0000000c)

DW_AT_name [DW_FORM_strp]       ( .debug_str[0x00000014] = "int")

0x0000003b: DW_TAG_variable [4] (0x0000000c)

DW_AT_name [DW_FORM_strp]       ( .debug_str[0x00000018] = "var1")
DW_AT_type [DW_FORM_ref_addr]   (0x0000000000000036 "int")
DW_AT_location [DW_FORM_exprloc]        (DW_OP_addr 0x12260)

0x0000004e: NULL
0x0000004f: Compile Unit: length = 0x0000004b, format = DWARF32, version = 0x0005, unit_type = DW_UT_compile, abbr_offset = 0x0000, addr_size = 0x08 (next unit at 0x0000009e)

0x0000005b: DW_TAG_compile_unit [1] *

DW_AT_producer [DW_FORM_strp]     ( .debug_str[0x00000001] = "by_hand")
DW_AT_language [DW_FORM_data2]    (DW_LANG_C_plus_plus)
DW_AT_name [DW_FORM_strp] ( .debug_str[0x0000001d] = "BadCU")
DW_AT_low_pc [DW_FORM_addrx]      (indexed (00000000) address = 0x0000000000010010)
DW_AT_high_pc [DW_FORM_data8]     (0x0000000000000010)
DW_AT_addr_base [DW_FORM_sec_offset]      (0x00000018)

0x00000073: DW_TAG_subprogram [2] (0x0000005b)

DW_AT_name [DW_FORM_strp]       ( .debug_str[0x00000023] = "foo1")
DW_AT_low_pc [DW_FORM_addrx]    (indexed (00000000) address = 0x0000000000010010)
DW_AT_high_pc [DW_FORM_data8]   (0x0000000000000010)
DW_AT_type [DW_FORM_ref_addr]   (0x0000000000000085 "int")

0x00000085: DW_TAG_base_type [3] (0x0000005b)

DW_AT_name [DW_FORM_strp]       ( .debug_str[0x00000014] = "int")

0x0000008a: DW_TAG_variable [4] (0x0000005b)

DW_AT_name [DW_FORM_strp]       ( .debug_str[0x00000018] = "var1")
DW_AT_type [DW_FORM_ref_addr]   (0x0000000000000085 "int")
DW_AT_location [DW_FORM_exprloc]        (DW_OP_addr 0x10010)

0x0000009d: NULL

.debug_addr contents:
0x00000000: Address table header: length = 0x0000000c, format = DWARF32, version = 0x0005, addr_size = 0x08, seg_size = 0x00
Addrs: [
0x0000000000012260
]
0x00000010: Address table header: length = 0x0000000c, format = DWARF32, version = 0x0005, addr_size = 0x08, seg_size = 0x00
Addrs: [
0x0000000000010010
]

Notice how the debug_addr section matches the one in the object file, and doesn't contain a copy of the address from the first compile unit

Clear AddrPool after processing a compile unit

avl added a comment.Jul 20 2023, 3:20 PM

@avl

We have to clear the AddrPool after one compile unit is processed. If we consider the following test case with the attached files:

when using the following command:
dwarfdump -a -v test.o| less

The output looks like:

.debug_info contents:
0x00000000: Compile Unit: length = 0x0000004d, format = DWARF32, version = 0x0005, unit_type = DW_UT_compile, abbr_offset = 0x0000, addr_size = 0x08 (next unit at 0x00000051)

0x0000000c: DW_TAG_compile_unit [1] *

DW_AT_producer [DW_FORM_string]   ("by_hand")
DW_AT_language [DW_FORM_data2]    (DW_LANG_C_plus_plus)
DW_AT_name [DW_FORM_string]       ("GoodCU")
DW_AT_low_pc [DW_FORM_addrx]      (indexed (00000000) address = 0x0000000000001130)
DW_AT_high_pc [DW_FORM_data8]     (0x0000000000000010)
DW_AT_addr_base [DW_FORM_sec_offset]      (0x00000008)

0x0000002b: DW_TAG_subprogram [2] * (0x0000000c)

DW_AT_name [DW_FORM_string]     ("foo")
DW_AT_low_pc [DW_FORM_addrx]    (indexed (00000000) address = 0x0000000000001130)
DW_AT_high_pc [DW_FORM_data8]   (0x0000000000000010)
DW_AT_type [DW_FORM_ref4]       (cu + 0x003e => {0x0000003e} "int")

0x0000003d: NULL

0x0000003e: DW_TAG_base_type [3] (0x0000000c)

DW_AT_name [DW_FORM_string]     ("int")

0x00000043: DW_TAG_variable [4] (0x0000000c)

DW_AT_name [DW_FORM_string]     ("var1")
DW_AT_type [DW_FORM_ref4]       (cu + 0x003e => {0x0000003e} "int")
DW_AT_location [DW_FORM_exprloc]        (DW_OP_addrx 0x0)

0x00000050: NULL
0x00000051: Compile Unit: length = 0x0000004d, format = DWARF32, version = 0x0005, unit_type = DW_UT_compile, abbr_offset = 0x0031, addr_size = 0x08 (next unit at 0x000000a2)

0x0000005d: DW_TAG_compile_unit [1] *

DW_AT_producer [DW_FORM_string]   ("by_hand")
DW_AT_language [DW_FORM_data2]    (DW_LANG_C_plus_plus)
DW_AT_name [DW_FORM_string]       ("BadCU")
DW_AT_low_pc [DW_FORM_addrx]      (indexed (00000000) address = 0x0000000000000010)
DW_AT_high_pc [DW_FORM_data8]     (0x0000000000000010)
DW_AT_addr_base [DW_FORM_sec_offset]      (0x00000018)

0x0000007b: DW_TAG_subprogram [2] * (0x0000005d)

DW_AT_name [DW_FORM_string]     ("foo1")
DW_AT_low_pc [DW_FORM_addrx]    (indexed (00000000) address = 0x0000000000000010)
DW_AT_high_pc [DW_FORM_data8]   (0x0000000000000010)
DW_AT_type [DW_FORM_ref4]       (cu + 0x003e => {0x0000008f} "int")

0x0000008e: NULL

0x0000008f: DW_TAG_base_type [3] (0x0000005d)

DW_AT_name [DW_FORM_string]     ("int")

0x00000094: DW_TAG_variable [4] (0x0000005d)

DW_AT_name [DW_FORM_string]     ("var1")
DW_AT_type [DW_FORM_ref4]       (cu + 0x003e => {0x0000008f} "int")
DW_AT_location [DW_FORM_exprloc]        (DW_OP_addrx 0x0)

0x000000a1: NULL

.debug_addr contents:
0x00000000: Address table header: length = 0x0000000c, format = DWARF32, version = 0x0005, addr_size = 0x08, seg_size = 0x00
Addrs: [
0x0000000000001130
]
0x00000010: Address table header: length = 0x0000000c, format = DWARF32, version = 0x0005, addr_size = 0x08, seg_size = 0x00
Addrs: [
0x0000000000000010

Notice how the debug_addr section only has one address for each compile unit

If we use the following testcase:
llvm-project/build_ninja/bin/dsymutil -y test.map -o test.o.dSYM
dwarfdump -a -v test.o.dSYM | less
and dsymutil does not clear the AddrPool

.debug_info contents:
0x00000000: Compile Unit: length = 0x0000004b, format = DWARF32, version = 0x0005, unit_type = DW_UT_compile, abbr_offset = 0x0000, addr_size = 0x08 (next unit at 0x0000004f)

0x0000000c: DW_TAG_compile_unit [1] *

DW_AT_producer [DW_FORM_strp]     ( .debug_str[0x00000001] = "by_hand")
DW_AT_language [DW_FORM_data2]    (DW_LANG_C_plus_plus)
DW_AT_name [DW_FORM_strp] ( .debug_str[0x00000009] = "GoodCU")
DW_AT_low_pc [DW_FORM_addrx]      (indexed (00000000) address = 0x0000000000012260)
DW_AT_high_pc [DW_FORM_data8]     (0x0000000000000010)
DW_AT_addr_base [DW_FORM_sec_offset]      (0x00000008)

0x00000024: DW_TAG_subprogram [2] (0x0000000c)

DW_AT_name [DW_FORM_strp]       ( .debug_str[0x00000010] = "foo")
DW_AT_low_pc [DW_FORM_addrx]    (indexed (00000000) address = 0x0000000000012260)
DW_AT_high_pc [DW_FORM_data8]   (0x0000000000000010)
DW_AT_type [DW_FORM_ref_addr]   (0x0000000000000036 "int")

0x00000036: DW_TAG_base_type [3] (0x0000000c)

DW_AT_name [DW_FORM_strp]       ( .debug_str[0x00000014] = "int")

0x0000003b: DW_TAG_variable [4] (0x0000000c)

DW_AT_name [DW_FORM_strp]       ( .debug_str[0x00000018] = "var1")
DW_AT_type [DW_FORM_ref_addr]   (0x0000000000000036 "int")
DW_AT_location [DW_FORM_exprloc]        (DW_OP_addr 0x12260)

0x0000004e: NULL
0x0000004f: Compile Unit: length = 0x0000004b, format = DWARF32, version = 0x0005, unit_type = DW_UT_compile, abbr_offset = 0x0000, addr_size = 0x08 (next unit at 0x0000009e)

0x0000005b: DW_TAG_compile_unit [1] *

DW_AT_producer [DW_FORM_strp]     ( .debug_str[0x00000001] = "by_hand")
DW_AT_language [DW_FORM_data2]    (DW_LANG_C_plus_plus)
DW_AT_name [DW_FORM_strp] ( .debug_str[0x0000001d] = "BadCU")
DW_AT_low_pc [DW_FORM_addrx]      (indexed (00000001) address = 0x0000000000010010)
DW_AT_high_pc [DW_FORM_data8]     (0x0000000000000010)
DW_AT_addr_base [DW_FORM_sec_offset]      (0x00000018)

0x00000073: DW_TAG_subprogram [2] (0x0000005b)

DW_AT_name [DW_FORM_strp]       ( .debug_str[0x00000023] = "foo1")
DW_AT_low_pc [DW_FORM_addrx]    (indexed (00000001) address = 0x0000000000010010)
DW_AT_high_pc [DW_FORM_data8]   (0x0000000000000010)
DW_AT_type [DW_FORM_ref_addr]   (0x0000000000000085 "int")

0x00000085: DW_TAG_base_type [3] (0x0000005b)

DW_AT_name [DW_FORM_strp]       ( .debug_str[0x00000014] = "int")

0x0000008a: DW_TAG_variable [4] (0x0000005b)

DW_AT_name [DW_FORM_strp]       ( .debug_str[0x00000018] = "var1")
DW_AT_type [DW_FORM_ref_addr]   (0x0000000000000085 "int")
DW_AT_location [DW_FORM_exprloc]        (DW_OP_addr 0x10010)

0x0000009d: NULL

.debug_addr contents:
0x00000000: Address table header: length = 0x0000000c, format = DWARF32, version = 0x0005, addr_size = 0x08, seg_size = 0x00
Addrs: [
0x0000000000012260
]
0x00000010: Address table header: length = 0x00000014, format = DWARF32, version = 0x0005, addr_size = 0x08, seg_size = 0x00
Addrs: [
0x0000000000012260
0x0000000000010010
]

Notice how the debug_addr section for the second compile unit, contains the address in the first compile unit as well.

But if we clear the AddrPool

llvm-project/build_ninja/bin/dsymutil -y test.map -o test.o.dSYM
dwarfdump -a -v test.o.dSYM | less

.debug_info contents:
0x00000000: Compile Unit: length = 0x0000004b, format = DWARF32, version = 0x0005, unit_type = DW_UT_compile, abbr_offset = 0x0000, addr_size = 0x08 (next unit at 0x0000004f)

0x0000000c: DW_TAG_compile_unit [1] *

DW_AT_producer [DW_FORM_strp]     ( .debug_str[0x00000001] = "by_hand")
DW_AT_language [DW_FORM_data2]    (DW_LANG_C_plus_plus)
DW_AT_name [DW_FORM_strp] ( .debug_str[0x00000009] = "GoodCU")
DW_AT_low_pc [DW_FORM_addrx]      (indexed (00000000) address = 0x0000000000012260)
DW_AT_high_pc [DW_FORM_data8]     (0x0000000000000010)
DW_AT_addr_base [DW_FORM_sec_offset]      (0x00000008)

0x00000024: DW_TAG_subprogram [2] (0x0000000c)

DW_AT_name [DW_FORM_strp]       ( .debug_str[0x00000010] = "foo")
DW_AT_low_pc [DW_FORM_addrx]    (indexed (00000000) address = 0x0000000000012260)
DW_AT_high_pc [DW_FORM_data8]   (0x0000000000000010)
DW_AT_type [DW_FORM_ref_addr]   (0x0000000000000036 "int")

0x00000036: DW_TAG_base_type [3] (0x0000000c)

DW_AT_name [DW_FORM_strp]       ( .debug_str[0x00000014] = "int")

0x0000003b: DW_TAG_variable [4] (0x0000000c)

DW_AT_name [DW_FORM_strp]       ( .debug_str[0x00000018] = "var1")
DW_AT_type [DW_FORM_ref_addr]   (0x0000000000000036 "int")
DW_AT_location [DW_FORM_exprloc]        (DW_OP_addr 0x12260)

0x0000004e: NULL
0x0000004f: Compile Unit: length = 0x0000004b, format = DWARF32, version = 0x0005, unit_type = DW_UT_compile, abbr_offset = 0x0000, addr_size = 0x08 (next unit at 0x0000009e)

0x0000005b: DW_TAG_compile_unit [1] *

DW_AT_producer [DW_FORM_strp]     ( .debug_str[0x00000001] = "by_hand")
DW_AT_language [DW_FORM_data2]    (DW_LANG_C_plus_plus)
DW_AT_name [DW_FORM_strp] ( .debug_str[0x0000001d] = "BadCU")
DW_AT_low_pc [DW_FORM_addrx]      (indexed (00000000) address = 0x0000000000010010)
DW_AT_high_pc [DW_FORM_data8]     (0x0000000000000010)
DW_AT_addr_base [DW_FORM_sec_offset]      (0x00000018)

0x00000073: DW_TAG_subprogram [2] (0x0000005b)

DW_AT_name [DW_FORM_strp]       ( .debug_str[0x00000023] = "foo1")
DW_AT_low_pc [DW_FORM_addrx]    (indexed (00000000) address = 0x0000000000010010)
DW_AT_high_pc [DW_FORM_data8]   (0x0000000000000010)
DW_AT_type [DW_FORM_ref_addr]   (0x0000000000000085 "int")

0x00000085: DW_TAG_base_type [3] (0x0000005b)

DW_AT_name [DW_FORM_strp]       ( .debug_str[0x00000014] = "int")

0x0000008a: DW_TAG_variable [4] (0x0000005b)

DW_AT_name [DW_FORM_strp]       ( .debug_str[0x00000018] = "var1")
DW_AT_type [DW_FORM_ref_addr]   (0x0000000000000085 "int")
DW_AT_location [DW_FORM_exprloc]        (DW_OP_addr 0x10010)

0x0000009d: NULL

.debug_addr contents:
0x00000000: Address table header: length = 0x0000000c, format = DWARF32, version = 0x0005, addr_size = 0x08, seg_size = 0x00
Addrs: [
0x0000000000012260
]
0x00000010: Address table header: length = 0x0000000c, format = DWARF32, version = 0x0005, addr_size = 0x08, seg_size = 0x00
Addrs: [
0x0000000000010010
]

Notice how the debug_addr section matches the one in the object file, and doesn't contain a copy of the address from the first compile unit

Right. We need to clean(), sorry. I missed the fact that DIECloner enumerates several compile units.

Right. We need to clean(), sorry. I missed the fact that DIECloner enumerates several compile units.

Does the rest of the patch look okay?

avl added a comment.Jul 20 2023, 4:30 PM

Right. We need to clean(), sorry. I missed the fact that DIECloner enumerates several compile units.

Does the rest of the patch look okay?

It looks like the comment for DWARFLinker.cpp::1398 is not addressed:

It would be incorrect to return size of original attribute here.
Instead we need to calculate size of DW_FORM_addrx:

unsigned NewAttrSize = Die.addValue(DIEAlloc, static_cast<dwarf::Attribute>(AttrSpec.Attr),
               dwarf::Form::DW_FORM_addrx, DIEInteger(AddrIndex))->sizeOf(OrigUnit.getFormParams());

return NewAttrSize;

and the comment for llvm/test/tools/llvm-dwarfutil/ELF/X86/dwarf5-addresses.test::44

should not it be DW_FORM_addrx here and in following cases?

rastogishubham marked 6 inline comments as done.

Return proper size of DW_FORM_addrx, fix dwarf5-addresses.test

rastogishubham marked an inline comment as done.Jul 20 2023, 6:25 PM

Rebased on top of main

@avl Hi Alexey, can you please let me know if this patch looks okay now? Thanks

avl accepted this revision.Jul 21 2023, 2:24 PM

LGTM. Thank you for spending time on it! Please, also fix a problem with LLVM.tools/dsymutil/ARM::dwarf5-addr_base.test shown while pre-integration testing.

This revision is now accepted and ready to land.Jul 21 2023, 2:24 PM

@avl I think those pre-integration errors are happening because phabricator doesn't have the binaries

avl added a comment.Jul 22 2023, 4:23 AM

@avl I think those pre-integration errors are happening because phabricator doesn't have the binaries

phabricator has these files: Inputs/DWARF5-addr_base/
1.o
2.o
3.o

but they are of zero size. it looks like patch was created incorrectly or these files are broken.

This revision was landed with ongoing or failed builds.Jul 22 2023, 10:49 AM
This revision was automatically updated to reflect the committed changes.