This is an archive of the discontinued LLVM Phabricator instance.

[DWARF parsers] - Add a way to get section indices when call DWARFUnit::collectAddressRanges()
AbandonedPublic

Authored by grimar on Oct 20 2016, 5:03 AM.

Details

Summary

For the LLD --gdb-index patch (https://reviews.llvm.org/D25821) I need to write the address area. The address area is a sequence of address
entries, where each entry contains a low address, a high address and a CU index.

I can get this info (using DWARFUnit::collectAddressRanges())
from the DW_AT_low_pc/DW_AT_high_pc attributes, which are relocated addresses.
The problem is that there is no way to get the section index these addresses refer to. At the time the index is built,
I do not yet know the final input section offsets and output section VAs, so I think
I need to store the section index to use it later (that is how the above patch does it).

The patch does the following:

  • We have RelocAddrMap, a map of pairs [relocation width, relocation value], which was declared in both DWARFContext.h and DWARFRelocMap.h. I left the declaration only in DWARFRelocMap.h. In fact, the first element of the pair was never used in the code, so I replaced it with the section index.

  • I modified the contents of DWARFAddressRangesVector to keep the section index as well, so it is now possible to get it via a DWARFUnit::collectAddressRanges() call.
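
To make the intent of the summary concrete, here is a minimal sketch of why a range needs to carry its section index until final addresses are known. All names below (RangeWithSection, AddressEntry, resolveRanges) are hypothetical and are not the types used in the patch or in LLVM; the sketch also assumes, for simplicity, that one final address per section index is available once layout is done.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an address range that remembers which
// section its section-relative addresses belong to.
struct RangeWithSection {
  uint64_t LowPC;        // section-relative start (inclusive)
  uint64_t HighPC;       // section-relative end (exclusive)
  uint64_t SectionIndex; // index of the section the range refers to
};

// Hypothetical address-area entry: low address, high address, CU index.
struct AddressEntry {
  uint64_t LowAddress;
  uint64_t HighAddress;
  uint32_t CuIndex;
};

// Once layout is finished, turn the stored (section index, offset) pairs
// into final addresses. SectionVAs[i] is assumed to be the final address
// of the section with index i.
inline std::vector<AddressEntry>
resolveRanges(const std::vector<RangeWithSection> &Ranges,
              const std::vector<uint64_t> &SectionVAs, uint32_t CuIndex) {
  std::vector<AddressEntry> Out;
  for (const RangeWithSection &R : Ranges) {
    assert(R.SectionIndex < SectionVAs.size() && "unknown section index");
    uint64_t Base = SectionVAs[R.SectionIndex];
    Out.push_back({Base + R.LowPC, Base + R.HighPC, CuIndex});
  }
  return Out;
}
```

Without the stored section index, the same LowPC/HighPC pair could not be mapped to a final address at this later point.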

Event Timeline

grimar updated this revision to Diff 75287.Oct 20 2016, 5:03 AM
grimar retitled this revision from to [DWARF parsers] - Add a way to get section indexes when call DWARFUnit::collectAddressRanges().
grimar updated this object.
grimar added subscribers: llvm-commits, grimar, evgeny777 and 2 others.
grimar retitled this revision from [DWARF parsers] - Add a way to get section indexes when call DWARFUnit::collectAddressRanges() to [DWARF parsers] - Add a way to get section indices when call DWARFUnit::collectAddressRanges().Oct 20 2016, 5:05 AM

Seems like a lot of work to include the section index everywhere. Can't the section index just be looked up using the ObjectFile given an address? Something like:

uint32_t ObjectFile::getSectionIndexForAddress(uint64_t addr);

Could you elaborate on which addr I should use for such a search?
getAddressRanges() returns LowPC/HighPC to me.

For example for gdb-index-a.elf from D24706, LowPC = 0x0, HighPC = 0xb, readelf shows:

Section Headers:
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
  [ 1] .text             PROGBITS        0000000000000000 000040 00000b 00  AX  0   0  1
  [ 2] .data             PROGBITS        0000000000000000 00004b 000000 00  WA  0   0  1
  [ 3] .bss              NOBITS          0000000000000000 00004b 000000 00  WA  0   0  1
  [ 4] .debug_addr       PROGBITS        0000000000000000 00004b 000008 00      0   0  1
  [ 5] .rela.debug_addr  RELA            0000000000000000 000380 000018 18   I 23   4  8
.....

Given just these values, how can I know that the section I am looking for is .text?

You can see that .text has a size of 0xb and starts at 0x0. So the address would fall into the .text section. You would probably want to keep a table of the PROGBITS sections that also have AX permissions if you know you are looking for code addresses. Someone else more familiar with ELF .o files might be able to help.

You can see that .text has a size of 0xb and starts at 0x0. So the address would fall into the .text section.

Things can be slightly more complex, like here (this is result of compilation with -ffunction-sections):

[ 2] .text             PROGBITS        0000000000000000 000040 000000 00  AX  0   0  4
[ 3] .text.f           PROGBITS        0000000000000000 000040 00000b 00  AX  0   0 16
[ 4] .text.g           PROGBITS        0000000000000000 000050 00000b 00  AX  0   0 16
[ 5] .text.h           PROGBITS        0000000000000000 000060 00000b 00  AX  0   0 16
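
A small sketch of the address-based lookup discussed above shows why it breaks down here. SectionInfo and findSectionsForAddress are hypothetical names, not part of ObjectFile; the section values are the ones from the two readelf listings in this thread.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified view of one section header entry.
struct SectionInfo {
  uint32_t Index;   // section header index
  uint64_t Address; // sh_addr (0 for every section in these .o files)
  uint64_t Size;    // sh_size
  bool IsCode;      // true for PROGBITS sections with the AX flags
};

// Return every code section whose [Address, Address + Size) contains Addr.
// A lookup by address is only usable if this yields exactly one match.
inline std::vector<uint32_t>
findSectionsForAddress(const std::vector<SectionInfo> &Sections,
                       uint64_t Addr) {
  std::vector<uint32_t> Matches;
  for (const SectionInfo &S : Sections) {
    // Precondition: a section must not wrap around the address space.
    assert(S.Size <= UINT64_MAX - S.Address);
    if (S.IsCode && Addr >= S.Address && Addr - S.Address < S.Size)
      Matches.push_back(S.Index);
  }
  return Matches;
}
```

With the single .text section from the first listing, address 0x0 has one match. With the -ffunction-sections layout, .text.f, .text.g and .text.h all start at 0x0, so the same address matches all three and the section cannot be determined from the address alone.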

Is this just a raw dumper that isn't applying relocations? Do ELF .o files have bogus values in the address field of all sections in the section headers? It seems like there should be relocations that would fix these up, no?

After speaking with some local ELF experts over here, my proposed solution won't work. Would it be possible to encode the section index into the upper bits of any addresses that need to retain their section? If not, it seems weird to just be changing a few places that use high and low PC in a few areas of the DWARF parser. It would seem that we would need to change all places that can use addresses over to use some structure like:

struct FileAddress {
    uint64_t vmAddr;
    uint32_t sectIdx;
};

instead of just a "uint64_t" as a file address.
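
For comparison, the two options mentioned above can be sketched side by side. The 48-bit/16-bit split, AddrBits, packAddress and unpackAddress are assumptions made up for this illustration; nothing in the thread fixes a particular bit layout.

```cpp
#include <cassert>
#include <cstdint>

// The structure proposed above: address and section index kept separately.
struct FileAddress {
  uint64_t vmAddr;
  uint32_t sectIdx;
};

// Assumed layout for the "upper bits" alternative: the low 48 bits hold
// the address and the high 16 bits hold the section index.
constexpr unsigned AddrBits = 48;
constexpr uint64_t AddrMask = (uint64_t(1) << AddrBits) - 1;

// Pack a section index into the upper bits of an address.
inline uint64_t packAddress(uint16_t SectIdx, uint64_t Addr) {
  // Any address that needs more than 48 bits cannot be represented.
  assert((Addr & ~AddrMask) == 0 && "address does not fit in 48 bits");
  return (uint64_t(SectIdx) << AddrBits) | Addr;
}

// Split a packed value back into its two parts.
inline FileAddress unpackAddress(uint64_t Packed) {
  return {Packed & AddrMask, uint32_t(Packed >> AddrBits)};
}
```

The packed form keeps every existing uint64_t signature unchanged, but it limits both the address range and the number of sections, and every consumer has to remember to unpack the value before doing arithmetic on it. The struct avoids those limits at the cost of touching every API that passes addresses around.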

grimar added a comment.EditedOct 24 2016, 2:25 AM

After speaking with some local ELF experts over here, my proposed solution won't work. Would it be possible to encode the section index into the upper bits of any addresses that need to retain their section?

I think mixing an address and a section index in one variable would not be a correct solution.

If not it seems weird to just be changing a few places that use high and low PC in a few areas of the DWARF parser. It would seem that we would need to change all places that can use addresses over to use some structure like:

struct FileAddress {
    uint64_t vmAddr;
    uint32_t sectIdx;
};

instead of just a "uint64_t" as a file address.

It may be worth doing, but maybe not. It looks like generating the .gdb_index section in LLD was the first consumer to run into the absence of this API? For now, LLD needs it only in this one place, and that seems to be enough.
So I am not sure we really need to change all the places to use this structure. That would probably help consistency, but it is also probably excessive. What is the point of doing that if nobody is going to use it?

In the current implementation of the LLVM DWARF parser, LowPC and HighPC are offsets from the start of the section. This causes problems when the object file has comdat groups or was compiled with -ffunction-sections. I think the problem can be solved if LowPC and HighPC were offsets from the start of the file rather than the section. To my understanding, this can be done quite easily by adding sh_offset to R.Value in the DWARFContextInMemory constructor. After that, all the SecNdx-related code can be removed: the section index can easily be reconstructed from a file offset. What do you think?
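
The file-offset idea can be sketched as follows. SectionHeader, toFileOffset and findSectionForFileOffset are hypothetical helpers written for this illustration, not existing LLVM functions; the offsets and sizes are taken from the -ffunction-sections listing earlier in the thread.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified view of the section header fields involved.
struct SectionHeader {
  uint32_t Index;      // section header index
  uint64_t FileOffset; // sh_offset
  uint64_t Size;       // sh_size
};

// Turn a section-relative value into an offset from the start of the file.
inline uint64_t toFileOffset(const SectionHeader &S, uint64_t SectionRelative) {
  assert(SectionRelative <= S.Size && "value lies outside the section");
  return S.FileOffset + SectionRelative;
}

// Recover the section index from a file offset. Returns false if no
// section with file contents covers the offset. Zero-sized sections are
// skipped because they cover no bytes.
inline bool findSectionForFileOffset(const std::vector<SectionHeader> &Sections,
                                     uint64_t Offset, uint32_t &IndexOut) {
  for (const SectionHeader &S : Sections) {
    if (Offset >= S.FileOffset && Offset - S.FileOffset < S.Size) {
      IndexOut = S.Index;
      return true;
    }
  }
  return false;
}
```

For the listing above, offset 0 of .text.g becomes file offset 0x50, which maps back to section index 4 unambiguously. One caveat of this sketch is that it assumes every section of interest occupies distinct bytes in the file, which does not hold for NOBITS sections such as .bss.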

I am not sure that is the correct thing to do. From the DWARF manual: "The value of the DW_AT_low_pc attribute is the relocated address of the first
instruction associated with the entity." If we make the change you suggest, we will end up with an API that is called getLowAndHighPC() but returns absolute values instead of relocatable values, so I think it would be misnamed.
I am also not sure that changing R.Value will not break anything else (it is used for more than lowPC/highPC), though that is a different question.

It would be interesting to hear other opinions, but for now I prefer either my solution or Greg's suggestion to use a FileAddress struct.

aprantl added inline comments.Nov 1 2016, 8:39 AM
lib/DebugInfo/DWARF/DWARFDebugRangeList.cpp
65

I think we typically spell this ~0ULL.

grimar added inline comments.Nov 1 2016, 9:02 AM
lib/DebugInfo/DWARF/DWARFDebugRangeList.cpp
65

Thanks for the comment, but this patch has already been abandoned :)
The D25821 implementation changed, so there is no longer any need to change the DWARF parsers.