This is an archive of the discontinued LLVM Phabricator instance.

[obj2yaml] - Dump allocatable SHT_STRTAB, SHT_SYMTAB and SHT_DYNSYM sections.
Closed, Public

Authored by grimar on Feb 21 2020, 4:34 AM.

Details

Summary

Sometimes we need to dump an object and build it again from the YAML
description produced. The problem is that obj2yaml does not dump some
sections, like string tables and symbol tables.

Because of that, yaml2obj creates them implicitly, and the created sections
are not placed at their original locations: they are appended to the end of the section list.
That makes preparing test cases harder than it needs to be.

This patch teaches obj2yaml to dump parts of allocatable SHT_STRTAB, SHT_SYMTAB
and SHT_DYNSYM sections, printing placeholders for them.
This also preserves useful parameters, such as the virtual address.

Event Timeline

grimar created this revision. Feb 21 2020, 4:34 AM
Herald added a project: Restricted Project. Feb 21 2020, 4:34 AM

Since rL238073, clang no longer produces .shstrtab, but rather uses a unified .strtab for both section names and symbol names. Dumping SHT_STRTAB is necessary to differentiate the two cases.

Do you mean that yaml2obj generated SHT_DYNSYM is not at the normal place (early in the section header table, somewhere before .rodata)?

llvm/tools/obj2yaml/elf2yaml.cpp
258

The contents of these sections are described by other parts of the YAML file. We still dump them so that their positions in the section header table are correctly recorded.

Since rL238073, clang no longer produces .shstrtab, but rather uses a unified .strtab for both section names and symbol names. Dumping SHT_STRTAB is necessary to differentiate the two cases.

Do you mean that yaml2obj generated SHT_DYNSYM is not at the normal place (early in the section header table, somewhere before .rodata)?

No. The place is just different from the one in the original object.
For example, imagine we have test.s:

.data
.section .bar,"a"
.quad .bar

.text
.section .foo,"ax"
.quad .foo

.section .zed,"ax"
.quad .foo

and do:

as test.s -o test.o
ld.bfd test.o -shared -o test.so

We get output with the following section headers:

Section Headers:

[Nr] Name              Type             Address           Offset
     Size              EntSize          Flags  Link  Info  Align
[ 0]                   NULL             0000000000000000  00000000
     0000000000000000  0000000000000000           0     0     0
[ 1] .hash             HASH             0000000000000190  00000190
     0000000000000010  0000000000000004   A       3     0     8
[ 2] .gnu.hash         GNU_HASH         00000000000001a0  000001a0
     000000000000001c  0000000000000000   A       3     0     8
[ 3] .dynsym           DYNSYM           00000000000001c0  000001c0
     0000000000000018  0000000000000018   A       4     1     8
[ 4] .dynstr           STRTAB           00000000000001d8  000001d8
     0000000000000001  0000000000000000   A       0     0     1
[ 5] .rela.dyn         RELA             00000000000001e0  000001e0
     0000000000000048  0000000000000018   A       3     0     8
[ 6] .foo              PROGBITS         0000000000001000  00001000
     0000000000000008  0000000000000000  AX       0     0     1
[ 7] .zed              PROGBITS         0000000000001008  00001008
     0000000000000008  0000000000000000  AX       0     0     1
[ 8] .bar              PROGBITS         0000000000002000  00002000
     0000000000000008  0000000000000000   A       0     0     1
[ 9] .eh_frame         PROGBITS         0000000000002008  00002008
     0000000000000000  0000000000000000   A       0     0     8
[10] .dynamic          DYNAMIC          0000000000003ef0  00002ef0
     0000000000000110  0000000000000010  WA       4     0     8
[11] .symtab           SYMTAB           0000000000000000  00003000
     0000000000000120  0000000000000018          12    12     8
[12] .strtab           STRTAB           0000000000000000  00003120
     000000000000000a  0000000000000000           0     0     1
[13] .shstrtab         STRTAB           0000000000000000  0000312a
     0000000000000061  0000000000000000           0     0     1

Then, after running obj2yaml and yaml2obj, we get:

[Nr] Name              Type             Address           Offset
      Size              EntSize          Flags  Link  Info  Align
 [ 0]                   NULL             0000000000000000  00000000
      0000000000000000  0000000000000000           0     0     0
 [ 1] .hash             HASH             0000000000000190  00000040
      0000000000000010  0000000000000000   A       9     0     8
 [ 2] .gnu.hash         GNU_HASH         00000000000001a0  00000050
      000000000000001c  0000000000000000   A       9     0     8
 [ 3] .rela.dyn         RELA             00000000000001e0  00000070
      0000000000000048  0000000000000018   A       9     0     8
 [ 4] .foo              PROGBITS         0000000000001000  000000b8
      0000000000000008  0000000000000000  AX       0     0     1
 [ 5] .zed              PROGBITS         0000000000001008  000000c0
      0000000000000008  0000000000000000  AX       0     0     1
 [ 6] .bar              PROGBITS         0000000000002000  000000c8
      0000000000000008  0000000000000000   A       0     0     1
 [ 7] .eh_frame         PROGBITS         0000000000002008  000000d0
      0000000000000000  0000000000000000   A       0     0     8
 [ 8] .dynamic          DYNAMIC          0000000000003ef0  000000d0
      0000000000000110  0000000000000010  WA      10     0     8
 [ 9] .dynsym           DYNSYM           0000000000000000  000001e0
      0000000000000018  0000000000000018   A      10     1     8
 [10] .dynstr           STRTAB           0000000000000000  000001f8
      0000000000000001  0000000000000000   A       0     0     1
 [11] .symtab           SYMTAB           0000000000000000  00000200
      0000000000000120  0000000000000018          12    12     8
 [12] .strtab           STRTAB           0000000000000000  00000320
      0000000000000050  0000000000000000           0     0     1
 [13] .shstrtab         STRTAB           0000000000000000  00000370
      0000000000000061  0000000000000000           0     0     1

Note that .dynsym and .dynstr changed their location, and their addresses were reset to 0. That happened
because obj2yaml did not dump them, so yaml2obj appended them to the end of the list,
together with .symtab, .strtab and .shstrtab.
With this patch, the original section order can be preserved, because obj2yaml now emits placeholders for them.
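To make the ordering effect concrete, here is a deliberately simplified C++ model of it. It is an illustration only, not yaml2obj's real code: `layoutSections` is a hypothetical name, and the fixed list of implicit sections (and their order) is taken from the yaml2obj output above, assuming the object has dynamic symbols. Sections named explicitly in the YAML keep their order; any implicit section the YAML does not mention is appended at the end.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical, simplified model of the behaviour described above; NOT the
// actual yaml2obj implementation. Explicitly described sections keep their
// order, and implicit sections missing from the YAML are appended.
std::vector<std::string>
layoutSections(const std::vector<std::string> &Explicit) {
  // Assumed implicit sections, in the order they appear at the end of the
  // yaml2obj output shown above (object with dynamic symbols).
  static const std::vector<std::string> Implicit = {
      ".dynsym", ".dynstr", ".symtab", ".strtab", ".shstrtab"};

  std::vector<std::string> Result = Explicit;
  for (const std::string &Name : Implicit)
    // Append only if the YAML did not already place this section.
    if (std::find(Explicit.begin(), Explicit.end(), Name) == Explicit.end())
      Result.push_back(Name);
  return Result;
}
```

Feeding it the section list obj2yaml produced before this patch (.hash through .dynamic, without .dynsym/.dynstr) puts .dynsym and .dynstr right after .dynamic, matching the second table (indices in the model skip the null section [0]). If the list also carries .dynsym/.dynstr placeholders in their original slots, they stay there and only .symtab, .strtab and .shstrtab are appended.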

Sometimes we need to dump an object and build it again from the YAML description produced. The problem is that obj2yaml does not dump some sections, like string tables and symbol tables.

Because of that, yaml2obj creates them implicitly, and the created sections are not placed at their original locations: they are appended to the end of the section list. That makes preparing test cases harder than it needs to be.

The description is verbose. Just mention that obj2yaml does not dump SHT_STRTAB/SHT_SYMTAB/SHT_DYNSYM. This is problematic because the output YAML loses track of their positions in the section header table, which makes the obj2yaml->yaml2obj round trip fail.

I've not yet looked at the actual details of this change, but I wanted to flag up a concern: if we always dump these sections, we will see extra parts in the obj2yaml YAML output that are actually unnecessary in many (most?) instances. As a result, people will start writing tests with extra useless details in them, as many people use obj2yaml to generate the YAML for a test input because they don't know enough about how to write YAML from scratch. We would then review these tests and ask people to remove the noise from the YAML (i.e. the unnecessary .dynsym etc. descriptions), so that it is clearer what is being tested. We already do this for other things that aren't needed (e.g. section alignments), which I know can be irritating for some people. Now we're making them do yet more work for only a little gain.

I don't have a clear solution to this yet, as I agree we shouldn't throw away important information like section addresses when using obj2yaml. I'm not concerned by the section header order, as that technically has no impact on anything (as long as section indexes in other things are updated correctly). Could we perhaps only list the implicit sections when they have a non-default value for something (i.e. a value that would be different from when they are not specified, such as a non-zero address)?

I don't have a clear solution to this yet, as I agree we shouldn't throw away important information like section addresses when using obj2yaml. I'm not concerned by the section header order, as that technically has no impact on anything (as long as section indexes in other things are updated correctly).

In my use case, the order is important. In short: I've experimented with code that dumps program headers, so that an object produced
by the obj->obj2yaml->yaml2obj->obj round trip has the same segments, the same sections in segments, etc.
In this case, it is important to keep the order and addresses of allocatable implicit sections.

Could we perhaps only list the implicit sections when they have a non-default value for something (i.e. a value that would be different from when they are not specified, such as a non-zero address)?

A non-zero address is pretty common for allocatable sections, i.e. we would always print the .dynsym/.dynstr sections.

My other concern is that reading SHT_STRTAB/SHT_SYMTAB/SHT_DYNSYM sections and printing only those
that differ from the "default" ones requires much more complicated logic than simply dumping them as RawContentSection, as this patch does.

What if we introduce an option, e.g. --dump-implicit-sec, to dump them? It could be used when creating test cases,
while such details stay hidden by default, which should probably work well for the general case.

Okay, you make a fair amount of sense. I don't think a switch is necessarily needed. That being said, perhaps we could limit this to the dynamic symbol table and dynamic string table? I feel like .symtab and the non-dynamic string table(s) do not need printing, since they will normally be at the end, don't have addresses, and will just cause unnecessary noise if in the YAML. I think you'll find the number of tests impacted will subsequently reduce significantly. Perhaps then a switch (e.g. --explicit-symtab-strtab) would be wise to enable dumping these sections. What do you think?

Okay, you make a fair amount of sense. I don't think a switch is necessarily needed. That being said, perhaps we could limit this to the dynamic symbol table and dynamic string table? I feel like .symtab and the non-dynamic string table(s) do not need printing, since they will normally be at the end, don't have addresses, and will just cause unnecessary noise if in the YAML.

I think that is fine. I've also thought about that possibility, and while working on another patch I also realized that dumping just the allocatable sections is enough, at least for my current use cases.

I think you'll find the number of tests impacted will subsequently reduce significantly. Perhaps then a switch (e.g. --explicit-symtab-strtab) would be wise to enable dumping these sections. What do you think?

Sounds good to me too.

grimar planned changes to this revision. Feb 27 2020, 2:07 AM

Will change according to the discussion.

grimar updated this revision to Diff 246910. Feb 27 2020, 4:25 AM
grimar marked an inline comment as done.
grimar retitled this revision from [obj2yaml] - Dump SHT_STRTAB, SHT_SYMTAB and SHT_DYNSYM sections. to [obj2yaml] - Dump allocatable SHT_STRTAB, SHT_SYMTAB and SHT_DYNSYM sections..
grimar edited the summary of this revision. (Show Details)
  • Dump only allocatable sections.

Dump allocatable SHT_STRTAB, SHT_SYMTAB and SHT_DYNSYM sections.

.dynstr => allocatable SHT_STRTAB
.dynsym => SHT_DYNSYM

I haven't seen a case of an allocatable SHT_SYMTAB, but given how the switch cases are organized, I think it does not hurt to place SHT_SYMTAB there.

Non-allocatable SHT_STRTAB and SHT_SYMTAB sections are usually at the end of the file. Their presence changes their own offsets and the offset of the section header table. Making them implicit seems fine.

With this scheme, a new option will not be needed. I am in favor of such a change.
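The selection rule settled on here can be summarized as a small predicate. This is a sketch under assumptions: `dumpAsPlaceholder` is a hypothetical name, the ELF constants are defined locally with their gABI values instead of coming from LLVM headers, and the actual switch in elf2yaml.cpp may be organized differently. It returns true only for allocatable SHT_STRTAB, SHT_SYMTAB and SHT_DYNSYM sections, which obj2yaml would list as placeholders; the non-allocatable tables are left for yaml2obj to recreate implicitly.

```cpp
#include <cstdint>

// ELF gABI values (normally provided by <elf.h> or llvm/BinaryFormat/ELF.h);
// defined locally so the sketch is self-contained.
constexpr std::uint32_t SHT_SYMTAB = 2;
constexpr std::uint32_t SHT_STRTAB = 3;
constexpr std::uint32_t SHT_DYNSYM = 11;
constexpr std::uint64_t SHF_ALLOC = 0x2;

// Hypothetical helper, not the name used in elf2yaml.cpp: returns true if a
// section of this type would be listed as a placeholder under the scheme
// discussed in this review.
bool dumpAsPlaceholder(std::uint32_t Type, std::uint64_t Flags) {
  switch (Type) {
  case SHT_STRTAB:
  case SHT_SYMTAB:
  case SHT_DYNSYM:
    // Only allocatable tables are listed; non-allocatable ones
    // (.symtab, .strtab, .shstrtab) stay implicit.
    return (Flags & SHF_ALLOC) != 0;
  default:
    // Other section types are not affected by this rule.
    return false;
  }
}
```

With this rule, .dynsym and .dynstr (both SHF_ALLOC) are listed, while .symtab, .strtab and .shstrtab are not. A custom non-allocatable SHT_STRTAB is skipped as well, which is the limitation raised later in the review.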

grimar updated this revision to Diff 247185. Feb 28 2020, 1:19 AM
  • Updated comments in the implicit-sections-order.yaml test case.

A thought, but I think it's a pre-existing issue: let's say I have a custom SHT_STRTAB section that isn't the symbol or section header string table. This code seems to prevent that section being emitted?

llvm/test/tools/obj2yaml/duplicate-symbol-and-section-names.test
128

Unrelated change which should probably be committed separately?

llvm/test/tools/obj2yaml/implicit-sections-order.yaml
8

Here and below, maybe instead of "print" you should say "explicitly declare" or something to that effect?

115

ubnormal -> abnormal

llvm/tools/obj2yaml/elf2yaml.cpp
260

Perhaps worth saying why we only dump allocatable sections a little more directly.

261

Another ones -> Some sections

262

address -> addresses

grimar marked 2 inline comments as done. Feb 28 2020, 2:21 AM

A thought, but I think it's a pre-existing issue: let's say I have a custom SHT_STRTAB section that isn't the symbol or section header string table. This code seems to prevent that section being emitted?

It seems we never emitted those before this patch. With this patch, we will emit something (but not the Content, for example) for allocatable sections, and nothing for non-allocatable ones.
So the situation is slightly better with this patch, but still far from ideal.

llvm/test/tools/obj2yaml/duplicate-symbol-and-section-names.test
128

It is related: there is now also a Sections key between FileHeader and Symbols, because entries for .dynsym/.dynstr are now created.

grimar updated this revision to Diff 247198. Feb 28 2020, 2:43 AM
grimar marked 5 inline comments as done.
  • Addressed review comments.
This revision is now accepted and ready to land. Mar 2 2020, 1:22 AM
MaskRay accepted this revision. Mar 2 2020, 9:08 AM
This revision was automatically updated to reflect the committed changes.