This is an archive of the discontinued LLVM Phabricator instance.

[yaml2obj] - Change how we handle implicit sections.
ClosedPublic

Authored by grimar on Jun 3 2019, 5:42 AM.

Details

Summary

We have a few sections that can be added implicitly to the output:
".dynsym", ".dynstr", ".symtab", ".strtab" and ".shstrtab".

Problem appears when such section is listed explicitly in YAML.
In that case it's content is written twice:
first time during writing of regular sections listed in the document
and second time during special handling.

Because of that their file offsets can become unexpectedly broken:
(yaml file for sample below lists .dynsym explicitly before .text.foo)

Before patch:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .dynsym           DYNSYM           0000000000000100  00000250
       0000000000000030  0000000000000018   A       6     0     8
  [ 2] .text.foo         PROGBITS         0000000000000200  00000200
       0000000000000000  0000000000000000  AX       0     0     0

After patch:
Section Headers:
  [Nr] Name         Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .dynsym           DYNSYM           0000000000000100  00000200
       0000000000000030  0000000000000018   A       6     0     8
  [ 2] .text.foo         PROGBITS         0000000000000200  00000230
       0000000000000000  0000000000000000  AX       0     0     0

This patch reorganizes our code and fixes the issue described.

Diff Detail

Repository
rL LLVM

Event Timeline

grimar created this revision.Jun 3 2019, 5:42 AM
grimar updated this revision to Diff 202697.Jun 3 2019, 5:45 AM
grimar retitled this revision from [yaml2obj] - Change gow we handle implicit sections. to [yaml2obj] - Change how we handle implicit sections..
  • Removed unused properties from yaml.
grimar updated this revision to Diff 202698.Jun 3 2019, 5:49 AM
  • Removed even more unused properties from YAML.
grimar planned changes to this revision.Jun 3 2019, 6:01 AM

Planning changes (found an assertion failture because of this in one of llvm-readelf test cases).

grimar updated this revision to Diff 202732.Jun 3 2019, 8:55 AM
  • Updated implementation.
jhenderson added inline comments.Jun 4 2019, 3:27 AM
test/tools/yaml2obj/implicit-sections.test
4 ↗(On Diff #202732)

Your comment says that you are checking the file offset of these sections, but the CHECK tests the other properties too. Should the comment be expanded to say something like "Check the section header properties of ..." or should the text being checked for be relaxed (e.g. by regexing out the Type and Address columns, and removing the Size and Nr columns?

FWIW, I think there is probably value in checking the Address, Size and Type output, if they aren't explicitly tested elsewhere, so I think the comment is the thing that needs changing.

Also, does the offset really need to be ascending? I don't think that's strictly required.

40 ↗(On Diff #202732)

Maybe worth putting a comment to say that the DynamicSymbols block is required for the .dynsym to be generated?

It might also be good to have a test case for no dynamic symbol.

tools/yaml2obj/yaml2elf.cpp
260 ↗(On Diff #202732)

Here and elsewhere, I'd probably rename DocSec to YAMLSec or similar. I think that's a little clearer.

261 ↗(On Diff #202732)

My immediate thought was "What was already initialized?" so this probably should be a bit more verbose and say "Check if the header was already initialized".

310 ↗(On Diff #202732)

"later. But" -> Either "later, but" or "later. However," (I think either is more natural English).

311 ↗(On Diff #202732)

That allows keeping the correct file offset -> This ensures the file offset remains correct.

353 ↗(On Diff #202732)

Missing trailing full stop.

358 ↗(On Diff #202732)

Missing trailing full stop.

403 ↗(On Diff #202732)

What if the raw section has no info specified? I'd expect it to auto-derive the property in this case.

405 ↗(On Diff #202732)

Casting to unsigned here seems wrong, since entsize is a 64 bit number...

407 ↗(On Diff #202732)

Ditto on both points. AddressAlign is a 64-bit field, so casting to unsigned seems wrong. Additionally, a user might expect yaml2obj to derive the AddressAlign field automatically if they don't explicitly specify it.

410 ↗(On Diff #202732)

Not sure exactly where it belongs, but I think we need some sort of error in case somebody explicitly specifies content for a symbol table, but also specifies static symbols.

428 ↗(On Diff #202732)

What about a user who explicitly specifies sh_size, but not Content? Is that possible currently for symbol table sections?

440 ↗(On Diff #202732)

See above comments re. AddressAlign casting/auto-deriving.

444 ↗(On Diff #202732)

Similar to above, do we need an error if a user has specified a Symbol and a custom string table?

grimar updated this revision to Diff 202917.Jun 4 2019, 6:03 AM
grimar marked 21 inline comments as done.
  • Addressed review comments.

Note:
Many of my answers suggest changing things in the follow-ups.
(because other independent changes need to be done to achieve the ideal behavior).
If you think it will be better I can first prepare patches for those things and rebase this one after.
I fine with either way.

test/tools/yaml2obj/implicit-sections.test
4 ↗(On Diff #202732)

I rewrote the comment.

FWIW, I think there is probably value in checking the Address, Size and Type output, if they aren't explicitly tested elsewhere, so I think the comment is the thing that needs changing.

I think testing the Size is usefull, because allows to check how the Off was calculated.
Testing Address is valueable for demonstration of the correlation between Off and Address.
It is not harmfull to check the 'Type` since how section header is written only once (i.e. bug was fixed and
logic changed significantly. Previously they were written as RawContentSection at first)
and this test case is probably only one when we show the behavior of all implicit sections in the
situation when them were all of them are explicitly described in YAML.

Also, does the offset really need to be ascending? I don't think that's strictly required.

I think the order of sections should match the YAML.
Without this patch offsets would be unordered, what looks obviously wrong to me:

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .dynstr           STRTAB           0000000000000100  00000280
       0000000000000009  0000000000000000   A       0     0     1
  [ 2] .dynsym           DYNSYM           0000000000000150  00000250
       0000000000000030  0000000000000018   A       1     0     8
  [ 3] .symtab           SYMTAB           0000000000000000  00000200
       0000000000000018  0000000000000018           4     0     8
  [ 4] .strtab           STRTAB           0000000000000000  00000218
       0000000000000001  0000000000000000           0     0     1
  [ 5] .shstrtab         STRTAB           0000000000000000  00000219
       0000000000000035  0000000000000000           0     0     1
  [ 6] .text.foo         PROGBITS         0000000000000200  00000200
       0000000000000000  0000000000000000           0     0     0

And that it what my comment tried to say - if there was not described something very special in YAML (i.e. offsets were not explicitly set somehow),
then I think it is very natural to place sections in the order they described and make them have ascending file offset by default.
(that is what all linkers do I believe, for example).

The requirement I know is:
"To obtain this efficiency in practice, executable and shared object files must have segment images whose file offsets and virtual addresses are congruent, modulo the page size."
(https://docs.oracle.com/cd/E19683-01/816-1386/chapter6-34713/index.html)

yaml2obj does not follow it it seems. At least for sample above. But still having offset(.dynstr) > offset(.dynsym) and
VA(.dynstr) < VA(.dynsym) at the same time it something very strange/unnatural IMO.

40 ↗(On Diff #202732)

Done.

tools/yaml2obj/yaml2elf.cpp
403 ↗(On Diff #202732)

Yes, but we probably need to make RawSec->Info be Optional for doing that. It is not true atm.
(I am planning to improve this and other situations in a followups).

405 ↗(On Diff #202732)

Hmm, right.

407 ↗(On Diff #202732)

Casting was fixed.

Additionally, a user might expect yaml2obj to derive the AddressAlign field automatically if they don't explicitly specify it.

Again, it seems AddressAlign needs to be optional. I'd leave it for a follow up(s).

410 ↗(On Diff #202732)

Yes, sure I though about this too. I would like to do it in a follow up because I guess we
need to make Content be Optional to do that correctly.

428 ↗(On Diff #202732)

It was possible before this patch to do:

!ELF
FileHeader:
  Class:   ELFCLASS64
  Data:    ELFDATA2LSB
  Type:    ET_DYN
  Machine: EM_X86_64
Sections:
  - Name:    .dynsym
    Type:    SHT_DYNSYM
    Address: 0x150
    Size:    0x200

And get the .dynsym' of size 0x200. With this patch the default symbol table will be created
(because there is no Content) in this case and Size will be ignored.

Again, it seems to me that for correct implementation Size need to be Optional.
Otherwise we can't distinguish "No Size was set" and "Size was set to zero" cases.

444 ↗(On Diff #202732)

Similar answer :]

jhenderson added inline comments.Jun 5 2019, 2:39 AM
test/tools/yaml2obj/implicit-sections.test
3 ↗(On Diff #202917)

sections usually -> sections are usually

49 ↗(On Diff #202917)

in case -> for the case

50 ↗(On Diff #202917)

but sections -> but the sections

tools/yaml2obj/yaml2elf.cpp
310–311 ↗(On Diff #202917)

clang-format?

403 ↗(On Diff #202732)

Okay, sounds good.

405 ↗(On Diff #202732)

Actually, I lied slightly, in that sh_entsize's size is dependent on the ELF format (32 versus 64 bits). Not sure what the right approach here is though. We could truncate (as I think this will do) or emit an error (but the latter might be a little annoying to implement. Truncation is probably fine for now.

407 ↗(On Diff #202732)

Okay.

410 ↗(On Diff #202732)

Sounds good.

428 ↗(On Diff #202732)

Hmmm... I'm not comfortable about this behaviour change, since this could test failures/spurious passes. If there are no uses of this approach currently, I guess it's okay though.

grimar marked an inline comment as done.Jun 5 2019, 2:45 AM
grimar added inline comments.
tools/yaml2obj/yaml2elf.cpp
405 ↗(On Diff #202732)

Actually, I lied slightly, in that sh_entsize's size is dependent on the ELF format (32 versus 64 bits).

Yes, I know, but I think we always truncate values everywhere.
We have a comment saying "Just use 64, since it can hold 32-bit values too." in ELFYAML.h where we declare types for ELF things. And I do not think I saw any code checking we do not truncate, so I also guess it is fine for now.

(Anyways., if we want to fix it, it is probably should be checked much earlier. I.e. we might want to report an overflow error during parsing YAML on 32 bits platforms probably).

jhenderson added inline comments.Jun 5 2019, 3:15 AM
tools/yaml2obj/yaml2elf.cpp
405 ↗(On Diff #202732)

Yes, you're probably right.

grimar updated this revision to Diff 203119.Jun 5 2019, 4:23 AM
grimar marked 5 inline comments as done.
  • Addressed review comments.
tools/yaml2obj/yaml2elf.cpp
428 ↗(On Diff #202732)

I'll work on a follow up for this in the first place after landing this patch.

jhenderson accepted this revision.Jun 5 2019, 5:33 AM

Okay, LGTM.

This revision is now accepted and ready to land.Jun 5 2019, 5:33 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptJun 5 2019, 6:13 AM