This is an archive of the discontinued LLVM Phabricator instance.

[yaml2obj] - Support selecting the location of the section header table.
Abandoned · Public

Authored by grimar on Dec 22 2020, 1:44 AM.

Details

Summary

I've found that, when writing a test, it is sometimes convenient to have
nothing placed after the section data. Currently we write the section header
table there.

This adds the Location sub-key to the SectionHeaderTable key:

SectionHeaderTable:
  Location:  <value>

Location can currently be either AfterSecData or BeforeSecData.
This allows placing the section header table before the section data.

This patch also opens the way to placing the section header table at
an arbitrary position (i.e. between sections).
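
To make the two placements concrete, here is a minimal sketch of how the section header table offset (e_shoff) could move between them. It is a hypothetical model, not yaml2obj code: the `layout` and `align_to` helpers are invented for illustration, the file is assumed to be ELF64 with no program headers, section alignment is ignored, and the 8-byte alignment of the table is an assumption.

```python
# Hypothetical model, not yaml2obj code: where e_shoff lands for each Location value.
ELF64_EHDR_SIZE = 64   # size of an ELF64 file header
ELF64_SHDR_SIZE = 64   # size of one ELF64 section header entry


def align_to(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def layout(section_sizes, location="AfterSecData", align=8):
    """Return (e_shoff, section_offsets, file_size) for a file without program headers."""
    # One entry per section plus the null entry at index 0.
    table_size = ELF64_SHDR_SIZE * (len(section_sizes) + 1)
    cursor = ELF64_EHDR_SIZE
    e_shoff = None
    if location == "BeforeSecData":
        # Assumed 8-byte alignment for the table itself.
        e_shoff = align_to(cursor, align)
        cursor = e_shoff + table_size
    elif location != "AfterSecData":
        raise ValueError(f"unknown Location: {location}")
    offsets = []
    for size in section_sizes:
        offsets.append(cursor)
        cursor += size
    if location == "AfterSecData":
        e_shoff = align_to(cursor, align)
        cursor = e_shoff + table_size
    return e_shoff, offsets, cursor


print(layout([16, 32], "AfterSecData"))   # (112, [64, 80], 304)
print(layout([16, 32], "BeforeSecData"))  # (64, [256, 272], 304)
```

With BeforeSecData the last section ends exactly at the end of the file, which is the property the summary asks for.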

Event Timeline

grimar created this revision. Dec 22 2020, 1:44 AM
grimar requested review of this revision. Dec 22 2020, 1:44 AM
Herald added a project: Restricted Project. Dec 22 2020, 1:44 AM

The placement was changed by D67221. This patch adds an option to change the placement, which is definitely useful. A design for the syntax is required, so I think it'd be best to wait until @jhenderson comes back from vacation. I haven't yet looked hard at whether the code can be simplified.

llvm/lib/ObjectYAML/ELFYAML.cpp
882

Most binary utilities do not capitalize diagnostics.

llvm/test/tools/yaml2obj/ELF/section-headers-location.yaml
1

Test that the "Location" key of "SectionHeaderTable" tag can change the placement of the section header table.

4

"AfterSecData" is the default which places the section header table after all section contents (i.e. at the end of the file)

39

"Location" cannot be used when "NoHeaders" is true

(Personally I think words like "Check" "we do not allow" can be removed)

If there's a case where this is useful, I definitely support it. More generally, we may want to be able to do the same with program header table placement. I don't know if we have a use-case for that yet, but it's probably worth keeping it in mind as we design this. BeforeSecData is therefore a little bit ambiguous here. For example, between the ELF header and program header table is also "before" the section data. Maybe the key words should be AfterEhdr, AfterPhdrs, AfterSections (or possibly AtEOF for the last one). We could probably not have one of AfterEhdr or AfterPhdrs for now. Longer-term, I could imagine it might be useful to allow arbitrary offsets for the section header table, so we should consider this in our design too.
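
The ambiguity can be illustrated with a small sketch. It is only a model under stated assumptions: an ELF64 file laid out as file header, program headers, section data; no alignment padding; and the keyword names AfterEhdr, AfterPhdrs and AfterSections are the ones floated above, not an implemented syntax. The function name is hypothetical.

```python
# Hypothetical model: start offset of the section header table for each proposed keyword.
ELF64_EHDR_SIZE = 64   # size of an ELF64 file header
ELF64_PHDR_SIZE = 56   # size of one ELF64 program header entry


def section_header_table_offset(keyword, num_phdrs, section_data_size):
    """Return where the table would start, ignoring alignment padding."""
    after_ehdr = ELF64_EHDR_SIZE
    after_phdrs = after_ehdr + num_phdrs * ELF64_PHDR_SIZE
    after_sections = after_phdrs + section_data_size
    positions = {
        "AfterEhdr": after_ehdr,
        "AfterPhdrs": after_phdrs,
        "AfterSections": after_sections,
    }
    if keyword not in positions:
        raise ValueError(f"unknown keyword: {keyword}")
    return positions[keyword]


# Both of the first two positions are "before the section data".
print(section_header_table_offset("AfterEhdr", 2, 0x100))      # 64
print(section_header_table_offset("AfterPhdrs", 2, 0x100))     # 176
print(section_header_table_offset("AfterSections", 2, 0x100))  # 432
```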

I've got a few competing ideas, and am listing them here to see what others think.

Option 1: encode location in SectionHeaderTable, possibly using the keyword Offset (I think Offset is better than Location, for consistency, with named values like AfterSections or AfterSecData being special values). The offset would work the same as Offset does for sections, and we'd need to effectively insert the section header table block between sections at the requested position, when specified as an arbitrary value.
Option 2: add a section header offset property to the FileHeader which controls the section header table position. This is basically the same as the previous idea, but with the location specified in a different place. The advantage of this approach is that, in the future, the same approach could be followed for the program header table. It could also sit nicely alongside the ESHoff (I forget its exact spelling) field (the latter providing the value in the header itself, whilst the position is determined by a SectionHeaderOffset field or similar).
Option 3: Have a separate "Layout" block within the YAML which would allow you to define the layout. It might look a bit like this:

Layout:
  - FileHeader
  - Section1
  - ProgramHeaders
  - Section2
  - SectionHeaders
  - Section3

I'm not sure how this option would work alongside Offset values for sections, or what should happen if e.g. the file header entry were to be omitted.

I am kind of leaning towards option 2, but see benefits to all the approaches. What do you think?

(I haven't looked at the new code or test yet. I'll do that after the design has been discussed further)

If we really want to add a way to select where program headers are placed, and also to support placing both program headers and section headers at arbitrary offsets, then I see the following solution:

Currently, internally, we have a list of Chunks, which consists of sections and a special Fill chunk.
If we rename the "Sections" YAML key to "Layout" (or similar), then we might be able to introduce two more special optional chunks: ProgramHeaders and SectionHeaders.

Then we will be able to write YAMLs like this:

--- !ELF
FileHeader:
  Class: ELFCLASS64
  Data:  ELFDATA2LSB
  Type:  ET_DYN
Layout:
  - Name: .section1
    Type: SHT_PROGBITS
    Size: 0
  - Type: SectionHeaders
    Offset: 0x1000
  - Name: .section2
    Type: SHT_PROGBITS
    Offset: 0x1200
  - Type: ProgramHeaders

I.e. it looks fairly close to what you've suggested in "Option 3", but instead of introducing one more separate YAML block, it reuses the "Sections" block.

Perhaps we can just leave the "Sections" name unchanged for now and think about changing (or keeping) it later, once the functionality is committed.
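
A rough sketch of the chunk-list idea follows, assuming a single ordered list in which the header tables are ordinary chunks. The `Chunk` class and `assign_offsets` function are hypothetical names for illustration and do not mirror the real yaml2obj data structures; rejecting an explicit offset that goes backward is also an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Chunk:
    kind: str                      # "Section", "Fill", "SectionHeaders" or "ProgramHeaders"
    name: str = ""
    size: int = 0
    offset: Optional[int] = None   # explicit Offset, if one was given


def assign_offsets(chunks, start):
    """Walk the chunks in list order; an explicit offset moves the cursor forward."""
    cursor = start
    placed = []
    for chunk in chunks:
        if chunk.offset is not None:
            if chunk.offset < cursor:
                raise ValueError(
                    f"{chunk.kind} {chunk.name!r}: offset {chunk.offset:#x} "
                    f"overlaps the previous chunk"
                )
            cursor = chunk.offset
        placed.append((chunk.kind, chunk.name, cursor))
        cursor += chunk.size
    return placed


# Mirrors the YAML above: three section header entries (null, .section1, .section2)
# of 64 bytes each, and no program headers, so that table is empty.
chunks = [
    Chunk("Section", ".section1", size=0),
    Chunk("SectionHeaders", size=3 * 64, offset=0x1000),
    Chunk("Section", ".section2", size=0, offset=0x1200),
    Chunk("ProgramHeaders"),
]
for kind, name, offset in assign_offsets(chunks, start=64):
    print(f"{offset:#06x} {kind} {name}")
```

The point of the design is that the header tables need no special placement logic: they take part in the same cursor walk as sections and fills.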

Making program headers and the section header table kinds of Chunk makes a lot of sense to me, and I think would fit the design well. Indeed, the general concept is similar to one we already implement in a proprietary objcopy-like tool for performing layout. My main question with this proposal is this: do we want a list (whether called "Layout" or "Sections"), which defines properties of Sections and Fills, but not of the section header table and program header table (apart from their position)? It feels inconsistent to me. We could potentially move the ProgramHeaders and SectionHeaderTable elements into a member of the Layout tag, as in the rough example below (exact syntax would need finalising), but I'm not convinced whether it looks nice or not.

Layout:
  - ProgramHeaderTable:
      Offset: 0x100
      ProgramHeaders:
        - Type: PT_LOAD
        - Type: PT_LOAD
        ...
  - Section:
      Type: SHT_PROGBITS
      Name: .foo
      Offset: 0x200
  - Fill:
      Offset: 0x1000
      Size: 0x20
  - SectionHeaderTable:
      ...

(or similar to what you posted). An alternative would be to move the Offset aspect of the sections into the Layout table only, and possibly moving Fills into that Layout block too out of the Sections list. This has the advantage that layout information is generally kept in the Layout table, whilst the details of what a thing actually is (e.g. section type, size, address etc) are kept in appropriate parts of the document. This is closer to my original option 3:

FileHeader:
  ...
Sections:
  - Name: .foo
    Type: SHT_PROGBITS
  - Name: .bar
    Type: SHT_PROGBITS
SectionHeaderTable:
  ...
ProgramHeaders:
  ...
Layout:
  - Type: Section
    Name: .foo
    Offset: 0x1000
  - Type: Fill
    # Offset could be implicitly calculated here; fills move out of Sections to Layout.
    Size: 0x100
  - Type: SectionHeaderTable
    Offset: 0x2000
  - Type: ProgramHeaderTable
    Offset: 0x3000
  - Type: Section
    Name: .bar
    Offset: 0x3500

I think I mildly prefer the second one, as it'll be easier to retrofit and feels a little cleaner. We'd need some implicit rules that the program header table appears at the front and the section header table at the back if they aren't specifically mentioned in either case.
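
The implicit rule could be sketched as follows. This is a hypothetical helper, not part of yaml2obj: it assumes the Layout list has already been parsed into dictionaries with a "Type" key, as in the rough example above.

```python
def with_implicit_tables(layout):
    """Add the header tables at their default positions when Layout omits them."""
    kinds = [entry["Type"] for entry in layout]
    result = list(layout)  # leave the caller's list untouched
    if "ProgramHeaderTable" not in kinds:
        # Default: program header table at the front.
        result.insert(0, {"Type": "ProgramHeaderTable"})
    if "SectionHeaderTable" not in kinds:
        # Default: section header table at the back.
        result.append({"Type": "SectionHeaderTable"})
    return result


layout = [
    {"Type": "Section", "Name": ".foo", "Offset": 0x1000},
    {"Type": "Section", "Name": ".bar", "Offset": 0x3500},
]
print([entry["Type"] for entry in with_implicit_tables(layout)])
# ['ProgramHeaderTable', 'Section', 'Section', 'SectionHeaderTable']
```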

grimar added a comment. Edited Jan 13 2021, 1:47 AM

My concern is that it might hurt the readability of some of our test cases, where section addresses/sizes and offsets are correlated (a common case, I think): if the "Offset" aspect lives in "Layout" while "Address"/"Size"/"Content" live in "Sections", it becomes harder to adjust them, because "Size"/"Content" directly affects "Offset", and "Offset" in turn might need to affect "Address".

Another concern is that having two lists might cause confusion: currently we place sections in the order they are listed in "Sections".
With this new scheme, I assume the order from "Layout" will be used and the order of sections in "Sections" becomes unimportant. It feels slightly more complicated, e.g.
people might try to describe program headers using "Sections", though they need to use "Layout". We could add a check to verify that the order is synchronized, of course, but still...
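
For reference, such a synchronization check might look roughly like the sketch below. The function name and the dictionary shapes are hypothetical, chosen to match the two-list example; nothing like this exists in yaml2obj as far as this thread shows.

```python
def check_layout_matches_sections(sections, layout):
    """Return a list of error strings; an empty list means the two lists agree."""
    declared = [s["Name"] for s in sections]
    placed = [e["Name"] for e in layout if e.get("Type") == "Section"]
    errors = []
    for name in placed:
        if name not in declared:
            errors.append(f"Layout refers to unknown section {name}")
    for name in declared:
        if name not in placed:
            errors.append(f"section {name} is missing from Layout")
    # Only compare the order once both lists name the same sections.
    if not errors and placed != declared:
        errors.append("section order in Layout differs from Sections")
    return errors


sections = [{"Name": ".foo"}, {"Name": ".bar"}]
layout = [
    {"Type": "Section", "Name": ".bar"},
    {"Type": "SectionHeaderTable"},
    {"Type": "Section", "Name": ".foo"},
]
print(check_layout_matches_sections(sections, layout))
# ['section order in Layout differs from Sections']
```

Even with such a check, the test author still has to keep two lists in step, which is the extra burden described above.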

Personally I'd prefer the first option you mentioned.
Also, consider the Symbols and DynamicSymbols keys: currently it is impossible to describe
multiple symbol tables, because Symbols is implicitly attached to .symtab and DynamicSymbols is attached to .dynsym.
Instead, we could move them into the section entries, which would allow describing multiple symbol tables, and the whole structure might look like:

--- !ELF
FileHeader:
  Class:   ELFCLASS32
  Data:    ELFDATA2LSB
  Type:    ET_REL
  Machine: EM_386
Layout:
  - ProgramHeaderTable:
      Offset: 0x100
      ProgramHeaders:
        - Type: PT_LOAD
        - Type: PT_LOAD
        ...
  - Section:
      Type: SHT_PROGBITS
      Name: .foo
      Offset: 0x200
  - Type: Fill
    Offset: 0x1000
    Size: 0x20
  - Name: .symtab
    Type: SHT_SYMTAB
    Symbols:
      - Type:  STT_SECTION
        Index: 0
  - Name: .dynsym_custom
    Type: SHT_DYNSYM
    Symbols:
      - Name: foo
  - Name: .dynsym
    Type: SHT_DYNSYM
    Symbols:
      - Name: foo
      - Name: bar
  - SectionHeaderTable:
      ...

That's a good point about the address/offset comment. I'm happy with the other idea then. @MaskRay, do you have any thoughts?

grimar abandoned this revision. Jan 22 2021, 4:53 AM

D95140 was posted instead.