This is an archive of the discontinued LLVM Phabricator instance.

[yaml2obj] - ProgramHeaders: introduce FirstSec/LastSec instead of Sections list.
ClosedPublic

Authored by grimar on Oct 30 2020, 6:16 AM.

Details

Summary

Imagine we have a YAML declaration of few sections: foo1, <unnamed 2>, foo3, foo4.

To put them into segment we can do (1*):

Sections:
 - Section: foo1
 - Section: foo4

or we can use (2*):

Sections:
 - Section: foo1
 - Section: foo3
 - Section: foo4

or (3*) :

Sections:
 - Section: foo1
## "(index 2)" here is a name that we automatically created for a unnamed section.
 - Section: (index 2)
 - Section: foo3
 - Section: foo4

It looks really confusing that we don't have to list all of sections.

At first I've tried to make this rule stricter and report an error when there is a gap
(i.e. when a section is included into segment, but not listed explicitly).
This did not work perfect, because such approach conflicts with unnamed sections/fills (see (3*)).

This patch drops "Sections" key and introduces 2 keys instead: FirstSec and LastSec.
Both are optional.

Diff Detail

Event Timeline

grimar created this revision.Oct 30 2020, 6:16 AM
Herald added a project: Restricted Project. · View Herald Transcript
grimar requested review of this revision.Oct 30 2020, 6:16 AM
MaskRay added a comment.EditedOct 30 2020, 9:51 AM

The idea looks good. It is used by LLD to compute p_filesz/p_memsz:)

I can understand that if a segment contains many sections, it can be inconvenient to enumerate every one.

As long as this approach still allows creating program headers with the exact same contents as before (i.e. it's as flexible as possible), then it makes sense to me too. Happy for you to proceed.

grimar added a comment.Nov 2 2020, 2:13 AM

As long as this approach still allows creating program headers with the exact same contents as before (i.e. it's as flexible as possible), then it makes sense to me too. Happy for you to proceed.

It indeed will be at least on the same level of flexibility.

FTR, partially I was inspired by the following possible situation. Currently we allocate a space for nobits sections in a segment when there are
fills or non-nobits sections after them:

static bool shouldAllocateFileSpace(ArrayRef<ELFYAML::ProgramHeader> Phdrs,
                                    const ELFYAML::NoBitsSection &S) {
  for (const ELFYAML::ProgramHeader &PH : Phdrs) {
    auto It = llvm::find_if(
        PH.Chunks, [&](ELFYAML::Chunk *C) { return C->Name == S.Name; });
    if (std::any_of(It, PH.Chunks.end(), [](ELFYAML::Chunk *C) {
          return (isa<ELFYAML::Fill>(C) ||
                  cast<ELFYAML::Section>(C)->Type != ELF::SHT_NOBITS);
        }))
      return true;
  }
  return false;
}

If we have a segment with sections:
<regular1>, <nobits1>, <regular2>, <nobits2>

Then we might have a segment description that specifies only <regular1> and <nobits2>:

Sections:
  - Section: <regular1>
  - Section: <nobits2>

and it will be different from the case where we explicitly specify all of sections,
because in the case above I believe we will not allocate a space for the <nobits1> section with the current code.

That is why I tried to implement a requirement about that all sections must explicitly present in the segment description at first.

With the solution that this patch suggests it should be impossible to trigger the possible bug/misbehavior I've described.

grimar updated this revision to Diff 302485.Nov 3 2020, 12:20 AM
grimar retitled this revision from [yaml2obj][WIP] - ProgramHeaders: introduce FirstSec/LastSec instead of Sections list. to [yaml2obj] - ProgramHeaders: introduce FirstSec/LastSec instead of Sections list..
grimar edited the summary of this revision. (Show Details)
  • Updated other tests. Added a few new ones.

Ready for review.

jhenderson added inline comments.Nov 3 2020, 1:36 AM
llvm/lib/ObjectYAML/ELFEmitter.cpp
502–508

The behaviour here is a little subtle (relying on the zero-value insertion of a size_t when a member doesn't exist). I wonder if it would be worth adding some comments to make it clearer?

507

This behaviour isn't immediately clear to me as being the behaviour when just FirstSec is specified. It could easily be thought that just FirstSec implies that section and all later sections. What do you think about requiring both keys if either is specified?

512

Maybe worth bailing out at this point if either First or Last are invalid, rahter than needing the if (First && Last) bits here.

llvm/lib/ObjectYAML/ELFYAML.cpp
919
llvm/test/tools/llvm-objdump/X86/elf-dynamic-relocs.test
99

Side-note: this one was previously missing .got.plt!

llvm/test/tools/obj2yaml/ELF/program-headers.yaml
347–348

This comment probably needs updating.

llvm/test/tools/yaml2obj/ELF/custom-fill.yaml
266–271

We probably need the same test case but for the LastSec key.

llvm/test/tools/yaml2obj/ELF/program-header.yaml
121
122
136
154

Should we have a test case showing that Fills are included as well?

grimar updated this revision to Diff 303063.Nov 5 2020, 2:39 AM
grimar marked 11 inline comments as done.
grimar edited the summary of this revision. (Show Details)
  • Addressed review comments.
llvm/lib/ObjectYAML/ELFEmitter.cpp
507

It could easily be thought that just FirstSec implies that section and all later sections.

I had such concern too. Was not sure what is better. I've made both keys to be required if either is specified.

llvm/test/tools/obj2yaml/ELF/program-headers.yaml
347–348

I've updated the test itself instead.

jhenderson added inline comments.Nov 5 2020, 4:43 AM
llvm/lib/ObjectYAML/ELFEmitter.cpp
488–500
llvm/test/tools/llvm-readobj/ELF/gnu-notes.test
192
llvm/test/tools/llvm-readobj/ELF/hash-histogram.test
412
llvm/test/tools/obj2yaml/ELF/program-headers.yaml
403–405

We probably need a case where .non-alloc.2 would be the LastSec too.

llvm/tools/obj2yaml/elf2yaml.cpp
392–399

Does this loop work if section headers are not in section offset order? Or are they guaranteed to be by some sorting earlier on?

grimar marked an inline comment as done.Nov 6 2020, 1:09 AM
grimar added inline comments.
llvm/tools/obj2yaml/elf2yaml.cpp
392–399

I have no short answer, we have a specific behavior for cases where sections are not in the
section offset order currently, let me describe it.

Imagine we have 3 sections:
.foo at offset 0x100, .bar at offset 0x200 and .zed at offset 0x300.
And lets have a section header table where they are in a different order:
.zed (0x300), .foo (0x100) and .bar (0x200).

And lets have a segment [.foo, .zed]. The YAML for this case is shown below:

--- !ELF
FileHeader:
  Class: ELFCLASS64
  Data:  ELFDATA2LSB
  Type:  ET_EXEC
Sections:
  - Name:    .foo
    Type:    SHT_PROGBITS
    Flags:   [ SHF_ALLOC ]
    Size:    0x1
    Offset:  0x100
  - Name:    .bar
    Type:    SHT_PROGBITS
    Flags:   [ SHF_ALLOC ]
    Size:    0x2
    Offset:  0x200
  - Name:    .zed
    Type:    SHT_PROGBITS
    Flags:   [ SHF_ALLOC ]
    Size:    0x3
    Offset:  0x300
SectionHeaderTable:
  Sections:
    - Name: .zed
    - Name: .foo
    - Name: .bar
    - Name: .strtab
    - Name: .shstrtab
ProgramHeaders:
  - Type:     PT_LOAD
    Flags:    [ PF_W, PF_R ]
    VAddr:    0x1000
    FirstSec: .foo
    LastSec:  .zed

after applying obj2yaml we have:

--- !ELF
FileHeader:
  Class:           ELFCLASS64
  Data:            ELFDATA2LSB
  Type:            ET_EXEC
ProgramHeaders:
  - Type:            PT_LOAD
    Flags:           [ PF_W, PF_R ]
    FirstSec:        .zed
    LastSec:         .bar
    VAddr:           0x0000000000001000
Sections:
  - Name:            .zed
    Type:            SHT_PROGBITS
    Flags:           [ SHF_ALLOC ]
    Address:         0x0000000000000003
    Content:         '000000'
  - Name:            .foo
    Type:            SHT_PROGBITS
    Flags:           [ SHF_ALLOC ]
    Content:         '00'
  - Name:            .bar
    Type:            SHT_PROGBITS
    Flags:           [ SHF_ALLOC ]
    Address:         0x0000000000000001
    Content:         '0000'
...

What is noticable:

  1. We lost the information about the section header order. I.e. now the order is not .foo .bar. zed, but .zed .foo .bar. Because we dump sections by section header table order only and don't try to preserve section offsets.
  2. FirstSec/LastSec were changed, but they match the YAML produced.
jhenderson added inline comments.Nov 6 2020, 1:32 AM
llvm/tools/obj2yaml/elf2yaml.cpp
392–399

My opinion is that we don't need to preserve in the YAML the section header table order, as long as the semantic meaning is identical. Section offset order likely needs maintaining if the sections are inside segments, or any hard-coded addresses will end up breaking, right? It sounds like that might be an issue already though?

grimar updated this revision to Diff 303371.Nov 6 2020, 1:43 AM
grimar marked 5 inline comments as done.
  • Addressed review comments.
llvm/tools/obj2yaml/elf2yaml.cpp
392–399

Yeah, I think so.

I guess we want do dump sections as we do now, but then sort them by file offsets (i.e. sort Chunks) and also emit the Offset field (perhaps not always, but when the aligned [previous offset + size of previous section] value doesn't match the offset of a section dumped).

Though I'd like to experiment with this are probably at first to see if there is something not obvious that I may be missing.

jhenderson accepted this revision.Nov 6 2020, 2:02 AM

I think this patch looks good. The section offset ordering thing is probably an independent change.

This revision is now accepted and ready to land.Nov 6 2020, 2:02 AM
MaskRay accepted this revision.Nov 6 2020, 8:43 AM