This is an archive of the discontinued LLVM Phabricator instance.

[obj2yaml] - Dump section offsets in some cases.
ClosedPublic

Authored by grimar on Nov 10 2020, 5:05 AM.

Details

Summary

Currently we never dump the sh_offset key.
Though it sometimes an important information.

To reduce the noise this patch implements the following logic:

  1. The "Offset" key for the first section is always emitted.
  2. If we can derive the offset for a next section naturally, then the "Offset" key is omitted.

By "naturally" I mean that section[X] offset is expected to be:

offsetOf(section[X]) == alignTo(section[X - 1].sh_offset + section[X - 1].sh_size, section[X].sh_addralign)

So, when it has the expected value, we omit it from the output.

Diff Detail

Event Timeline

grimar created this revision.Nov 10 2020, 5:05 AM
grimar requested review of this revision.Nov 10 2020, 5:05 AM
grimar updated this revision to Diff 304156.Nov 10 2020, 5:36 AM
  • Cosmetic fix.
jhenderson requested changes to this revision.Nov 10 2020, 5:37 AM

I don't think you want to set the offset unconditionally for the first section either. Much of the time the first section's position will be automatically derivable, namely immediately after the ELF header and potential program headers (after alignment has been taken into account). If we explicitly print the offset, people are going to start including it unnecessarily.

This revision now requires changes to proceed.Nov 10 2020, 5:37 AM
grimar updated this revision to Diff 304455.Nov 11 2020, 3:39 AM
  • Addressed review comment (don't dump the offset of the first section).
grimar updated this revision to Diff 304460.Nov 11 2020, 3:42 AM
  • Fixed test case.
jhenderson added a comment.EditedNov 11 2020, 3:58 AM

Mostly looks fine. Do NOBITS sections come into play here at all? In particular, their size wouldn't normally impact the offset of subsequent sections.

llvm/test/tools/obj2yaml/ELF/offset.yaml
87
100
grimar updated this revision to Diff 304483.Nov 11 2020, 4:54 AM
grimar marked an inline comment as done.
  • Addressed review comments.

Mostly looks fine. Do NOBITS sections come into play here at all? In particular, their size wouldn't normally impact the offset of subsequent sections.

I've updated the patch and added the test case for SHT_NOBITS sections. Thanks!

Doesn't the behaviour of a SHT_NOBITS section depend on whether it is a non-last member of a segment? IIRC, SHT_NOBITS sections in a segment which are last do not need file space, but to ensure correct addresses for sections appearing after them in a segment, they are allocated filespace by a linker if non-NOBITS appear after them in the same segment. (The behaviour you're doing here looks fine for when such sections are not inside segments, e.g. for ET_REL objects).

Doesn't the behaviour of a SHT_NOBITS section depend on whether it is a non-last member of a segment? IIRC, SHT_NOBITS sections in a segment which are last do not need file space, but to ensure correct addresses for sections appearing after them in a segment, they are allocated filespace by a linker if non-NOBITS appear after them in the same segment. (The behaviour you're doing here looks fine for when such sections are not inside segments, e.g. for ET_REL objects).

True. But how much necessary to support this case now? The problem is that we at first dump sections and then dump program headers.
To implement the precise behavior, I think we need to delay assigning offsets to be able to call shouldAllocateFileSpace, which does the job:

static bool shouldAllocateFileSpace(ArrayRef<ELFYAML::ProgramHeader> Phdrs,
                                    const ELFYAML::NoBitsSection &S) {
  for (const ELFYAML::ProgramHeader &PH : Phdrs) {
    auto It = llvm::find_if(
        PH.Chunks, [&](ELFYAML::Chunk *C) { return C->Name == S.Name; });
    if (std::any_of(It, PH.Chunks.end(), [](ELFYAML::Chunk *C) {
          return (isa<ELFYAML::Fill>(C) ||
                  cast<ELFYAML::Section>(C)->Type != ELF::SHT_NOBITS);
        }))
      return true;
  }
  return false;
}

I wonder - do we want adding this complexity? I thought that nobits sections are usually placed after progbits anyways.

Doesn't the behaviour of a SHT_NOBITS section depend on whether it is a non-last member of a segment? IIRC, SHT_NOBITS sections in a segment which are last do not need file space, but to ensure correct addresses for sections appearing after them in a segment, they are allocated filespace by a linker if non-NOBITS appear after them in the same segment. (The behaviour you're doing here looks fine for when such sections are not inside segments, e.g. for ET_REL objects).

True. But how much necessary to support this case now? The problem is that we at first dump sections and then dump program headers.
To implement the precise behavior, I think we need to delay assigning offsets to be able to call shouldAllocateFileSpace, which does the job:

In dumpProgramHeaders we call isInSegment to know the parent segment of a section. If we move the logic earlier before setOffsets, we can refine the computation of the expected sh_offset.
With a containing PT_LOAD, sh_offset should equal its address modulo p_align.

static bool shouldAllocateFileSpace(ArrayRef<ELFYAML::ProgramHeader> Phdrs,
                                    const ELFYAML::NoBitsSection &S) {
  for (const ELFYAML::ProgramHeader &PH : Phdrs) {
    auto It = llvm::find_if(
        PH.Chunks, [&](ELFYAML::Chunk *C) { return C->Name == S.Name; });
    if (std::any_of(It, PH.Chunks.end(), [](ELFYAML::Chunk *C) {
          return (isa<ELFYAML::Fill>(C) ||
                  cast<ELFYAML::Section>(C)->Type != ELF::SHT_NOBITS);
        }))
      return true;
  }
  return false;
}

I wonder - do we want adding this complexity? I thought that nobits sections are usually placed after progbits anyways.

I mostly want to make sure we don't result in invalid output caused by missing the corner case. If all that happens is that Offset is produced unnecessarily, it's probably fine either way, but what I don't want is Offset to be omitted due to an incorrect calculation, when it shouldn't be, as this will lead to an invalid output, which might mask a problem a user is trying to locate. As things stand, I think we can run into that situation.

llvm/tools/obj2yaml/elf2yaml.cpp
518–521

By the way, I think it feels more natural to do the special case first (i.e. Sec.Type == ELF::SHT_NOBITS), before the default case, as it is easier to add further cases in the future, but it's not a big deal either way.

grimar added a comment.EditedNov 16 2020, 6:14 AM

Doesn't the behaviour of a SHT_NOBITS section depend on whether it is a non-last member of a segment? IIRC, SHT_NOBITS sections in a segment which are last do not need file space, but to ensure correct addresses for sections appearing after them in a segment, they are allocated filespace by a linker if non-NOBITS appear after them in the same segment. (The behaviour you're doing here looks fine for when such sections are not inside segments, e.g. for ET_REL objects).

True. But how much necessary to support this case now? The problem is that we at first dump sections and then dump program headers.
To implement the precise behavior, I think we need to delay assigning offsets to be able to call shouldAllocateFileSpace, which does the job:

In dumpProgramHeaders we call isInSegment to know the parent segment of a section. If we move the logic earlier before setOffsets, we can refine the computation of the expected sh_offset.
With a containing PT_LOAD, sh_offset should equal its address modulo p_align.

This should work, but it introduces the depencency on the VA. Perhaps I'd try to avoid adding more dependencies. I don't have a strong opinion though,
but I have a feeling that just delaying of the offsets assigning might look a bit simpler (or at least not that bad as it sounds). Going to try to see how it will look like.

grimar updated this revision to Diff 305709.EditedNov 17 2020, 2:39 AM
  • Updated implementation to delay dumping of offsets to fix the logic for SHT_NOBITS sections.
jhenderson added inline comments.Nov 19 2020, 1:18 AM
llvm/test/Object/obj2yaml.test
365

Just trying to understand why this and the .data Offsets are needed explicitly here, when they weren't prior to this patch?

grimar added inline comments.Nov 19 2020, 1:47 AM
llvm/test/Object/obj2yaml.test
365
  1. Before this patch we never dumped the "Offset" field.
  2. This is the case when we unable to properly derive the offset of the .rel.text and .data because they

are not sorted by the file offset in the section header table:

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .text             PROGBITS        00000000 000034 00004c 00  AX  0   0  4
  [ 2] .rel.text         REL             00000000 000434 000030 08     10   1  4
  [ 3] .data             PROGBITS        00000000 000080 000000 00  WA  0   0  4
  [ 4] .bss              NOBITS          00000000 000080 000004 00  WA  0   0  4
  [ 5] .mdebug.abi32     PROGBITS        00000000 000080 000000 00      0   0  1
jhenderson accepted this revision.Nov 19 2020, 1:54 AM

LGTM, thanks. Might want to wait for @MaskRay too.

This revision is now accepted and ready to land.Nov 19 2020, 1:54 AM
MaskRay accepted this revision.Nov 25 2020, 12:20 AM
This revision was automatically updated to reflect the committed changes.