This is an archive of the discontinued LLVM Phabricator instance.

[obj2yaml] - Program headers: simplify the computation of p_filesz.
ClosedPublic

Authored by grimar on Apr 22 2020, 5:02 AM.

Details

Summary

Currently we have computations of p_filesz and p_memsz mixed together
with the use of a loop over fragments. After recent changes it is possible to
avoid using a loop for the computation of p_filesz, since we know that fragments
are sorted by their file offsets.

The main benefit of this change is that splits the computation of p_filesz
and p_memsz what is simpler and allows us to fix the computation of the
p_memsz independently (D78005 shows the issue that we have currently).

Also it fixes the bug in program-header-size-offset.yaml.

Depends on D78627.

Diff Detail

Event Timeline

grimar created this revision.Apr 22 2020, 5:02 AM
MaskRay added inline comments.Apr 22 2020, 10:03 AM
llvm/test/tools/yaml2obj/ELF/program-header-size-offset.yaml
33–34

Why is this changed?

jhenderson added inline comments.Apr 23 2020, 12:46 AM
llvm/lib/ObjectYAML/ELFEmitter.cpp
778–780

This is a little subtle, and probably deserves a comment to explain why we pay attention to offset but not size of a single trailing SHT_NOBITS sections.

787–793

Probably tangential to this change, but if I'm reading this right, this will unconditionally cause all non-empty segments to have a memory size, even if there are no allocatable sections in them. This doesn't seem right to me, especially coming from an environment where segments with non-alloc sections (and therefore no address) are quite normal.

grimar marked an inline comment as done.Apr 23 2020, 12:48 AM
grimar added inline comments.
llvm/test/tools/yaml2obj/ELF/program-header-size-offset.yaml
33–34

This patch fixes a bug I beliece: it's

# Program header with 2 SHT_NOBITS sections.
- Type:     0x6abcdef0
  Offset:   0x2004
  Sections:
    - Section: .data
    - Section: .nobits1
    - Section: .nobits2

The layout is:

Section Headers:
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            0000000000000000 000000 000000 00     0   0  0
  [ 1] .text             PROGBITS        0000000000000000 001000 000004 00     0   0 4096
  [ 2] .rodata           PROGBITS        0000000000000000 002000 000004 00     0   0 4096
  [ 3] .data             PROGBITS        0000000000000000 002004 000004 00     0   0  0
  [ 4] .nobits1          NOBITS          0000000000000000 002008 000001 00     0   0  0
  [ 5] .nobits2          NOBITS          0000000000000000 002009 000001 00     0   0  0
  [ 6] .strtab           STRTAB          0000000000000000 002008 000001 00     0   0  1
  [ 7] .shstrtab         STRTAB          0000000000000000 002009 000039 00     0   0  1

0x2009 - 0x2004 == 0x5, not 0x4

grimar marked an inline comment as done.Apr 23 2020, 12:53 AM
grimar added inline comments.
llvm/lib/ObjectYAML/ELFEmitter.cpp
787–793

My intention is to land this patch (for p_filesz) and then I'll be able update and rebase the D78005 to make it focus only on the p_memsz calculation. I'll try to address this comment there.

grimar marked an inline comment as done.Apr 23 2020, 4:00 AM
grimar added inline comments.
llvm/lib/ObjectYAML/ELFEmitter.cpp
778–780

I was about to place the following comment:

// SHT_NOBITS sections occupy no physical space in a file, we should not
// take their sizes into account when calculating the file size of a
// segment.

But then remembered one thing and started to doubt about the usage of its offset.
It was mentioned in D69192 that the sh_offset of a SHT_NOBITS can be
"larger than the file size with some usage of objcopy".

I was unable to find a proper test case committed for this and I do not think I know a way to achieve it.
(elf-disassemble-bss.test (https://github.com/llvm/llvm-project/blob/master/llvm/test/tools/llvm-objdump/X86/elf-disassemble-bss.test) has a non-trailing .bss section that has a too large sh_offset though).

ELF spec says about sh_offset:

"This member's value gives the byte offset from the beginning of the file to
the first byte in the section. One section type, SHT_NOBITS described
below, occupies no space in the file, and its sh_offset member locates
the conceptual placement in the file."

Does "sh_offset member locates the conceptual placement in the file" imply that the
sh_offset have to be less than the file size? I am not sure that "in the file." == "can be outside of the file", it is wierd.

So can we use sh_offset like I do here?

grimar marked an inline comment as done.Apr 23 2020, 4:07 AM
grimar added inline comments.
llvm/lib/ObjectYAML/ELFEmitter.cpp
778–780

FTR, the approach of this patch it is the same as what LLD does:

template <class ELFT> void Writer<ELFT>::setPhdrs(Partition &part) {
  for (PhdrEntry *p : part.phdrs) {
    OutputSection *first = p->firstSec;
    OutputSection *last = p->lastSec;

    if (first) {
      p->p_filesz = last->offset - first->offset;
      if (last->type != SHT_NOBITS)
        p->p_filesz += last->size;
...
MaskRay added inline comments.Apr 23 2020, 9:11 AM
llvm/lib/ObjectYAML/ELFEmitter.cpp
778–780

llvm-objcopy cannot make sh_offset larger than the file size (if it does, I will assuredly consider it a bug). GNU objcopy can do some in some cases. See binutils-gdb/bfd/elf.c:assign_file_positions_for_load_sections the variable off_adjust. A few bug reports are related https://sourceware.org/bugzilla/show_bug.cgi?id=25662

This is basically a size saving optimization but this is rather error-prone. In llvm-objcopy --only-keep-debug, we simply made sh_offset monotonically increasing to avoid such complexity.

llvm/test/tools/yaml2obj/ELF/program-header-size-offset.yaml
33–34

I think it is hard to say that this is a bug. Conceptually sh_offset of a SHT_NOBITS section can be ignored. Usually, the ELF writer should set the sh_offset field of .nobits2 to 0x2008 because there is no need to leave a one-byte gap.

I don't think this trivia matter much though, handling it either way is ok. If doing it one way helps simplify our overall logic, let's choose that way.

MaskRay added inline comments.Apr 23 2020, 9:12 AM
llvm/test/tools/yaml2obj/ELF/program-header-size-offset.yaml
32

Leave a comment how FileSize is computed.

grimar marked an inline comment as done.Apr 24 2020, 12:46 AM
grimar added inline comments.
llvm/lib/ObjectYAML/ELFEmitter.cpp
778–780

Thanks for information!

grimar updated this revision to Diff 259826.Apr 24 2020, 1:02 AM
  • Addressed review comments.
jhenderson added inline comments.Apr 24 2020, 1:09 AM
llvm/test/tools/yaml2obj/ELF/program-header-size-offset.yaml
33–34

Is there a risk that leaving the FileSize of the segment too high might result in it going outside the range of the file? In other words, does yaml2obj pay any attention to the segment sizes when it lays things out? In other words, in this example, if .nobits1 was say size 0xFFFF0000, would it cause the program header to be referencing data outside the file?

Knowing that llvm-objcopy reads segment data based on the segment file size property, I think we need to be careful about what the FileSize is for segments. It cannot go beyond the file's end.

grimar marked an inline comment as done.Apr 24 2020, 1:43 AM
grimar added inline comments.
llvm/test/tools/yaml2obj/ELF/program-header-size-offset.yaml
33–34

if .nobits1 was say size 0xFFFF0000, would it cause the program header to be referencing data outside the file?

It is possible with the use of ShOffset. But since ShOffset is itself a thing for overriding the offset and used mostly for creating invalid object, this probably not a problem?

grimar marked an inline comment as done.Apr 24 2020, 2:12 AM
grimar added inline comments.
llvm/test/tools/yaml2obj/ELF/program-header-size-offset.yaml
33–34

See.

Is there a risk that leaving the FileSize of the segment too high might result in it going outside the range of the file?

The way to achieve it is to use ShOffset, what is probably not an issue. yaml2obj places all sections in order and can'y assign an offset that is outside of the file by itself (without using ShOffset).

In other words, does yaml2obj pay any attention to the segment sizes when it lays things out?

The layout of section is unrelated to segments. First we do layout for sections and then create headers independently,
basing on the sections layout.

In other words, in this example, if .nobits1 was say size 0xFFFF0000, would it cause the program header to be referencing data outside the file?

Let me answer this again. I've read size as offset previously. The size of SHT-NOBITS is not taken into account.
It might be possible to have a regular section with a broken size (with the use of ShSize) and then the segment file size will be broken. But that is about broken objects again.

jhenderson added inline comments.Apr 24 2020, 2:29 AM
llvm/test/tools/yaml2obj/ELF/program-header-size-offset.yaml
33–34

Oh, I think I see. I agree that Sh* values can be ignored for this in general (and indeed, I think we agreed elsewhere that they shouldn't really impact the layout of the segments anyway).

Just to confirm my understanding, the FileSize increase is because ShOffset of .nobits2 is affecting the size of the segment, and not because the size of .nobits1 is 1?

grimar marked an inline comment as done.Apr 24 2020, 2:47 AM
grimar added inline comments.
llvm/test/tools/yaml2obj/ELF/program-header-size-offset.yaml
33–34

Just to confirm my understanding, the FileSize increase is because ShOffset of .nobits2 is affecting the size of the segment, and not because the size of .nobits1 is 1?

Right.

This revision is now accepted and ready to land.Apr 24 2020, 3:51 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptApr 24 2020, 5:54 AM