This is an archive of the discontinued LLVM Phabricator instance.

[yaml2obj] - Introduce the "Offset" property for sections.
ClosedPublic

Authored by grimar on Apr 27 2020, 7:02 AM.

Details

Summary

Currently there is no good way to set a physical offset for a section:

  • We have the ShOffset that allows to override the sh_offset, but it does not affect the real data written.
  • We can use a Filler to create an artificial gap, but it is more like a hack rather than a proper solution for this problem.

This patch adds the Offset property which allows setting physical
offsets for sections.

It also generalizes the code, so that we set sh_offset field in one place.

Diff Detail

Event Timeline

grimar created this revision.Apr 27 2020, 7:02 AM
Herald added a project: Restricted Project. · View Herald Transcript
jhenderson added inline comments.Apr 29 2020, 12:51 AM
llvm/lib/ObjectYAML/ELFEmitter.cpp
302–303

This comment looks stale to me. It doesn't even read right.

451–452

I think this comment needs to make clear that the code block it accompanies sets the offset of all sections. How about "Set the offset for all sections, except the SHN_UNDEF section with index 0 when not explicitly requested."?

453

Perhaps IsFirstUndefSection would be a bit clearer.

485

"The SHT_NOBITS section does" -> "SHT_NOBITS sections do not"

1027–1030

Should we really be reporting this error? I could imagine people wanting to create broken ELFs with a section at a specific offset and an alignment that doesn't match up.

(Also "requested alignment" is more correct - adjectives generally come before their nouns)

1035–1036

I can see why you've done this, but I think there's also a use-case for a section header table where the sections aren't in offset order, which I believe would run into this problem. You could use the SHOffset field instead, but that has its own problems (doesn't place data at the specified location, shouldn't be included in program header calculations etc).

llvm/test/tools/yaml2obj/ELF/section-offset.yaml
9

I'm not sure there's much point in this check?

39

I'm not sure this size check adds anything. The fact that the .strtab section's offset also increases is surely sufficient?

69–70

Replace with cmp %t1 %t3?

90

Rather than checking the file size, I think a clearer thing to do would be to check the offset of the section header table to show that it is/isn't impacted by the section's offset.

grimar updated this revision to Diff 260866.Apr 29 2020, 2:19 AM
grimar marked 14 inline comments as done.
  • Addressed review comments.
llvm/lib/ObjectYAML/ELFEmitter.cpp
1027–1030

I wasn't sure about this error, but using both "AddressAlign" and "Offset" in a YAML probably might lead to confusion:

Sections:
  - Name:         .bar
    Type:         SHT_PROGBITS
    Size:         0x10
    Offset:       0x100
    AddressAlign: 0x1000

Usually AddressAlign affects the offset. I.e. it is not a regular field that just sets a value.
And so it might raise a question: should an Offset be aligned by AddressAlign? Or should we use the Offset given and just ignore AddressAligns usual logic?

I am not sure we want to silently allow this combination (when an alignment doesn't match up).

Currently there is a way to achieve what you describe: it should be possible to use Filler(s) + ShOffset + AddressAlign to create any layout.
Perhaps it is sufficient?

1035–1036

yaml2obj does not support writing section out of order currently. This patch has no intention to change this.
We always write sections in the same order they are listed in a YAML.

To implement what you want perhaps we might want just to introduce another field, like "SectionHeaderIndex".
Or may be expand YAML format to it allow to describe section header table (and may be it's location) explicitly.

llvm/test/tools/yaml2obj/ELF/section-offset.yaml
9

Removed.

39

The fact that the .strtab section's offset also increases is surely sufficient?

If we assume that is not broken/overridden, then - yes.

MaskRay added inline comments.Apr 29 2020, 9:30 AM
llvm/lib/ObjectYAML/ELFEmitter.cpp
1027–1030

For SHT_NOBITS sections, sh_offset%sh_addralign!=0 is possible (sh_offset for SHT_NOBITS can generally be ignored). lld can create such output (ELF/Writer.cpp:computeFileOffset).

1035–1036

alignToOffset(sec) can be called according to the section offset order in Doc.Chunks in ELFState<ELFT>::initSectionHeaders may be sufficient, but I haven't tried.

grimar marked an inline comment as done.Apr 29 2020, 10:39 AM
grimar added inline comments.
llvm/lib/ObjectYAML/ELFEmitter.cpp
1035–1036

alignToOffset(sec) can be called according to the section offset order in Doc.Chunks in ELFState<ELFT>::initSectionHeaders may be sufficient.

Not sure I understand what you suggest? Calling of alignToOffset in initSectionHeaders is what this patch does already.

jhenderson added inline comments.Apr 30 2020, 1:37 AM
llvm/lib/ObjectYAML/ELFEmitter.cpp
303–306

section header -> section header table (or section headers)

The ELF header?

I thought someone made a change a while back to write the section header table at the end anyway? Or am I getting mixed up with a different thing?

485

Missed a bit:

section -> sections

1027–1030

I am not sure we want to silently allow this combination (when an alignment doesn't match up).

I'd probably suggest ignoring the alignment in this situation, since the Offset is more explicit - if I specified an Offset that was deliberately not aligned, I'd probably still expect it to get respected, and the alignment field just used to set the sh_addralign value.

You could add a warning if you wanted, but since those aren't likely to be seen, I'm not sure how useful that is.

llvm/test/tools/yaml2obj/ELF/section-offset.yaml
69–70

This referred to the llvm-readelf line. We don't need to do llvm-readelf twice, if the two outputs should be binary identical. Use cmp instead.

111

in Case 1.

grimar updated this revision to Diff 261213.Apr 30 2020, 6:47 AM
grimar marked 8 inline comments as done.
  • Addressed review comments.
llvm/lib/ObjectYAML/ELFEmitter.cpp
303–306

No, you are right, it is written to the end. I've broke this comment during rewriting somehow :/ Fixed.

1027–1030

OK. Done.

MaskRay accepted this revision.Apr 30 2020, 9:57 AM
MaskRay added inline comments.
llvm/lib/ObjectYAML/ELFEmitter.cpp
1035–1036

Sorry for the confusion. I mean (in a future change), we may change the order ELFState<ELFT>::alignToOffset is called on each section. We call alignToOffset following the order of section offsets. This may allow out-of-order sh_offset fields in the section header table.

https://reviews.llvm.org/D78422#1991831 says

FWIW in embedded systems it is quite common to have linker scripts where different OutputSections need to be copied to various disjoint memory locations and these are often not in order of ascending address.

This revision is now accepted and ready to land.Apr 30 2020, 9:57 AM
grimar marked an inline comment as done.May 1 2020, 1:26 AM
grimar added inline comments.
llvm/lib/ObjectYAML/ELFEmitter.cpp
1035–1036

What I do not like in idea of allowing disordered offsets is:

  1. It might open a worm can: it will be easy to create overlapping sections or something alike. We then might want the code to diagnose it,

but it sounds a bit too complex and probably not what we want.

  1. Out logic of creating program headers assumes that chunks are ordered by offsets. Changing this assumption will introduce an additional complexity there.

I am thinking about the possible extending the YAML instead. E.g. we could have something like:

--- !ELF
FileHeader:
  Class:   ELFCLASS64
  Data:    ELFDATA2LSB
  Type:    ET_REL
  Machine: EM_X86_64
  SectionHeader:
    Location: <location> ## before sections/after sections
    Sections:
      - .foo
      - .bar
Sections:
  - Name:  .foo
    Type:  SHT_PROGBITS
    Size:  0x8
...

The benefit is that it is more flexible and should not touch the rest of the code.

grimar marked an inline comment as done.May 1 2020, 1:29 AM
grimar added inline comments.
llvm/lib/ObjectYAML/ELFEmitter.cpp
1035–1036

SectionHeader in my sample should not be a part of FileHeader probably. E.g:

--- !ELF
FileHeader:
  Class:   ELFCLASS64
  Data:    ELFDATA2LSB
  Type:    ET_REL
  Machine: EM_X86_64
Sections:
  - Name:  .foo
    Type:  SHT_PROGBITS
    Size:  0x8
...
SectionHeader:
    Location: <location> ## before sections/after sections
    Sections:
      - .foo
      - .bar

Ping. @jhenderson, are you OK to land it?

Ping. @jhenderson, are you OK to land it?

Huh. This somehow fell through the cracks. I'll look at it tomorrow.

jhenderson added inline comments.May 13 2020, 12:26 AM
llvm/lib/ObjectYAML/ELFEmitter.cpp
221

They're not going to be that common, but sh_addralign and p_align are both 64-bit values for ELF64, so Align should be uint64_t here.

453

Nit: There's a double space after the variable name.

1032

Maybe worth a comment explicitly saying we ignore alignment when an explicit offset has been requested, just to show that it is 100% intentional.

1035–1036

Sounds reasonable, though I'd suggest a couple of minor changes: 1) Call it SectionHeaders or SectionHeaderTable; 2) Allow it to take an explicit Offset maybe, possibly instead of Location, any maybe even add special values for Offset like AutoBefore, and AutoAfter or something like that.

llvm/test/tools/yaml2obj/ELF/custom-null-section.yaml
205–206

The offset of a SHT_NULL section probably shouldn't affect that of other sections, since it is supposed to be "inactive". Not sure if it should be part of this change, but we should have a test case for it.

207–210

Any particular reason these aren't one test, with two SHT_NULL sections?

llvm/test/tools/yaml2obj/ELF/section-offset.yaml
5

This comment needs updating, since you're checking the section header table offset now.

96

"in Case 1" or "in the first case"

99

the 'Offset' field
the 'ShOffset' field

114

Ditto.

115

only affects the

grimar updated this revision to Diff 263698.May 13 2020, 6:45 AM
grimar marked 13 inline comments as done.
  • Addressed review comments.
llvm/test/tools/yaml2obj/ELF/custom-null-section.yaml
205–206

The offset of a SHT_NULL section probably shouldn't affect that of other sections, since it is supposed to be "inactive"

Not sure I understood this point right, but I see no reason to make an exception for it:
It is special and that is why we "do not write any content for special SHN_UNDEF section". We are currently able to override its sh_size, sh_info and sh_entsize though. But the offset is not a property of a section. I.e. it has no field like sh_offset.
If a person adds Offset: xx in a YAML for any section, it should have an effect and following sections should have offsets greater or equal to it anyways (unless ShOffset is used to override it).

207–210

yaml2obj normally creates a SHT_NULL section at index 0 implicitly:

// Insert SHT_NULL section implicitly when it is not defined in YAML.
if (Sections.empty() || Sections.front()->Type != ELF::SHT_NULL)
  Doc.Chunks.insert(
      Doc.Chunks.begin(),
      std::make_unique<ELFYAML::Section>(
          ELFYAML::Chunk::ChunkKind::RawContent, /*IsImplicit=*/true));

This patch adds the following code:

if (!IsFirstUndefSection || Sec->Offset)
  SHeader.sh_offset = alignToOffset(CBA, SHeader.sh_addralign, Sec->Offset);

I.e. the first SHT_NULL section is special, and I've added the code to handle the "Offset" properly.
Others SHT_NULL are not special. And I am testing here that
setting an Offset for such sections affects the start of the section headers. For that I need to set
different offsets for it and test the result.

llvm/test/tools/yaml2obj/ELF/section-offset.yaml
5

Actually what I think we want to show is that the Offset key might affect the file size.
It is the main difference with the ShOffset property which never can affect it.
Currently (since you asked to remove the file size test previously) it is tested implicitly:
the Offset changes physical offsets of the following sections, but still worth to mention I think.

Changing of the section header table offset is a size effect. If it was placed before sections and not after,
it would not change.

jhenderson accepted this revision.May 14 2020, 1:19 AM

LGTM, with one remaining suggestion.

llvm/test/tools/yaml2obj/ELF/section-offset.yaml
5

I guess more generally, changing the Offset field changes the layout of the file. Consequently, anything appearing after the section will have a higher offset than it might otherwise do (including sections, the section header table, and the end of the file). Perhaps we should update the comment to say that:

"Show that it can affect the layout of the rest of the file."

This revision was automatically updated to reflect the committed changes.
grimar marked an inline comment as done.May 15 2020, 2:34 AM
grimar added inline comments.
llvm/lib/ObjectYAML/ELFEmitter.cpp
48

FTR, after commiting this patch I faced LLD test failture and fixed it:
https://github.com/llvm/llvm-project/commit/969c63a2ecfb536062ff2174645abe31e4036067
(it started to report "section [index 1] has a sh_offset (0x1000000000000000) + sh_size (0x4) that is greater than the file size (0x100000120)" instead of "section sh_addralign is too large" LLD's error)

The issue was related to "AddressAlign: 0x1000000000000000" line in the YAML:

- Name:            .text
  Type:            SHT_PROGBITS
  Flags:           [ SHF_ALLOC, SHF_EXECINSTR ]
  AddressAlign:    0x1000000000000000
  Content:         "00000000"

Previously, we had this piece of code which used unsigned, so AddressAlign was truncated.
As a result a little binary with sh_addrallign==0x1000000000000000 was produced.
After this patch, this piece does not exist and currently for that test a 500mb binary with with sh_addrallign==0x1000000000000000 is produced. I.e. it works a bit more correct, but
the binary is is still truncated.

It happens because of ELFState<ELFT>::alignToOffset, which has the following lines:

CBA.getOS().write_zeros(AlignedOffset - CurrentOffset);
return AlignedOffset;

AlignedOffset and CurrectOffset are uint64_t, but raw_ostream::write_zeros takes unsigned:

/// indent - Insert 'NumSpaces' spaces.
raw_ostream &raw_ostream::indent(unsigned NumSpaces) {
  return write_padding<' '>(*this, NumSpaces);
}

/// write_zeros - Insert 'NumZeros' nulls.
raw_ostream &raw_ostream::write_zeros(unsigned NumZeros) {
  return write_padding<'\0'>(*this, NumZeros);
}

Probably we want to change it to take uint64_t.
I do not think there is a way to write a test for it though as the result output would be huge.

grimar marked an inline comment as done.May 15 2020, 2:38 AM
grimar added inline comments.
llvm/lib/ObjectYAML/ELFEmitter.cpp
48

Ind I do not believe someone really needs uint64_t there to put so many zeroes, so I am not sure if we want to change it for "correctness" or not.

jhenderson added inline comments.May 15 2020, 4:06 AM
llvm/lib/ObjectYAML/ELFEmitter.cpp
48

You're probably right that nobody is likely to need to write that many zeroes at once, but should we have something instead that prevents a silent truncation? Fixing the issue would of course be one way of doing it.

grimar marked 2 inline comments as done.May 15 2020, 5:16 AM
grimar added inline comments.
llvm/lib/ObjectYAML/ELFEmitter.cpp
48

What if we limit the size of the object produced by yaml2obj to something? E.g 10mb. And introduce an option to change this limit.
Since it is used often for writing test cases, the outputs produced ideally should be small. And such limit would also allow to catch this and possible other mistakes that might lead to huge binaries. (I am thinking about the Offset field which might cause the same).

And I guess we can change write_zeros to take uint64_t, but error out when NumZeros>UINT32_MAX there too.

jhenderson added inline comments.May 18 2020, 1:20 AM
llvm/lib/ObjectYAML/ELFEmitter.cpp
48

I have nothing against a file size limit in yaml2obj, as long as it doesn't significantly increase complexity. It's probably a good safety net.

I'll leave it up to you what to do about write_zeros. I don't have a strong opinion on it.