This is an archive of the discontinued LLVM Phabricator instance.

[yaml2obj/obj2yaml] - Implement program header table as a special Chunk.
AcceptedPublic

Authored by grimar on Jan 27 2021, 10:57 PM.

Details

Summary

This implements the program header table as a chunk of type "ProgramHeaderTable".
It allows to place the program header table at an arbitrary location.

Diff Detail

Event Timeline

grimar created this revision.Jan 27 2021, 10:57 PM
grimar requested review of this revision.Jan 27 2021, 10:57 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 27 2021, 10:57 PM

I might well have missed it, but I don't recall seeing any dedicated obj2yaml tests?

llvm/lib/ObjectYAML/ELFEmitter.cpp
757

My reading of the ELF gABI is that this will need to be aligned on a 4-byte/8-byte boundary dependent on ELF32/ELF64. We should do that for an implicit offset (but allow it if the offset is explicit):

All data structures that the object file format defines follow the ``natural'' size and alignment guidelines for the relevant class. If necessary, data structures contain explicit padding to ensure 8-byte alignment for 8-byte objects, 4-byte alignment for 4-byte objects, to force structure sizes to a multiple of 4 or 8, and so forth. Data also have suitable alignment from the beginning of the file. Thus, for example, a structure containing an Elf32_Addr member will be aligned on a 4-byte boundary within the file.

Speaking of which, this code looks quite different to the section header table code, which has two separate calls to alignToOffset dependent on whether the Offset has been specified, if I read it correctly. Why is that?

759

(sorry - missed this same issue in the section header table case)

llvm/test/tools/yaml2obj/ELF/program-header.yaml
176
203–204

What about an offset that isn't already aligned to 4/8?

218–219

Should we have one for after all other chunks so that the total file size is based on the end of the last chunk? I don't know if that's really needed or not.

I might well have missed it, but I don't recall seeing any dedicated obj2yaml tests?

There is a obj2yaml/ELF/program-headers.yaml test which checks how the
program header table is emitted. But there is no much else we have to test currently I think,
because at this moment obj2yaml just emits the description in the begining of sections list.
It can't dump the table on a proper position or dump the "Offset" key.
I.e. this is already covered by the test. I am going to improve it slightly though (e.g. will add a comment).

grimar updated this revision to Diff 319826.Jan 28 2021, 4:39 AM
grimar marked 5 inline comments as done.
  • Addressed review comments.
  • Improved testing.
llvm/lib/ObjectYAML/ELFEmitter.cpp
757

Why is that?

Just because the code did not do the auto alignment. I wasn't sure if we want it as usually the program header
table is placed at the begining, where it is naturally aligned, though your reference to gABI makes sense to me. Also, the code is indeed better to be
consistent with the code for the section header table. So, I've fixed it.

llvm/test/tools/yaml2obj/ELF/program-header.yaml
203–204

Done.

218–219

I've improved testing and added such a test case.

jhenderson accepted this revision.Jan 28 2021, 5:27 AM

LGTM, with one nit.

llvm/test/tools/yaml2obj/ELF/program-header.yaml
357–358

Copy paste error? The file size is 336 in this case.

This revision is now accepted and ready to land.Jan 28 2021, 5:27 AM
grimar marked an inline comment as done.Jan 28 2021, 5:27 AM
grimar added inline comments.
llvm/test/tools/yaml2obj/ELF/program-header.yaml
357–358

Oh, right, thanks!

MaskRay requested changes to this revision.EditedJan 28 2021, 8:26 PM

This change will make obj2yaml crash on a binary processed by llvm-strip --remove-sections (which is currently untested).

Program headers are not subsidiary of the section header table. ProgramHeaderType is fine, but it probably should not have Segments:. The program headers can still be described the current ProgramHeaders:.

This revision now requires changes to proceed.Jan 28 2021, 8:26 PM
grimar marked an inline comment as done.Jan 28 2021, 11:56 PM

Program headers are not subsidiary of the section header table. ProgramHeaderType is fine, but it probably should not have Segments:. The program headers can still be described the current ProgramHeaders:.

This was already discussed.
We can rename the "Sections:" to "Layout:". Or allow both. See https://reviews.llvm.org/D93678#2492777.
Having Segments: in ProgramHeader Type is consistent with SectionHeaderTable type.

grimar updated this revision to Diff 320074.Jan 29 2021, 1:50 AM
  • Fix the obj2yaml crash on a binary processed by llvm-strip --remove-sections. The issue happened when there was no section header table. I've fixed it and added a test case to program-headers.yaml. Note that currently obj2yaml doesn't emit a SectionHeaderTable chunk with NoHeaders = true for such a case. But this issue is unrelated to this patch, we already have the following FIXME in the eshnum.yaml test:
## In the test case below we have an object without a section header table and e_shnum == 0.
## Document how we dump it.
## FIXME: we should emit the `SectionHeaderTable` key with `NoHeaders=true` for this case.
jhenderson added inline comments.Jan 29 2021, 3:57 AM
llvm/test/tools/yaml2obj/ELF/program-header.yaml
357–358

Looks like this still hasn't been addressed?

grimar updated this revision to Diff 320100.Jan 29 2021, 4:05 AM
grimar marked an inline comment as done.
  • Addressed review comment.
llvm/test/tools/yaml2obj/ELF/program-header.yaml
357–358

My apoligies. Seems I've pushed "Done" by mistake yesterday.

jhenderson accepted this revision.Jan 29 2021, 4:31 AM

LGTM again, but this probably needs @MaskRay's approval too, since he was concerned about the user-facing design.

MaskRay added a subscriber: labath.Jan 29 2021, 4:09 PM

If the idea is that we will have ability to create two program header tables, this seems fine. Such an ability has no use though because only the one referenced by e_phoff matters.
So I am still on the fence whether there is a good syntax.

+ @labath who added some ProgramHeaders: yaml2obj tests to lldb/test/Shell.

If the idea is that we will have ability to create two program header tables, this seems fine. Such an ability has no use though because only the one referenced by e_phoff matters.
So I am still on the fence whether there is a good syntax.

+ @labath who added some ProgramHeaders: yaml2obj tests to lldb/test/Shell.

Overall, I like the ability to explicitly control the location of program headers.

As for @MaskRay's question, I don't have any real answers, but I am going to make an observation. :)

The way I see it, the root cause of the problem here is that the top item item is called "Sections". The name kind of stopped being correct back when we allowed non-section filler chunks. Now it will be even less correct. If you follow this up with (I don't know if its in the works, but it would totally make sense to me) a change to control the placement/contents of section headers, it will become amusingly (confusingly?) self-referential.

If, instead, the top level item was called "Chunks", or "Layout", or "Contents", then it would not be weird to see program (or section) headers described inside. OTOH, it would make the (common) case of describing a section longer (by one line) as they would no longer automatically get the "section" kind.

Whether that's worth it... I don't know...

@grimar is on vacation for an extended period, so I'll take over the design discussion from him, since I worked with him on coming up with the design. Below, I summarise the key design ideas, to make sure we're all on the same page.

At some point, I think we want to change Sections: to Layout:, to avoid confusion. Due to the invasiveness of this change, I think we wanted to get the member design done first, and possibly even keep Sections: as an alias.

This block is now used to control (with this patch) placement of 4 different kinds of things (called "chunks" in the code): sections, fills, the section header table, and the program header table. These would be distinguished by their Type field. Fill, SectionHeaderTable and ProgramHeaderTable have special meanings. All others represent some kind of section. The Type field then drives the meaning of the rest of the chunk. All four types share the same Offset behaviour (it is automatically calculated unless explicitly specified). This means we can control placement of all components of an ELF object, except for the file header itself, which needs to be at a fixed location anyway.

As we all know, fills are for creating gaps in the output, and sections are for defining the section header properties and contents of a section. The ProgramHeaderTable chunk simply describes the program headers in the ELF, in the same way as the old ProgramHeaders key functioned before. Finally, the SectionHeaderTable chunk type is for describing that table. In particular, it provides the ability to control the order of the section headers, as well as whether some or all headers should be omitted from the table. The latter feature is useful for writing arbitrary contents in section form, without needing the data visible to tools via the section header. Note that the section header order is independent of the section data order. The Sections: ordering provides the data ordering. Thus a section header could appear last in the table, but the data might be first in the ELF (after the file header).

If the idea is that we will have ability to create two program header tables, this seems fine. Such an ability has no use though because only the one referenced by e_phoff matters.
So I am still on the fence whether there is a good syntax.

The idea is to allow control over where the program header table is in the ELF object. I don't think the goal is to allow multiple tables. However, should a need arise to do this or something similar ever, the design is easily extensible, which is the reason we actually switched to this design from the older design.

The way I see it, the root cause of the problem here is that the top item item is called "Sections". The name kind of stopped being correct back when we allowed non-section filler chunks. Now it will be even less correct. If you follow this up with (I don't know if its in the works, but it would totally make sense to me) a change to control the placement/contents of section headers, it will become amusingly (confusingly?) self-referential.

You may have missed this - the SectionHeaderTable key (which used to be independent, but now is another of the chunk kinds as described above, due to a recent change) was relatively recently added and allows us to describe the section header table. If the key is omitted, the section header table will be implicitly added at the end.

If, instead, the top level item was called "Chunks", or "Layout", or "Contents", then it would not be weird to see program (or section) headers described inside. OTOH, it would make the (common) case of describing a section longer (by one line) as they would no longer automatically get the "section" kind.

As noted above, by using the existing Type field to recognise special chunk kinds, and then with the default behaviour being to interpret the value as some section type, I don't think we need an extra line.

I did consider whether it would be worthwhile to keep the ProgramHeaders tag as an option, to avoid the need to explicitly place it somewhere in the output/update all the tests, but I don't think there is much benefit for this and it causes an increase in code complexity.

MaskRay accepted this revision.Feb 2 2021, 11:52 PM

Thanks for the explanation. Type: ProgramHeaderTable looks good. Layout: instead of Sections: can indeed make things less confusing but that would be a large refactoring on its own. I think making them aliases may not be straightforward to implement to ObjectYAML... The few check-lldb-shell tests still need to be fixed first.

This revision is now accepted and ready to land.Feb 2 2021, 11:52 PM
labath added a comment.Feb 4 2021, 5:48 AM

@grimar is on vacation for an extended period, so I'll take over the design discussion from him, since I worked with him on coming up with the design. Below, I summarise the key design ideas, to make sure we're all on the same page.

At some point, I think we want to change Sections: to Layout:, to avoid confusion. Due to the invasiveness of this change, I think we wanted to get the member design done first, and possibly even keep Sections: as an alias.

This block is now used to control (with this patch) placement of 4 different kinds of things (called "chunks" in the code): sections, fills, the section header table, and the program header table. These would be distinguished by their Type field. Fill, SectionHeaderTable and ProgramHeaderTable have special meanings. All others represent some kind of section. The Type field then drives the meaning of the rest of the chunk. All four types share the same Offset behaviour (it is automatically calculated unless explicitly specified). This means we can control placement of all components of an ELF object, except for the file header itself, which needs to be at a fixed location anyway.

As we all know, fills are for creating gaps in the output, and sections are for defining the section header properties and contents of a section. The ProgramHeaderTable chunk simply describes the program headers in the ELF, in the same way as the old ProgramHeaders key functioned before. Finally, the SectionHeaderTable chunk type is for describing that table. In particular, it provides the ability to control the order of the section headers, as well as whether some or all headers should be omitted from the table. The latter feature is useful for writing arbitrary contents in section form, without needing the data visible to tools via the section header. Note that the section header order is independent of the section data order. The Sections: ordering provides the data ordering. Thus a section header could appear last in the table, but the data might be first in the ELF (after the file header).

Thanks for summarizing this. I haven't been keeping track of recent yaml2obj changes, but all of this looks extremely nice and useful to me. The main functionality I am missing is the ability to (de)serialize core files (no sections, all info is given through program headers and notes), and it sounds like this comes really close to making that possible.

As noted above, by using the existing Type field to recognise special chunk kinds, and then with the default behaviour being to interpret the value as some section type, I don't think we need an extra line.

Interesting. It slightly overloads the meaning of the "Type" field, but I guess it does make the common case simpler.

Thanks for summarizing this. I haven't been keeping track of recent yaml2obj changes, but all of this looks extremely nice and useful to me. The main functionality I am missing is the ability to (de)serialize core files (no sections, all info is given through program headers and notes), and it sounds like this comes really close to making that possible.

Not a problem! To achieve what I think you're suggesting, I think we'd need to have obj2yaml be able to generate Fills itself. I might be wrong, but I don't think it currently does (Fills are supported in the other direction). I guess the way to do this would be to identify gaps between the other blocks and convert them to Fills with the relevant contents (with no fills required if the size and contents would be the default without the fill). Sounds viable.

As noted above, by using the existing Type field to recognise special chunk kinds, and then with the default behaviour being to interpret the value as some section type, I don't think we need an extra line.

Interesting. It slightly overloads the meaning of the "Type" field, but I guess it does make the common case simpler.

One way of looking at it is not to consider sections directly as a kind of chunk, but rather consider each individual type of section as a different chunk kind (e.g. a relocation section, a dynamic section etc). It just so happens that to generate these different chunk kinds, you use SHT_* instead of "RelocationSection" etc.