This is an archive of the discontinued LLVM Phabricator instance.

[elfabi] Write program headers, .dynamic, .dynstr, and .shstrtab
Needs ReviewPublic

Authored by jakehehrlich on Dec 18 2018, 4:30 PM.

Details

Summary

This patch adds the majority of the work needed for writing binary ELF stub files. While only the dynamic entries are written as contents, most of this patch involves the surrounding framework of an ELF file that allows the .dynamic and .dynstr sections to exist happily.

Added:

  • Writes section headers.
  • Writes two program headers (PT_DYNAMIC and a PT_LOAD).
  • Writes .dynstr (only containing strings from .dynamic).
  • Writes .dynamic.
  • Writes .shstrtab.

This patch is very large, and may later be broken down if possible.

Diff Detail

Repository
rL LLVM

Event Timeline

amontanez created this revision.Dec 18 2018, 4:30 PM
amontanez updated this revision to Diff 183016.Jan 22 2019, 6:37 PM

Rebase, split out bigger functions to use helper functions, add a simple test.

amontanez retitled this revision from [elfabi] [WIP] Write .dynamic section with DT_SONAME and DT_NEEDED to [elfabi] Write program headers, .dynamic, .dynstr, and .shstrtab.
amontanez edited the summary of this revision. (Show Details)
amontanez removed a subscriber: jakehehrlich.

Cleaned up significantly, added tests, moved from WIP status to ready for review. This patch is very large. I'm open to splitting it up after a brief discussion on how people would prefer I do that.

amontanez planned changes to this revision.Jan 23 2019, 6:19 PM

Removing from review queues until I look into write commands requested in D55839

jakehehrlich commandeered this revision.Apr 26 2019, 4:59 PM
jakehehrlich edited reviewers, added: amontanez; removed: jakehehrlich.

Gonna take this one too.

Herald added a project: Restricted Project. · View Herald TranscriptApr 26 2019, 4:59 PM
jakehehrlich added a reviewer: plotfi.

Ok so this is the rough idea that I have for doing this. I need to add program headers which turns out to require refactoring this a bit. The general idea takes some explanation. There are multiple ways to think about it however. The goal is for everything to "work itself out" automagically. I'll need to add program headers as many bits of functionality are not easily testable here unless we add them. You'll note that I put the NOBITS sections before the other sections. This doesn't jive well with using a single program header to cover them all and since space is a thing we'd like to constrain here we should support it. Unfortunately I did that as optimization as a means of avoiding the dynamic symbol table needing know the addresses of the NOBITS sections before starting. This is overcomeable using an additional loop however. I think an additional loop would be needed even if you hand rolled however and by adding this loop we can probably avoid maintaining an intermediate vector for the dynamic symbol table. Instead we can compute the overall size first, allocate the FileOutputBuffer, and then write directly into it. I'm not sure the complexity or redundancy is justified by the memory savings and I think the time savings will be pretty minimal.

First Way of thinking about it (As a Build System)

You can think of each Lazy value as being a build step. That build step then references other build steps as dependencies. Before completing this build step it goes on to evaluate the other build steps first. As long as no cyclic dependencies show up it all works out. To ensure that there are no cyclic dependencies we have to methods. Method 1) is the Blackhole boolean which if set to true while a Lazy value is being evaluated so that if it ever gets back to itself it can raise an error. The name Blackhole comes from some literature regarding a different way of thinking about it. Method 2) is to assume the minimum possible at each build step. By assuming the minimum possible at each build step you ensure (generally) that you logically can't have any cycles without the idea as a whole being invalid. This is not completely true for lots of little reasons but the main issue is that assumptions produce faster code. Right now I have the number of loops over the symbols down to just 2 which is the minimum I think is possible if you were to hand roll this as is.

Second Way of thinking about it (As futures)

You can think of each Lazy value as a promise to compute that at some point. Each future is then resolve as soon as it can be but it might have to wait on another future. Deadlock occurs when there's a cycle. In fact if we used std::future with delayed evaluation instead of Lazy this would work and we could process the graph in parallel. The graph is quite small at the moment since individual symbols are not members of the graph however so I don't think that would be useful.

Third Way of thinking about (As an instance of the Lob combinator)

My background before working in systems was in programming language theory. There's something called the Lob combinator which can take a self referential spegetti like data structure and actually produce it with links and all. It has many forms and this is one of them. The Lob combinator would evaluate all of the nodes (a node in such a structure is equivalent to a node in a the build graph from Way 1) but we only really need a few. The Lob combinator, in its truely magical form, only works with lazy values however. So I'm just replicating that here. Lazy values are represented by "thunks" in practice. The idea of making a thunk a "blackhole" while its being evaluated to catch cycles comes from some readings I did on GHC's implementation of Haskell which lets it dynamically catch some forms of infinite loops that don't make progress (but not all less it solve the halting problem).

Fourth Way of thinking about it (As as an instance of a build system a la "Build Systems a la carte")

This is like a kind of crappy (or more restricted if you'd rather) version of what they outline in that paper for a dynamic build system. The Lob combinator and build systems are thus then related.

Oh and this is the big picture. I'll be splitting this up somehow.

You may have put the rationale somewhere but they are not easily searchable. Please allow me to ask why there is a desire to write yet another ELF emitter. Can't we invest more in existing tools like yaml2obj? Is there any inherent limitation that prevents it from being more dynamic?

llvm/llvm/test/tools/llvm-elfabi/binary-write-pheaders.test
18 ↗(On Diff #197498)

{{.*$}} can just be omitted. Many other tests do this.

If you want to use FileCheck --match-full-lines to omit the regex in 4096{{$}}, they have to be kept.

llvm/llvm/test/tools/llvm-elfabi/binary-write-sheaders.test
60 ↗(On Diff #197498)

It'd be great to get rid of .shstrtab and just use .strtab.

Newer llvm (since 2015) no longer produces .shstrtab as a size optimization.

llvm/test/tools/llvm-elfabi/write-elf32be-ehdr.test
6 ↗(On Diff #197498)

Change x86_64 to powerpc, armv7a, etc to make big-endian more realistic.

llvm/tools/llvm-elfabi/ELFObjHandler.cpp
37

Other than having a mutable Blackhole, you can make Func = []() { llvm_unreachable(...); }; while evaluating the thunk.

70

Early return?

72

Value.emplace().

134

T* -> T *

194

What is this section used for?

210

Which architecture uses the section name .tls?

Did you mean .tbss?

jakehehrlich marked 6 inline comments as done.May 1 2019, 1:36 PM

I can't upload a new patch right now but I plan on making all recommended changes, even the ones I didn't reply to.

You may have put the rationale somewhere but they are not easily searchable. Please allow me to ask why there is a desire to write yet another ELF emitter. Can't we invest more in existing tools like yaml2obj? Is there any inherent limitation that prevents it from being more dynamic?

Good question. yaml2obj doesn't do the same thing and doesn't exist as a library. The point of this binary is to be as small as possible while maintaining linkability and compatibility with tools like llvm-readobj and other such tools. There is some talk of taking the code in llvm-objcopy, getting it up to where it could be used as a library, making a number of other tweaks, and then using that as a more general purpose emission library. That would also not do the same thing however. The trouble is that every ELF emitter has been designed with a specific purpose in mind and most were not designed as libraries. Furthermore when you consider what each use case has to compute and then specify independent of a general emission library it's not clear you really save much. You have the MC code, you have lld's code, you have llvm-objcopy, you have yaml2obj, etc... but they all have their quirks and specific use cases. the MC emiter has an interface that would be unweildy and imprecise here. llvm-objcopy doesn't help with setting these addresses, can't construct dynamic symbol tables in this way, and assumes that it has access to an input file from which its making modifications, doesn't support adding dynamic symbols yet, and generally makes it hard (on purpose) to tweak the memory image of a file. yaml2obj is really only designed to produce test files and is IMO not in a position to replace this tool. Recent work has given it program header support, dynamic symbol table support, and made it get in the way of specifying specific offsets and addresses when you know what you're doing but it still doesn't rise to the level of something we could use here. Even if it did rise to the level of something we could use here we would just wind up constructing its internal format using the same amount of code only to have it go and do the same basic thing.

Here's a claim. The ELFTypes library *is* the universal emitter already. Each of those tools are performing a particular computation that the others are not except perhaps yaml2obj which is really just working its way towards a low fidelity textual representation of ELF files. There are some commonly repeated things here between all of these libraries that could stand to abstracted out but its actually not the majority of the code in any of these tools or libraries. Until a serious proposal for a universal emitter comes up we keep rewriting those common parts. Still 90% of what's common is already solved by the ELFTypes library

llvm/llvm/test/tools/llvm-elfabi/binary-write-pheaders.test
18 ↗(On Diff #197498)

Yep. Don't worry about the tests for now. I'll need to update them and they're going to change drastically.

llvm/llvm/test/tools/llvm-elfabi/binary-write-sheaders.test
60 ↗(On Diff #197498)

That's actually what this patch does. These tests are inherited from the previous version of this patch and have not been modified to work yet. After I remove the changes planned part the tests will work. We'll actually need even more tests than what I have here to properly test this since the tests at hand only check a few sections and .dynamic contents.

llvm/tools/llvm-elfabi/ELFObjHandler.cpp
37

Good idea! I think that's actually what GHC does. I'll do that.

134

Yeah there are a bunch of formatting issues. I'll run clang-format on these when I upload next.

194

It's the section in which all defined non-TLS sections are defined. It's synthesized as a NOBITS section to avoid using any space.

210

.tbss would imply that this is writable when it isn't. In practice read-only data is already thread safe so there is no need for a read only TLS section. There is also in general no reason to have a read-only NOBITS section since the content is just zeros. This means there aren't good names to use for what I'm doing here. These section names don't matter functionally however. I'm open to many other names. I just went for short names where a name used in the wild didn't exist.

ormris added a subscriber: ormris.May 1 2019, 2:21 PM

Ok we're getting closer

  1. Formatted everything
  2. Added suggestions
  3. Program headers are now added
  4. All tests now pass
  5. In theory (but almost certainly not in reality) this should represent a minimal viable product with the exception of support for Alignment

What remains:

  1. Add support for Alignment
  2. Add tests for dynamic symbol table, def size, and tls size
  3. Split up into more manageable chunks so that humans can review this. Any review now is still appreciated of course.
MaskRay added inline comments.May 1 2019, 9:52 PM
llvm/tools/llvm-elfabi/ELFObjHandler.cpp
210

In practice read-only data is already thread safe so there is no need for a read only TLS section.

So why do you need the non-writable .tss?

jakehehrlich marked an inline comment as done.May 2 2019, 1:36 PM
jakehehrlich added inline comments.
llvm/tools/llvm-elfabi/ELFObjHandler.cpp
210

I don't I don't really need it, but I want to minimize size and complexity. The requirements that I found (which might be minimal)

  1. You must have a .dynamic section if you have a soname, or dt_needed. Right now this is added even if no soname or needed libraries exist.
  2. To test with, inspect, and work with a file with a .dynamic section you need a PT_DYNAMIC segment
  3. To have a PT_DYNAMIC segment you need a PT_LOAD which covers it. Permissions on either of these seem not to matter
  4. If you have TLS symbols defined in your module you need a TLS section.
  5. Using a TLS nobits saves space

I have not observed that a TLS segment is needed but I haven't really gotten to testing that out (I will remove it if it isn't needed). I'd like to have only one PT_LOAD that covers everything, not just the PT_DYNAMIC segment (though that is an option). That means .dynstr and .dynsym as well as .dynamic, the TLS section, and the other definition section. For aesthetic reasons I think the permissions on the PT_LOAD should match the permissions on the sections that it covers. Consequently all sections should have the same permissions if we want only one PT_LOAD. The TLS section is arguably not even covered by the PT_LOAD so it might be fair to give it different permissions and call it .tbss but this doesn't give us a proper name for the read only NOBITS section that critically *does* need to be covered by the PT_LOAD segment.

So its an issue of aesthetic tradeoffs (*cough* bikeshed *cough*) in the face of some other motivated issues (having one PT_LOAD and using SHT_NOBITS where possible). There are a few worlds I think exist that try their best to meet the aesthetics work out nicely.

  1. Make the PT_LOAD writable and thus .dynsym and .dynstr as well (note that .dynamic is sometimes read only and sometimes writeable)
  2. Make the PT_LOAD read only and have read-only NOBITS sections in both TLS and non-TLS varities (the current choice) but not have good names and the idea of these existing is otherwise senseless and thus no previous examples of these exist
  3. Make the PT_LOAD read only, keep the non-TLS read-only NOBITS, but make the TLS section writeable so that we can call it ".tbss" in an aesthetically pleasing way.
  4. Make the PT_LOAD read only, keep the sections read-only, and call them ".rodata" and ".tbss" anyhow even if those things each imply (by convention) a false fact about our sections. As an adendum we could choose slightly different names as well.

By choosing new names the aesthetic I violate is that I'm just making up section names (which happens a lot). I think anything else violates something I personally consider more fundamental like consistency, permissions consistency, using standard permissions for the standard sections we do have, etc... I think there's also an argument to be made that using differing section names makes these sorts of binaries easily recognizable.

Is llvm-elfabi used to create interface shared objects? Do you use it to create real(runnable) modules?

If not and if section/segment specification is not required, it can be very simple.
You just need the symbol table and very little other information.
There is one place in lld that checks if the symbol is in a readonly segment (addCopyRelSymbol).
1 PT_LOAD(RWX) + 1 PT_GNU_RELRO works just fine. You don't even need PT_TLS - STT_TLS symbols can be placed in an arbitrary section. Linkers and various binutils don't validate PT_TLS.

Using a TLS nobits saves space

The best layout I can think of is:
PT_LOAD(PT_GNU_RELRO(.data.rel.ro PT_TLS(.tdata | .tbss) .bss.rel.ro) | .data .bss)

Currently lld has (.tbss cannot really be SHT_NOBITS due to .data.rel.ro):
PT_LOAD(PT_GNU_RELRO(PT_TLS(.tdata .tbss) | .data.rel.ro .bss.rel.ro) | .data .bss)

There is much to say on this topic.

(note that .dynamic is sometimes read only and sometimes writeable)

Do you mean ld.lld -z rodynamic (D33251 http://lists.llvm.org/pipermail/llvm-dev/2017-May/113258.html)? (.dynamic is in the PT_GNU_RELRO region)

Is llvm-elfabi used to create interface shared objects? Do you use it to create real(runnable) modules?

If not and if section/segment specification is not required, it can be very simple.
You just need the symbol table and very little other information.
There is one place in lld that checks if the symbol is in a readonly segment (addCopyRelSymbol).
1 PT_LOAD(RWX) + 1 PT_GNU_RELRO works just fine. You don't even need PT_TLS - STT_TLS symbols can be placed in an arbitrary section. Linkers and various binutils don't validate PT_TLS.

Yes this is a stub reader/writer (also textual stubs which can't be linked against are a second format that has already been reviewed largely) but there are other concerns than just what works for lld, namely what works for llvm-readobj. I don't think its worth adding the PT_GNU_RELRO segment. What's the advantage? Also I'll get rid of the PT_TLS segment. The layout I have in the latest patch is

1 PT_LOAD (R)

.dynstr
.dynsym
1 PT_DYNAMIC (R)
  .dynamic 
.def
.tls

What the flags are don't really matter too much to me. I wasn't aware of the linker checking for a readonly segment but I think that favors just making everything read only. I'll think about removing the .tls section as well.

Using a TLS nobits saves space

The best layout I can think of is:
PT_LOAD(PT_GNU_RELRO(.data.rel.ro PT_TLS(.tdata | .tbss) .bss.rel.ro) | .data .bss)

What do you need the PT_GNU_RELRO for? What is .tdata or .tbss for? What is .data.rel.ro for?

Currently lld has (.tbss cannot really be SHT_NOBITS due to .data.rel.ro):
PT_LOAD(PT_GNU_RELRO(PT_TLS(.tdata .tbss) | .data.rel.ro .bss.rel.ro) | .data .bss)

I don't follow.

(note that .dynamic is sometimes read only and sometimes writeable)

Do you mean ld.lld -z rodynamic (D33251 http://lists.llvm.org/pipermail/llvm-dev/2017-May/113258.html)? (.dynamic is in the PT_GNU_RELRO region)

Yep that's what I mean. The .dynamic section is generally relro but can be read only if that flag is used. We shouldn't add a PT_GNU_RELRO just to be consistent with one particular way of doing things. .dynamic is relro because of DT_DEBUG which is not always used. If it isn't used it is possible to make it read only. What add an extra 50 or so bytes when we don't really need to?

Ok so I went digging on the PT_TLS and tls section issues. Seems like removing both the segment and the section is the right way to go.

llvm-readobj currently checks that SHT_TLS sections are in a PT_TLS so we should either have both or none. lld checks that TLS symbols are TLS consistently across the shared object and the output object so we have to use STT_TLS symbols for them (this is kind of obvious). I think lld (as you already stated) imposes no other requirements on this tool. Nothing seems to check that STT_TLS symbols are defined inside of of SHF_TLS sections so we should be good to go there.

@MuskRay Have you checked other linkers or tools?

  1. No more TLS stuff but STT_TLS symbol types are preserved which appears to be sufficient for all known use cases
  2. Fixed all the tests, accidental extra tests, minor issues, and made all tests pass.

I think this is ready for review but I'm not going to split it up until Monday.

The following is the first split. Subsequent splits will be much smaller (adding support for program headers, adding support for soname and needed, adding support for symbols, etc...): https://reviews.llvm.org/D61767#change-LCioCf1j7osU