Page MenuHomePhabricator

[elfabi] Add support for writing ELF header for binary stubs
Needs ReviewPublic

Authored by jakehehrlich on Dec 18 2018, 10:28 AM.



This change introduces the beginnings of ELF binary stub write support for elfabi. Specifying an output file path as well as --output-target=<target> will write a binary ELF stub to the specified output file path. For this patch, only the ELF file header is written to the file.

Diff Detail


Event Timeline

amontanez created this revision.Dec 18 2018, 10:28 AM

The structure looks most good but I've got a lot of little requests.


There's no need to have these extra symbols here.




Not sure I like this interface but if you do want todo something like this 1) use uint8_t instead of 'char' and 2) MutableArrayRef wraps the pointer and size for you so you don't have to carry them around while still allowing you to modify the contents as you need to.


Calling getBinarySize twice since the user already has to call it isn't ideal. Also if you want this check it seems like an assertion would be better. Also if you just return a Buffer from here rather than telling the user how big of a buffer to construct (though you force them to use a specific kind of buffer) then you can avoid the check all together.


I think using this technique is most justified by larger code (like what you'll have later) so I'm cool keeping these but at this size it feels like it could just all go in a header.


Seems like this can be inlined.


The whole 'Writer' thing is supposed to allow for a base class to exist so you don't have to know what Writer you're using, but nothing like that is happening here. If we're not doing anything like that do you think we could just make these all functions?


This can be private or if you turn these into functions, you can make them static in the defining object.


Seems like you might be able to simplify this to just return a buffer of some kind.


This can be private.


Use uint8_t instead of char for raw binary data. llvm doesn't even use full C++11 but in C++17 we'll use std::byte


I think it would be nicer if this took the output format as a parameter.


Use == and don't construct a StringRef.

ruiu added a subscriber: ruiu.Dec 18 2018, 5:01 PM
ruiu added inline comments.

0x00 and 0u are all just 0, so I'd just write "0". Sign-extending 0 yields just 0, so "u" is redundant.


I agree with Jake that because this can be done with functions it's probably better to do this using functions without a class. In addition to that, it looks like a "Impl" class that doesn't inherit any class a bit weird, because usually an "Impl" class implements an abstract interface of some other class (I'm not suggesting you define an abstract class and an implementation class, but just pointing out that the current name is perhaps not that good.)


It might be discussed before, but what is a value of returning an Error from a function and then print that error & exit from the main function? Propagating an error all the way to the main function makes function signatures more complex (any function that can fail or calls a function that can fail has to have a function signature of ErrorOr<T> instead of just T). If this is just a command, then maybe just printing out an error message and exit is better?

jakehehrlich added inline comments.Dec 19 2018, 2:56 PM

Since you have an error and are currently returning an error, that error should incorporate the specific information from this error and not make it more vauge.


Yeah there's some line where things will only ever go in this code and some line where things would go into an eventual library. It isn't 100% clear where that line is right now. I'm in favor of earing on the side of caution and propagating more than less for the time being.

amontanez marked 18 inline comments as done.
amontanez added a reviewer: ruiu.

This should address most of the comments. All functions except for the single writeBinaryStub() have become static functions, and the class has been removed. I'll be updating D55864 to match these changes as soon as possible.

jhenderson added inline comments.Jan 21 2019, 2:17 AM

Super nit: Space between # and CHECK. Same applies to other tests.


I believe this information is derived from the ElfHeader, so there's not much point in testing it, since you already test it below when testing the ElfHeader itself.

Same applies to other tests.


"calculates what the size" -> "calculates the size"


You should probably set the e_shentsize and e_phentsize fields too, since those are constant.


Same comment as above. It seems weird for Stub to be a non-const reference.


Add a blank line here.


You can do these two lines in one, and lose the braces too, i.e:

if (Error BinaryWriteError = writeELFBinaryToBuffer<ELFT>(Stub, BufRef))
  return BinaryWriteError;



"begins the process"? Sounds to me like this function should do the whole process, given its name.


It seems odd to me that Stub is a non-const reference. I wouldn't expect this function to modify it based on the current description.


It probably reads easier if there's a new line between each if block.

amontanez updated this revision to Diff 182938.Jan 22 2019, 9:40 AM
amontanez marked 11 inline comments as done.

Comments addressed.

One small comment from me, otherwise looks good from my point of view, assuming the other reviewers are happy.


Now that you are doing this, please make sure that it is tested too.

amontanez updated this revision to Diff 183121.Jan 23 2019, 9:47 AM
amontanez marked an inline comment as done.

Updated tests.

jakehehrlich added inline comments.Jan 23 2019, 12:46 PM

Calculating the size and writing can be a bit fragile. That said you can't write until you have enough space allocated and because we want to use a FileBuffer and avoid copying, we only want to write once. A trick I've seen used is to think about 'WriteCommands' write commands know the maximum index that they write to and can perform the write themselves. So rather than performing size calculation and writes separately, you construct a list of write commands. From this list you then traverse it to get the maximum index written to, and then traverse it once more to write into the now allocated buffer. This way you don't have parallel code but you avoid reallocating a mapped buffer.


Use ELFMAG* instead of each of the actual constants here.


It turns out that there are use cases where this can be other things. This is a good default but sometimes the user should have to specify this. @jhenderson has hit this issue in BSD land. I can't seem to find the code for that however. Hopefully James can weigh in.

Either way I don't think its something you need to worry about right now but it was possibly an oversight on our part.


You can append to an error I believe so that you don't have to consume it.

jhenderson added inline comments.Jan 31 2019, 5:46 AM

The issue in llvm-objcopy was that it was copying an existing file and discarding the EI_OSABI and EI_ABIVERSION. Certainly it might be useful in the future to be able to say what the value should be, but as this is not currently required, I think it can be delayed until there's a request for it.

On the other hand, converting a binary into a text file then back into a binary should probably support this at some point.


Nit: this should start with a capital letter and end with a full stop.


You might want to consider factoring these out into a single function that takes an Error.

jakehehrlich commandeered this revision.Apr 26 2019, 4:38 PM
jakehehrlich edited reviewers, added: amontanez; removed: jakehehrlich.

I'm picking this up. Looking for review on this again. I'm going to drive this to completion enough to use on Fuchsia and then maintain it. After it has the features we want for Fuchsia I'll work on adding features that other people want but at a slower pace (like I did after a while with llvm-objcopy) eventually I'll ramp down but honestly, unlike llvm-objcopy, I could see this tool stabilizing.


I went ahead and implemented my own advice here. Seems to solve the problem of separating calculation of size and writing of data. The alternative is to have a type that mirrors the format of the stub but contains layout information and then to perform layout and then writing individually like we do in llvm-objcopy.


Do you remember what you meant by this? I'm starting this back up again.

Herald added a project: Restricted Project. · View Herald TranscriptApr 26 2019, 4:38 PM
jhenderson added inline comments.Apr 30 2019, 2:20 AM

clang-format? (I think it should be const T &Value)


out -> Out


The WithColor::error() and exit(1) are repeated in a couple of places. It might be nice if they were a simple function that takes an llvm::Errror and does not return, e.g:

void reportError(Error Err) {
      WithColor::error() << Err << "\n";

although I feel like there might also be other functions available to do the same thing.

plotfi added a subscriber: plotfi.Apr 30 2019, 3:34 PM

I'm picking this up. Looking for review on this again. I'm going to drive this to completion enough to use on Fuchsia and then maintain it. After it has the features we want for Fuchsia I'll work on adding features that other people want but at a slower pace (like I did after a while with llvm-objcopy) eventually I'll ramp down but honestly, unlike llvm-objcopy, I could see this tool stabilizing.

@jakehehrlich Is this the diff for adding the binary stubs support we were talking about in D60974 ?