This is an archive of the discontinued LLVM Phabricator instance.

[elfabi] Introduce tool for ELF TextAPI
ClosedPublic

Authored by amontanez on Dec 5 2018, 8:32 PM.

Details

Summary

Follow up for D53051

This patch introduces the tool associated with the ELF implementation of TextAPI (previously llvm-tapi, renamed for better distinction). This tool will house a number of features related to analysis and manipulation of shared object's exposed interfaces. The first major feature for this tool is support for producing binary stubs that are useful for compile-time linking of shared objects. This patch introduces beginnings of support for reading binary ELF objects to work towards that goal.

Added:

  • elfabi tool.
  • Support for reading architecture from a binary ELF file into an ELFStub.
  • Support for writing .tbe files.

Diff Detail

Repository
rL LLVM

Event Timeline

amontanez created this revision.Dec 5 2018, 8:32 PM
jakehehrlich added inline comments.Dec 6 2018, 2:19 PM
llvm/tools/llvm-elfabi/ELFObjHandler.cpp
29 ↗(On Diff #176915)

If something should never be null, you generally should use a reference. If you can't use a reference it is often (but not always) the case that something is wrong with the design. I generally prefer references to pointers.

48 ↗(On Diff #176915)

Yeah I think this should just be a single function and the other classes shouldn't exist.

52 ↗(On Diff #176915)

Don't consume the error, return it. Also I do this awful thing throughout llvm-objcopy where I just hard fail the program and exit as soon as something bad happens...you haven't done that but I'd like restress that you shouldn't do that.

71 ↗(On Diff #176915)

technically it could just be any binary format you didn't expect. Also this should probably return an Expected unique pointer that is never null.

llvm/tools/llvm-elfabi/ELFObjHandler.h
34 ↗(On Diff #176915)

Can you just make this a function? I don't think it needs a whole class at the moment and I'm not sure it ever will.

43 ↗(On Diff #176915)

I don't think this interface makes sense. What does constructing an ELFStubBuilder without an ELF object file accomplish? Also if we 100% expect this to be empty, maybe the interface that is used should return a new ELFStub rather than requiring that we give it an empty one.

57 ↗(On Diff #176915)

It doesn't seem like this should be a class. It contains no state and has one function. Functions are your friend, unnecessary classes are the enemy.

llvm/tools/llvm-elfabi/llvm-elfabi.cpp
24–25 ↗(On Diff #176915)

Having this as global is bad. Globals (sans use cases like options) mean one of two things 1) you shouldn't have the class in the first place 2) you're maintaining global state which is almost always bad.

61 ↗(On Diff #176915)

As you might have heard me just speak with Roland the SoName should be optional and never synthetic.

68 ↗(On Diff #176915)

Return an error rather than terminating even here. This is an example of that terrible thing that I did. I also let that reality seep into all the nooks and crannies of my interfaces.

79 ↗(On Diff #176915)

writing should just wirte, not modify. If you update the ELFStub it should go 1) Modify 2) Write.

llvm/tools/llvm-elfabi/llvm-elfabi.h
22 ↗(On Diff #176915)

I realize I do this in llvm-objcopy but I do it so that I can exit with an error from wherever. It's really better to propagate errors all the way back to main or near main and handle them once there if possible. That might not be a perfect fit but if you find that you *can* plumb all the errors back to main, you should remove these functions and this file.

jakehehrlich added inline comments.Dec 6 2018, 2:19 PM
llvm/tools/llvm-elfabi/llvm-elfabi.cpp
96 ↗(On Diff #176915)

Wait to declare this until you construct it later.

100 ↗(On Diff #176915)

I think "unable to read file" is vague and likely covered by error already. Maybe "unable to read .tbe file '%s'" where "%s" should be the filename even if the filename might be mentioned in the error. That gives more information.

You should have a test to confirm that this error happens and is handled correctly.

Also do you want to take the error here?

106 ↗(On Diff #176915)

As discussed offline, we should use Expected, check for an error (and try the next if there is one) and then create a new Error that wraps both and then chooses which one to report based on some heuristic (basically move the heuristic to the error reporting rather than the file parsing stage so that no heuristic is ever used for parsing). I know I said checking the extension would be easy but checking for ELF magic is also probably easy and more reliable so I guess now I agree with you (I was just thinking it would be hard to check the magic here). It will probably make sense to also include the file path in the error that you create so that it can report which file the error was an issue for. A really cool heuristic would be

  1. If the first 4 bytes of the memory are ELF magic, report the ELF error (but state assumption)
  2. If the extension is .tbe report the the TBE error (but state the assumption). Also you can search the file for
  3. If the neither of those things are true, tell the user you don't know what's going on and report both. It's also acceptable to search the whole file looking for a sign that

If you'd want I'd also still accept just looking at the extension.

121 ↗(On Diff #176915)

Don't declare until you construct later.

125 ↗(On Diff #176915)

That should probably be a part of the error already if possible.

amontanez marked 15 inline comments as done.

Changed:

  • Classes removed, functions are all standalone with no state.
  • Now using Expected<std::unique_ptr<ELFStub>> for everything that would previously return a std::unique_ptr<ELFStub>.
  • Added tests to check for proper failure output.

Removed:

  • Automatic DT_SONAME generation.
amontanez marked 2 inline comments as done.Dec 10 2018, 1:03 PM
amontanez added inline comments.
llvm/tools/llvm-elfabi/llvm-elfabi.cpp
61 ↗(On Diff #176915)

I've noted this and will make SoName optional in a separate patch. For now I have removed the code that would generate a SoName from the file name if the SoName was empty.

106 ↗(On Diff #176915)

If the YAML reader fails, it will always output to stderr. Having the YAML reader second causes this to only occur if both fail. I can later augment llvm::yaml::Input so we can have more control over where YAML outputs the error to, but for now I've defaulted to outputting both errors in the situation that an unreadable file is encountered. This makes slightly more sense given that right now I can't completely filter out the YAML error if the heuristics determine the binary error is of more interest.

This function's implementation isn't scalable, but it also doesn't impact design decisions of other code in this tool. I have marked a todo so I can revisit this separately. Let me know if you feel I should address this immediately.

jakehehrlich added inline comments.Dec 10 2018, 4:19 PM
llvm/tools/llvm-elfabi/ELFObjHandler.cpp
29 ↗(On Diff #177586)

nit: You can just put all these inside of llvm::elfabi namespace rather than repeatedly declaring them as such.

42 ↗(On Diff #177586)

nit: Can you add some rough TODOs here on what else you need to add to get this fully featured?

61 ↗(On Diff #177586)

nit: The type should be automatically inferred so you don't need the explicit template parameter.

llvm/tools/llvm-elfabi/llvm-elfabi.cpp
36–44 ↗(On Diff #177586)

nit: Can you just inline these and return from main instead of calling these? I think we don't expect any other exit points. Having these functions around is just kind of asking for them to be used and I'm skeptical that's a good idea.

120–131 ↗(On Diff #177586)

I think this should be implemented as a new aggregate error that you return. You can move the errors into the aggregate error and then the main function can report the aggregate error.

amontanez updated this revision to Diff 177794.Dec 11 2018, 3:36 PM
amontanez marked 5 inline comments as done.

Added:

  • ErrorCollector for collectively handling multiple related errors.

Fixed other nits.

amontanez marked 2 inline comments as done.Dec 11 2018, 3:39 PM
jakehehrlich added inline comments.Dec 11 2018, 5:56 PM
llvm/tools/llvm-elfabi/ErrorCollector.cpp
36 ↗(On Diff #177794)

Rather than this, when you make the error you should be left with an empty vector of errors. Long term I'd like to see this be a move of the vector into an aggregate error but for now I'm fine just consuming and clearing.

63 ↗(On Diff #177794)

You don't need to clear since the destructor will be called after this anyway.

llvm/tools/llvm-elfabi/ErrorCollector.h
44 ↗(On Diff #177794)

I think I'd prefer this be marked private.

61 ↗(On Diff #177794)

I think id prefer this marked private for now.

amontanez updated this revision to Diff 177811.Dec 11 2018, 6:37 PM
amontanez marked 3 inline comments as done.

Changed:

  • ErrorCollector now clears errors when makeError() is called.

Fixed a few other nits.

amontanez marked 2 inline comments as done.Dec 11 2018, 6:43 PM
amontanez added inline comments.
llvm/tools/llvm-elfabi/ErrorCollector.cpp
36 ↗(On Diff #177794)

Agreed. I marked TODO at the top of this function since that's something that can be done later when it's slightly higher priority.

jakehehrlich accepted this revision.Dec 11 2018, 6:45 PM
This revision is now accepted and ready to land.Dec 11 2018, 6:45 PM

I'd like someone outside of our team like @jhenderson to review this. Landing a new tool without some outside eyes looking at it isn't very community friendly IMO.

I'd like someone outside of our team like @jhenderson to review this. Landing a new tool without some outside eyes looking at it isn't very community friendly IMO.

I'll try to find some time in the next 2-3 days to look over this.

amontanez updated this revision to Diff 177966.Dec 12 2018, 4:00 PM
amontanez marked an inline comment as done.

Small change to binary-read-arch.test to make it easier to debug test failure.

jhenderson added inline comments.Dec 13 2018, 3:02 AM
llvm/test/tools/llvm-elfabi/binary-read-arch.test
12 ↗(On Diff #177811)

Nit here and elsewhere - remove all the extra spaces between the CHECK/CHECK-NEXT and the actual text being checked. (It's okay to have the spaces after CHECK: to make the patterns match with CHECK-NEXT, but this test has extra on top of that).

3 ↗(On Diff #177966)

Why the cat command? You can have FileCheck read a file via --input-file (also applies to other tests).

llvm/test/tools/llvm-elfabi/fail-file-open.test
5 ↗(On Diff #177811)

Perhaps the error check here could include "NotAFileInTestingDir" at the end?

llvm/test/tools/llvm-elfabi/replace-soname-tbe.test
2 ↗(On Diff #177966)

Nit: inconsistent use of '--' and '-' for switch prefixes.

llvm/test/tools/llvm-elfabi/tbe-read-basic.test
17 ↗(On Diff #177966)

Don't know if it belongs in this particular test, but I feel like we should have at least one test that shows the current version is correct.

llvm/tools/LLVMBuild.txt
35 ↗(On Diff #177811)

I'm not entirely convinced by the "elfabi" name since it isn't really an ABI, but I don't have a clearly better name.

llvm/tools/llvm-elfabi/ELFObjHandler.cpp
32 ↗(On Diff #177966)

Nit: you should be able to remove the object::, since you're using the namespace object.

39 ↗(On Diff #177966)

Will the ElfHeader likely be used for anything else? If not, you should fetch it inline, I think.

41 ↗(On Diff #177966)

No need for this comment. It's obvious from the function name.

54 ↗(On Diff #177966)

Are you anticipating this function becoming more complicated? At the moment, there's no point in it returning an error, or even really existing, since you could just set the TargetStub.Arch member at the call site instead.

llvm/tools/llvm-elfabi/ELFObjHandler.h
28 ↗(On Diff #177966)

Pointer -> Reference.

The sentence as a whole sounds a bit clunky. Maybe "Source ELFObjectFile". I don't think you need to explicitly state that it's a pointer or reference (it's in the type signature).

33 ↗(On Diff #177966)

This comment smells a bit like this should be a member function of ELFStub. Thoughts on that?

34–35 ↗(On Diff #177966)

who's -> whose

As above, I don't think you need to explicitly say that TargetStub and Header are references.

What is the ELF file header for however? That needs explaining more here.

llvm/tools/llvm-elfabi/ErrorCollector.h
1 ↗(On Diff #177966)

My understanding of Error is that you don't need this class. Error can actually represent multiple errors. You can use joinErrors() to combine two Error instances into a single one. See the docs here.

llvm/tools/llvm-elfabi/llvm-elfabi.cpp
31 ↗(On Diff #177966)

I think the convention tends to be for cl::opt variable names to essentially match the switch name, so this should just be "Soname" (or "SoName", or even "SOName").

32 ↗(On Diff #177966)

Perhaps this should be --soname to match the equivalent linker flag.

39 ↗(On Diff #177966)

Similar to populateArch, why is this a separate function and not just done inline (or possibly a member function of ELFStub)?

43 ↗(On Diff #177966)

Again, I'm not sure there's any value in this function being out-of-line.

70 ↗(On Diff #177966)

Move this down to first use.

85 ↗(On Diff #177966)

I think *StubFromELF rather than StubFromELF.get() is more common usage in this case.

This could probably be written a bit simpler as follows:

Expected<std::unique_ptr<ELFStub>> StubFromELF =
      readELFFile(FileReadBuffer->getMemBufferRef());
if (StubFromELF) {
  return std::move(*StubFromELF);
}
EC.addError(StubFromELF.takeError(), "BinaryRead");
111 ↗(On Diff #177966)

This message sounds a bit clunky. Perhaps "No file readers were able to read..."?

128 ↗(On Diff #177966)

Nit: No need for explicit return 0 in a main in C++ these days.

68 ↗(On Diff #176915)

FWIW, in functions in the tool itself, I have no issue with errors raised within functions. I don't think it's always necessary to propagate back up to the top-level. The issue is when they are in library functions.

amontanez updated this revision to Diff 178122.Dec 13 2018, 1:14 PM
amontanez marked 16 inline comments as done.

Changed:

  • Addressed several areas of feedback.
amontanez updated this revision to Diff 178124.Dec 13 2018, 1:19 PM

Forgot to include a few changes for a couple files.

amontanez marked 7 inline comments as done.Dec 13 2018, 3:10 PM

Hopefully this clarifies the direction on some things brought up. I didn't explicitly mention that everything outside llvm-elfabi.cpp is designed as if it were part of a library. Let me know if this resolves some concerns you had or brings up new ones. I really appreciate your insight, push back again on anything you still feels needs to be changed and I'll look into it again.

llvm/tools/LLVMBuild.txt
35 ↗(On Diff #177811)

That's very understandable. We didn't have any better ideas for names, so we went with elfabi. I'm totally open to suggestions since I understand changing this later would be a huge pain.

This tool views ELF shared objects roughly from a compile-time linker's perspective, so maybe there's a better name to be had that stems from that.

llvm/tools/llvm-elfabi/ELFObjHandler.cpp
39 ↗(On Diff #177966)

Done. I don't think we'll need to explicitly use it again.

54 ↗(On Diff #177966)

I've changed it to void as I doubt this will become more complicated over time.

Even though it is very simple, I would like to keep it as a function. The plan is to have one function for each ELFStub member. buildStub() streamlines the process of fetching the arguments needed for each function from an ELFFile<ELFT>, but I don't want the implementation of buildStub() to be the sole point of documentation on how to populate ELFStub.Arch. I'd like it to be very clear what information is required to populate each member. While this is simple for Arch, it becomes a little more involved for some of the other members. having populateArch() helps with consistency in this situation. D55629 helps to better illustrate the direction I'd like to take.

This is also a result of trying to design a consistent, unit-testable interface. If you depend on using complete ELFObjectFiles, things become significantly harder to unit test. I understand I can't unit test a tool, but that was how I originally designed it, and would be useful if this does get moved to a library.

llvm/tools/llvm-elfabi/ELFObjHandler.h
33 ↗(On Diff #177966)

While that is definitely an appealing approach, we've committed to not having the TextAPI library depend on libObject (Juergen would like libObject to depend on some parts of the MachO TextAPI implementation). The result of that discussion was that the ELF implementation for TextAPI will mostly just contain the YAML reader/writer and the ELFStub internal representation for now.

Overall, everything in this tool outside of llvm-elfabi.cpp is designed as if it were part of a library. There's two main reasons for this:

  1. It's not completely clear where the ELF binary readers/writers belong since they won't be in TextAPI. An argument could be made that they should be in a library, but a counter argument could be that binary readers/writers are only really used for the tool. I'd rather put it in the tool for now to avoid bikeshedding another library. That allows me to start adding stubbing functionality asap.
  1. Designing everything as if it were part of a library prevents this tool from becoming a monolith that can't later be turned into a library.
llvm/tools/llvm-elfabi/ErrorCollector.h
1 ↗(On Diff #177966)

I looked into that initially, and I don't feel it solves our problem. While joinErrors() solves the problem of collecting multiple errors into one, ErrorCollector solves our issue of determining whether or not to consume errors.

In llvm-elfabi.cpp, each file reader might fail and produce an error. If *any* reader succeeds, though, the other errors should be consumed as they become irrelevant. If none succeed, they should all be reported (in the future Jake and I would like to have more fine grained control over which are reported, but that's not in the scope of this patch). This was the cleanest way I felt we could resolve the issue without having very messy use of consumeError() throughout readInputFile().

llvm/tools/llvm-elfabi/llvm-elfabi.cpp
39 ↗(On Diff #177966)

You're right, there's not much need for updateTBEVersion(). I could see there being more logic in the future if there's a desire to write a TBE using an older writer, but it probably doesn't warrant a function right now. I've inlined it in main().

I'm happy with everything I've reviewed, including the related changes, but time means that I haven't had a chance to review the ErrorCollector class in more detail, and I'm not sure I'll get a chance this side of the New Year. If others are happy for this to go in, feel free to commit. I may come back and review the last bit at a later point.

I got a few more minutes this morning, but that's it from me until the New Year. I don't mind this going in before then, once my comments have been addressed and somebody else reviews it.

llvm/tools/llvm-elfabi/ErrorCollector.cpp
23 ↗(On Diff #178124)

Should Err here be an r-value reference? I'm not sure copying it strictly makes sense. Tag be a StringRef (copy it at the point where necessary i.e. at the Tags.push_back() line).

40 ↗(On Diff #178124)

errc::interrupted doesn't feel like the right answer here (what was interrupted?). This should probably be something else. I guess it may be worth checking the errc of of all errors and using that if it's the same, or picking one if not?

46 ↗(On Diff #178124)

Unless the indexes are going to be useful for something else, I don't think they give us anything. I also am not sure there's a need to print the "Encountered multiple errors bit and then print them all as a single error".

Instead, I'd print each of the errors separately (i.e. with the WithColor::error() method or equivalent), since each is a distinct message. One strong reason for doing this is that each is indeed a distinct error, so should be prefixed with "error:". This helps IDEs and the like display the different errors in a meaningful way (if you plugged this into Visual Studio for example, the Error List view would simply say "Encountered multiple errors" and you'd have to go to the build output to actually view them, whereas reporting them separately would result in separate errors being listed). Additionally, printing separate errors is easier to read when viewing the output in a terminal (because each is individually highlighted).

53–56 ↗(On Diff #178124)

return Errors.empty();

70 ↗(On Diff #178124)

Does logging mark Error as checked? If not, we don't need the abort() (and can probably also get rid of the "Aborted due to...." message too) by simply skipping the consumeError() loop in the destructor, and letting the unchecked Error assertion fire as normal.

llvm/tools/llvm-elfabi/ErrorCollector.h
62 ↗(On Diff #178124)

Could we avoid the abbreviation here? Everywhere else it's Errors, not Errs (so this would be allErrorsHandled).

amontanez updated this revision to Diff 178965.Dec 19 2018, 2:20 PM
amontanez marked 2 inline comments as done.

Changed:

  • Populating ELFStub.Arch is now inline.
  • ErrorCollector.log() now prefixes each error with "error: " in red.
  • ErrorCollector.makeError() now uses joinErrors() in lieu of a specialized Error type (marked as TODO) that gives more control over how to handle each error.
  • addError() now takes a r-value reference Error.
amontanez updated this revision to Diff 178968.Dec 19 2018, 2:33 PM
amontanez marked 4 inline comments as done.

Quick bugfix in ErrorCollector::makeError() and change to make the function more concise.

amontanez marked 3 inline comments as done.Dec 19 2018, 2:50 PM
amontanez added inline comments.
llvm/tools/llvm-elfabi/ErrorCollector.cpp
46 ↗(On Diff #178124)

I've fixed this for ErrorCollector::log(), but it doesn't work as cleanly for the previous ErrorCollector::makeError() since color information would get stripped out. I've switched ErrorCollector::makeError() to use joinErrors() for now.

70 ↗(On Diff #178124)

I tried omitting this and only one error is printed since the program terminates early when the first unhanded error is encountered.

amontanez updated this revision to Diff 178976.Dec 19 2018, 3:20 PM
amontanez marked an inline comment as done.

Changed:

  • Fixed test formatting.
This revision was automatically updated to reflect the committed changes.