Page MenuHomePhabricator

[llvm-tapi] initial commit, supports ELF text stubs

Authored by amontanez on Oct 9 2018, 3:52 PM.



TextAPI is a library and accompanying tool that allows conversion between binary shared object stubs and textual counterparts. The motivations and uses cases for this are explained thoroughly in the llvm-dev proposal [1]. This initial commit proposes a potential structure for the TAPI library, also including support for reading/writing text-based ELF stubs (.tbe) in addition to preliminary support for reading binary ELF files. The goal for this patch is to ensure the project architecture appropriately welcomes integration of Mach-O stubbing from Apple's TAPI [2].


  • TextAPI library
  • .tbe read support
  • .tbe write (to raw_ostream) support


Diff Detail


Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
jhenderson added inline comments.Nov 5 2018, 3:26 AM
1 ↗(On Diff #172450)

Not looked at common practice, but I noticed that according to the docs "*- C++ -*" is unnecessary in .cpp files.

25 ↗(On Diff #172450)

If you use llvm::errc, you don't need the std::make_error_code here.

74 ↗(On Diff #172450)

(uint64_t)0u looks weird to me. Do you need to be explicit about the type here?

115 ↗(On Diff #172450)

I'm guessing this is some magic in the YAML code? I don't really understand it, if I'm honest.

42–44 ↗(On Diff #172450)

Maybe to make this look slightly less hideous this should be compressed into one line:
EXPECT_STREQ(Line1.str().data(), Line2.str().data())

118 ↗(On Diff #172450)

Move this inline.

37–40 ↗(On Diff #171991)

Okay, makes sense.

Aside: I reckon StringRef should be null terminated - it's a reference to a string, either a C-style literal one (complete with '\0') or a std::string (which is null-terminated in modern C++).

amontanez updated this revision to Diff 172623.Nov 5 2018, 11:39 AM
amontanez marked 10 inline comments as done.

Thanks for being patient with me, I'm really appreciating your feedback!

74 ↗(On Diff #172450)

Yes, otherwise it's ambiguous. I believe the the u can be safely dropped.

115 ↗(On Diff #172450)

Yes. YAML can't tell the difference between ELFArch and uint16_t since ELFArch is just a typedef based on uint16_t. uint16_t has a default YAML implementation, so there must be a LLVM_YAML_STRONG_TYPEDEF to define a distinct type with overridden implementation. I avoided using the strong YAML typedef in ELFStub because things get rather messy if you try to use it for things other than YAML. Hence, the need for the reference cast here. It forces our standard type to be treated as a distinct YAML type.

118 ↗(On Diff #172450)

Sorry, that slipped through.

37–40 ↗(On Diff #171991)

Normally it would be, but as I understand it's not guaranteed to be. Since it's just a reference to a character array, it's not null terminated if the reference is to a substring that isn't null terminated. Since I'm splitting on '\n', the last character of the substring isn't a null terminator. I updated the code comment to correct my wording.

amontanez marked 4 inline comments as done.Nov 5 2018, 11:42 AM

I've got a few more nits, but otherwise I'm happy with this. Somebody else should definitely take a look too though.

59 ↗(On Diff #172623)
22 ↗(On Diff #172623)

Don't use else after an if that always returns.

23–26 ↗(On Diff #172623)

createStringError is also overloaded to work with printf-style formatting, so you could do all of this simply in one line, with something like the following:

return createStringError(errc::invalid_argument,
        "ELF incompatible with file type `%s`\n",
58 ↗(On Diff #172623)

Don't use else after return.

111 ↗(On Diff #172623)

No need for the braces here.

170 ↗(On Diff #172623)

Trailing full-stop needed on this comment.

amontanez updated this revision to Diff 172789.Nov 6 2018, 10:13 AM
amontanez marked 6 inline comments as done.
jhenderson accepted this revision.Nov 7 2018, 2:31 AM

Great, LGTM, but please get somebody else to sign off too.

This revision is now accepted and ready to land.Nov 7 2018, 2:31 AM

Some drive by comments again. Be good to get @ributzka to take a final look.

1 ↗(On Diff #172789)

This file could use some more comments in general.

98 ↗(On Diff #172789)

Elaborate on what needs to be done to fix the TODO please.

111 ↗(On Diff #172789)

I'd like to try to avoid "v1" style naming for things. Any hope of an explicit version field instead?

jakehehrlich added inline comments.Nov 9 2018, 12:00 PM
43 ↗(On Diff #172789)

Can you give a clear outline as to what the ELFInterfaceFile class does that ELFStub does not? At the moment it seems to merely be an extremely verbose way of keeping track of FileType manually. In llvm-objcopy we have to keep track of this because there is an expectation that the output type match the input type but I don't think you have that constraint here. One output type is assumed if none is given, .tbe.

I think a better way to handle the different formats is to have a ReaderWriter class for each and methods format. That can then provide the description, and those can be registered in the command line tool in an appropriate fashion. The user should be able to know which type of object they read in by checking which reader/writer they used successfully.

45 ↗(On Diff #172789)

What is the intended semantics of a interface without a stub?

26 ↗(On Diff #172789)

Can this be an enum class or "scoped enum"? Same for all enums.

40 ↗(On Diff #172789)

We decided offline that we needed to keep track of weakness correct?

46 ↗(On Diff #172789)

offline we decided we needed to keep track of dt_needed as well, right? I'll accept a TODO here for that.

55 ↗(On Diff #172789)

Where ever you use this, you can most likely replace it with a lambda or a direct comparison.

37 ↗(On Diff #172789)

I don't think using a raw_ostream is what we want here. It's probably fine but it would be better to use some kind of abstraction over Buffers that allowed both a raw_ostream to be used under the hood *and* a MemoryBuffer or FileBuffer. Importantly you avoid a copy of the file by using a FileBuffer on most systems.

18 ↗(On Diff #172789)

The reality is that you can't set to any non-single value.

31 ↗(On Diff #172789)

An Invalid FileType doesn't make sense. Why do you need this?

43 ↗(On Diff #172789)

All should probably not be an enum. The FileType enum is currently mixing up the notion of a value with a notion of a set IMO.

15 ↗(On Diff #172789)

Define move constructor as well.

19 ↗(On Diff #172789)

Use '=' instead of loop and push_back.

amontanez updated this revision to Diff 173785.Nov 12 2018, 4:49 PM
amontanez marked 10 inline comments as done.


  • Field for weak symbols.
  • Field for listing libraries needed by a shared object.
  • Dedicated TBE version field.


  • ELFInterfaceFile


  • All ELF specifics are now in an ELF namespace and in ELF subdirectories to minimize conflict with MachO version.
amontanez added inline comments.Nov 12 2018, 4:51 PM
43 ↗(On Diff #172789)

Discussed offline, removed this since it provides abstraction that isn't clear will be necessary.

37 ↗(On Diff #172789)

Noted. It's unclear what that will look like right now, so I'll keep this as-is until we reach the point that this becomes a concern.

amontanez updated this revision to Diff 173875.EditedNov 13 2018, 10:08 AM


  • Renamed a few directories to align better with D53945
orivej added a subscriber: orivej.Nov 13 2018, 7:42 PM
ributzka accepted this revision.Nov 14 2018, 9:20 AM
ributzka added inline comments.
23 ↗(On Diff #173875)

I don't think the tapi namespace is required in general for this library, but that is just a personal preference.

26 ↗(On Diff #173875)

Thanks for changing the name. This will make it easier to rebase my patch.

44 ↗(On Diff #173875)

Why do you use a DenseMap here? Wouldn't be a switch enough?

57 ↗(On Diff #173875)

StringSwitch from ADT would be an alternative here too.

ributzka added inline comments.Nov 14 2018, 2:48 PM
22 ↗(On Diff #173875)

Do you really depend on Object? This will cause a circular dependency when libObject starts using TextAPI to read TBD/TBE files.

ributzka added inline comments.Nov 14 2018, 2:50 PM
1 ↗(On Diff #173875)

This is not Support anymore ;-)

amontanez updated this revision to Diff 174122.Nov 14 2018, 4:35 PM
amontanez marked 3 inline comments as done.


  • YAML bug fixed in D48144, workaround removed.
  • Using switches instead of maps for architecture field.

LGTM with nits. Please make sure Jurgen and you agree on the namespace and folder scheme first but other than than this LGTM.

49 ↗(On Diff #174122)

Please add a TODO for symbol versioning.

Oh and you'll need to remove the dependency on libObject as discussed. I pointed out that only changes I think you'll need to make but you might find a few more.

18 ↗(On Diff #174122)

Use BinaryFormat

22 ↗(On Diff #174122)

Make sure to remove this as well.

11 ↗(On Diff #174122)

And this.

amontanez marked 5 inline comments as done.
amontanez edited the summary of this revision. (Show Details)


  • TextAPI no longer depends on libObject.
ruiu added inline comments.Nov 28 2018, 11:17 AM
63 ↗(On Diff #175727)

Instead of providing this function, you could make Symbolss type to std::set which is a sorted set.

34 ↗(On Diff #175727)
const VersionTuple TBEVersionCurrent(1,0);

is more conventional.

13 ↗(On Diff #175727)

Let's use fully qualified name so that it doesn't look like we have ::tapi namespace.

using namespace llvm::tapi::elf;
32 ↗(On Diff #175727)

nit: I believe you can omit -> bool.

24 ↗(On Diff #175727)
StringRef Buf(Data);
amontanez added inline comments.Nov 28 2018, 5:42 PM
63 ↗(On Diff #175727)

Unfortunately std::set doesn't allow in-place modification of elements, so it won't work with YAML. I'll look into alternative options.

ruiu added inline comments.Nov 28 2018, 6:27 PM
63 ↗(On Diff #175727)

If a set is sorted by Name, and if you don't mutate that string, you are fine, no? You can safely do in-place modification to other members.

amontanez added inline comments.Nov 28 2018, 7:48 PM
63 ↗(On Diff #175727)

Elements of a std::set are const by default. This requires either using const_cast or declaring some members as mutable. I feel there's a cleaner way to do this.

amontanez marked 5 inline comments as done.Nov 30 2018, 2:57 PM
amontanez added inline comments.
63 ↗(On Diff #175727)

I realized that since I'm using CustomMappingTraits I can populate an ELFSymbol before inserting it into the set.

amontanez updated this revision to Diff 176211.Nov 30 2018, 3:03 PM
amontanez marked 5 inline comments as done.


  • Symbols are now stored in a std::set.
  • Fixed a few nits.
  • Namespace is now elfabi to avoid overlapping with elf namespace while also removing tapi namespace.
ruiu accepted this revision.Nov 30 2018, 3:24 PM

I do not have any further comments. LGTM

This revision was automatically updated to reflect the committed changes.
sbc100 added a subscriber: sbc100.Dec 6 2018, 11:18 AM

We are seeing a failure of YAMLWritesNoTBESyms on the WebAssembly waterfall due to this change:

FAIL: LLVM-Unit :: TextAPI/./TapiTests/ElfYamlTextAPI.YAMLWritesNoTBESyms (3237 of 44396)
******************** TEST 'LLVM-Unit :: TextAPI/./TapiTests/ElfYamlTextAPI.YAMLWritesNoTBESyms' FAILED ********************
Note: Google Test filter = ElfYamlTextAPI.YAMLWritesNoTBESyms
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from ElfYamlTextAPI
[ RUN      ] ElfYamlTextAPI.YAMLWritesNoTBESyms
/b/build/slave/linux/build/src/src/work/llvm/unittests/TextAPI/ELFYAMLTest.cpp:42: Failure
      Expected: Line1.str().data()
      Which is: "NeededLibs:      "
To be equal to: Line2.str().data()
      Which is: "NeededLibs:      [,, ]"
/b/build/slave/linux/build/src/src/work/llvm/unittests/TextAPI/ELFYAMLTest.cpp:42: Failure
      Expected: Line1.str().data()
      Which is: "  -"
To be equal to: Line2.str().data()
      Which is: "Symbols:         {}"
/b/build/slave/linux/build/src/src/work/llvm/unittests/TextAPI/ELFYAMLTest.cpp:42: Failure
      Expected: Line1.str().data()
      Which is: "  -"
To be equal to: Line2.str().data()
      Which is: "..."

We suspect that its due to fact that LLVM_BUILD_LLVM_DYLIB and LLVM_LINK_LLVM_DYLIB are set on this builder.

Its seems that with the dylib use case the SequenceElementTraits from include/llvm/Support/YAMLTraits.h are in effect and we get:


Where as most builds get (and the test expects):

NeededLibs: [,, ]