This is an archive of the discontinued LLVM Phabricator instance.

[TextAPI] TBD Reader/Writer
ClosedPublic

Authored by ributzka on Oct 31 2018, 11:40 AM.

Details

Summary

Add basic infrastructure for reading and writting TBD files (version 1 - 3).

The TextAPI library is not used by anything yet (besides the unit tests). Tool
support will be added in a separate commit.

The TBD format is currently documented in the implementation file (TextStub.cpp).

Diff Detail

Event Timeline

ributzka created this revision.Oct 31 2018, 11:40 AM
mgrang added inline comments.Oct 31 2018, 11:59 AM
llvm/lib/TextAPI/TextStub.cpp
336 ↗(On Diff #171977)

Please use range based llvm::sort(Section.Symbols), etc instead of std::sort.

See https://llvm.org/docs/CodingStandards.html#beware-of-non-deterministic-sorting-order-of-equal-elements

dexonsmith added inline comments.Oct 31 2018, 12:28 PM
llvm/lib/TextAPI/TextStub.cpp
337–338 ↗(On Diff #171977)

Related to @mgrang's comment about using range-based llvm::sort: it looks like there's a bug here where nothing gets sorted, since begin is passed twice.

compnerd added inline comments.Oct 31 2018, 12:40 PM
llvm/include/llvm/TextAPI/Architecture.def
1 ↗(On Diff #171977)

Missing copyright header.

llvm/include/llvm/TextAPI/ArchitectureSet.h
172 ↗(On Diff #171977)

I wonder if we can get away with llvm::BitVector here.

ributzka updated this revision to Diff 171995.Oct 31 2018, 12:42 PM

Replace std::sort with llvm::sort and fix copy-paste-o bugs.

ributzka updated this revision to Diff 172200.Nov 1 2018, 12:28 PM

Add missing license header.

ributzka added inline comments.Nov 1 2018, 4:01 PM
llvm/include/llvm/TextAPI/ArchitectureSet.h
172 ↗(On Diff #171977)

A llvm::BitVector would require twice the storage size and dynamic memory allocation/deallocation. I tried to avoid that, since every symbol has an ArchitectureSet associated with it.

I haven't looked at all of this yet, by a long way, but here are a few comments on the first few files.

llvm/include/llvm/TextAPI/ArchitectureSet.h
1 ↗(On Diff #172200)

There's a lot of code in this header that feels like it belongs in a separate .cpp. Doing so would make the class API easier to read, and might reduce the include dependencies.

22 ↗(On Diff #172200)

I'd be surprised if you need to explicitly include this header, but if you do, it should be cstddef probably.

40 ↗(On Diff #172200)

This should probably use an ArrayRef.

Do you need to explicit call the ArchitectureSet default constructor in these two?

General requests

  1. Can we namespace MachO specific stuff? I'm fine airing on the side of putting things in the MachO namespace and dedupping later if possible.
  2. This is pretty big. Can we break this up at all?
  3. I've thus far ignored the parts that are MachO specific and assumed someone else more qualified would handle that. Before giving an LGTM I'd like to see review of that code. Breaking this up would help me to do that if a person more expierenced with MachO isn't present to do that.
llvm/include/llvm/TextAPI/Registry.h
37 ↗(On Diff #172200)

Where is file_magic defined? I'm going to make a bunch of, very possibly wrong, assumptions here.

Presumably it's computed from MemoryBufferRef as well so requiring the user to pre-compute it is a bit odd. Also not all conceivable formats have magic.

48 ↗(On Diff #172200)

What would be a case where this would return false?

53 ↗(On Diff #172200)

Can you explain why this is needed? It seems like this could be split up into a Reader and a Writer.

llvm/lib/TextAPI/TextStub.cpp
591 ↗(On Diff #172200)

An issue Armando hit was that this doesn't allow comments at the start of TAPI files. It also just seems like a generally error prone way to check for the validity of such files IMO. I think a better way would be to just attempt to parse and handle the failure by attempting to parse the next format. It's unlikely that the parser would actually get very far before failing.

<bikeshed>
I noticed the library is named TextAPI, is there a specific reason for this? My patch names the library TAPI, but if there's a reason not to I can change that.
</bikeshed>

I'd like to start a conversation on where we'd like the ELF and MachO code paths to diverge. It looks like a thin abstraction of InterfaceFile.h that forks into MachO and ELF specific implementations would be reasonable. TextAPIReaderWriter seems generic enough that it should support an ELF implementation without any modification. In the context of this patch, those places look like reasonable points where implementation specifics might diverge. However, depending on what else would be upstreamed following this patch, I'm not sure if that's the best approach.

An alternative option might be to make this patch provide a single .tbd implementation as independently as possible, and then in separate patches move in the auxiliary (Architecture, Version, Registry, etc.) components later. That would make it significantly easier to provide specific feedback and discussion on whether code is MachO-specific or if some of it could be shared with ELF to reduce code duplication.

The original TAPI is rather large when compared against other llvm libraries, so I'd like to be careful when adding the ELF backend such that it doesn't inflate needlessly. I could adapt the ELF portions to take advantage of existing interfaces after they land, but without more granular review there might be some parts that are specialized and integrated so much that it wouldn't be realistic to do so without completely refactoring existing code.

<bikeshed>
I noticed the library is named TextAPI, is there a specific reason for this? My patch names the library TAPI, but if there's a reason not to I can change that.
</bikeshed>

I'd like to start a conversation on where we'd like the ELF and MachO code paths to diverge. It looks like a thin abstraction of InterfaceFile.h that forks into MachO and ELF specific implementations would be reasonable. TextAPIReaderWriter seems generic enough that it should support an ELF implementation without any modification. In the context of this patch, those places look like reasonable points where implementation specifics might diverge. However, depending on what else would be upstreamed following this patch, I'm not sure if that's the best approach.

An alternative option might be to make this patch provide a single .tbd implementation as independently as possible, and then in separate patches move in the auxiliary (Architecture, Version, Registry, etc.) components later. That would make it significantly easier to provide specific feedback and discussion on whether code is MachO-specific or if some of it could be shared with ELF to reduce code duplication.

The original TAPI is rather large when compared against other llvm libraries, so I'd like to be careful when adding the ELF backend such that it doesn't inflate needlessly. I could adapt the ELF portions to take advantage of existing interfaces after they land, but without more granular review there might be some parts that are specialized and integrated so much that it wouldn't be realistic to do so without completely refactoring existing code.

The interface file has *a lot* of MachO specific things in it. It also isn't clear that even Architecture may not even be easily dedupable IMO. The registry interface seems more on the level of split I was expecting. It demands a raw_ostream for the output which isn't ideal IMO because that's not really how you'd ever want to output an ELF file. I focused on that in my review because of that.

The interface file has *a lot* of MachO specific things in it. It also isn't clear that even Architecture may not even be easily dedupable IMO. The registry interface seems more on the level of split I was expecting. It demands a raw_ostream for the output which isn't ideal IMO because that's not really how you'd ever want to output an ELF file. I focused on that in my review because of that.

Are you suggesting the Registry has a notion of two (or more) different file definitions (i.e. MachOInterfaceFile and ELFInterfaceFile), or am I misinterpreting?

To clarify, I don't think the API of InterfaceFile could be shared, I was intending to suggest something similar to llvm::object::ObjectFile that would be used in the contexts that InterfaceFile is currently used in. When more specifics are needed (for example, in TextStub.cpp), if the FileType is appropriate then you can downcast to the class with the additional functionality. Does keeping a general interface in that way reduce code duplication enough to be justified? I think it depends on what the tool ends up doing.

ributzka marked 3 inline comments as done.Nov 12 2018, 10:46 AM

I will update the patch with more changes based on the feedback so far.

llvm/include/llvm/TextAPI/Registry.h
37 ↗(On Diff #172200)

The additions to file_magic are defined in a separate patch (https://reviews.llvm.org/D37820). This is how libObject works if you want to use the generic interface. If you look at the original patch that is attached to the RFC you can see how it all ties together.

amontanez added a comment.EditedNov 13 2018, 10:09 AM

I've renamed directories in D53051 from TAPI to TextAPI to match this patch.

ributzka updated this revision to Diff 175291.Nov 26 2018, 10:38 AM
ributzka edited the summary of this revision. (Show Details)
  • Rebase patch on D53051
  • Remove registry and add separate reader and writer classes
  • Move everything into the MachO namespace and sub-folder

Would there be an objection to using llvm::tapi::MachO or llvm::tapi::macho? At least for ELF it doesn't make much sense to put the tapi code in the same namespace as BinaryFormat/ELF so it would be nice to add the extra namespace. I'm not married to the idea however.

davide added a subscriber: davide.Nov 27 2018, 3:08 PM

Nitpicks aside, the mach-o side of this looks good to me. :)

Is there any other outstanding feedback?

llvm/include/llvm/TextAPI/InterfaceFile.h
165 ↗(On Diff #171977)

Looks like a partial comment.

And "writter" should be "writer".

181–184 ↗(On Diff #171977)

Should this be called "addArchitectures" instead, given that it's an additive operation?

llvm/lib/TextAPI/TextStub.cpp
183–191 ↗(On Diff #171977)

You might be able to use llvm/ADT/BitmaskEnum.h here, rather than defining these operators yourself.

394–395 ↗(On Diff #171977)

These sorts are no-ops, along the same lines as the ones Duncan pointed out above.

Oops. Those comments were for and old diff. The no-op sorts are gone, but the other three still apply.

llvm/include/llvm/TextAPI/MachO/InterfaceFile.h
107

Should this be called "addArchitectures" instead, given that it's an additive operation?

173

Looks like a partial comment?

And "writter" should be "writer".

llvm/lib/TextAPI/MachO/TextStub.cpp
189–197

You might be able to use llvm/ADT/BitmaskEnum.h here, rather than defining these operators yourself.

ributzka updated this revision to Diff 175738.Nov 28 2018, 11:48 AM
ributzka marked 3 inline comments as done.
  • rename setArchitectures to addArchitectures
  • fix comments
  • use BitmaskEnum in several places to remove custom static_cast code.
ributzka marked an inline comment as done.Nov 28 2018, 11:49 AM
ributzka added inline comments.
llvm/lib/TextAPI/MachO/TextStub.cpp
189–197

Sweet. I didn't know we had this. This cleaned up the code in several files.

Would there be an objection to using llvm::tapi::MachO or llvm::tapi::macho? At least for ELF it doesn't make much sense to put the tapi code in the same namespace as BinaryFormat/ELF so it would be nice to add the extra namespace. I'm not married to the idea however.

I see and use tapi as a top-level namespace for the tapi tool, so it feels weird to have that namespace also as a nested namespace inside llvm.

lhames accepted this revision.Nov 28 2018, 1:06 PM

Looks good to me. :)

This revision is now accepted and ready to land.Nov 28 2018, 1:06 PM
ributzka closed this revision.Nov 28 2018, 1:36 PM
thakis added a subscriber: thakis.Jan 1 2019, 7:03 PM

Out of interest, why aren't most of the tests for this lit tests? The usual llvm way of testing stuff is to write a small llvm tool using your library (which will be useful outside of a testing context too) and then use that to write lit tests.

thakis added a comment.Jan 6 2019, 8:04 AM

Out of interest, why aren't most of the tests for this lit tests? The usual llvm way of testing stuff is to write a small llvm tool using your library (which will be useful outside of a testing context too) and then use that to write lit tests.

Looks like this is now happening (rL350341). Are most of the unit tests going to move to lit tests now?

Out of interest, why aren't most of the tests for this lit tests? The usual llvm way of testing stuff is to write a small llvm tool using your library (which will be useful outside of a testing context too) and then use that to write lit tests.

Looks like this is now happening (rL350341). Are most of the unit tests going to move to lit tests now?

This is a different YAML format from r350341 but unittest is running as part of the lit as well, right?
A new tool can be created to read MachO YAML file but I want to understand your real concern here.

Out of interest, why aren't most of the tests for this lit tests? The usual llvm way of testing stuff is to write a small llvm tool using your library (which will be useful outside of a testing context too) and then use that to write lit tests.

Looks like this is now happening (rL350341). Are most of the unit tests going to move to lit tests now?

This is a different YAML format from r350341 but unittest is running as part of the lit as well, right?
A new tool can be created to read MachO YAML file but I want to understand your real concern here.

The concerns are:

  • being able to iterate on tests without having to wait for a test binary to link
  • testing consistent with how the rest of llvm works
Herald added a project: Restricted Project. · View Herald TranscriptMar 23 2019, 4:18 PM
Herald added a subscriber: jdoerfert. · View Herald Transcript
mstorsjo added inline comments.
llvm/include/llvm/TextAPI/MachO/Architecture.def
17

This seemed to break compilation of LLVM for i386.

When targeting i386, both GCC and Clang define i386 (without underscores), see e.g. https://github.com/llvm-project/clang/blob/master/lib/Basic/Targets/X86.cpp#L897.

The fact that i386 is defined and expands to 1 breaks this ARCHINFO case, as 1 isn't a valid name in enum class Architecture.