[ObjCMetadata] Add support for reading Objective-C metadata
Needs Review · Public

Authored by steven_wu on Mar 28 2019, 10:02 AM.

Details

Summary

Add a new library to LLVM that can read Objective-C metadata from
binaries or from LLVM bitcode generated by the clang or swift compilers.

Event Timeline

steven_wu created this revision. Mar 28 2019, 10:02 AM
Herald added a project: Restricted Project.

I am not convinced that there is sufficient abstraction here to handle multiple runtimes without significant code duplication. The Mach-O and LLVM IR parsers both appear to duplicate all knowledge of the runtime's data structures, so if you wanted to add a second runtime you'd then need four classes. I would expect to see two abstraction layers:

  1. Something allowing you to specify a data structure to examine that would load it from some binary (IR, Mach-O, ELF, whatever).
  2. Something built on top of this that then queries specific runtime metadata structures.

In clang, we support two broad families of Objective-C runtime: Apple (Legacy and Modern, with version-specific things for macOS, iOS, and so on) and GNU (including GCC, GNUstep / WinObjC, and ObjFW). These are largely orthogonal to object file formats. For example, the GNUstep runtime is widely used with both ELF and PE/COFF binaries and can be used with Mach-O (though typically only for testing). If we wanted to add support for all of these, I suspect that we'd have a code explosion.
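The two layers described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `DataSource`, `RuntimeLayout`, `ToyRuntime`, and `InMemorySource` are hypothetical names that do not appear in the patch, and the "first word points at the class name" layout is invented for the example rather than taken from any real runtime.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Layer 1 (hypothetical): container-agnostic access to raw data.
// A Mach-O, ELF, PE/COFF, or IR backend would each implement this once.
class DataSource {
public:
  virtual ~DataSource() = default;
  // Read a pointer-sized word at an address, if it is mapped.
  virtual std::optional<uint64_t> readWord(uint64_t Addr) const = 0;
  // Read a NUL-terminated string at an address, if it is mapped.
  virtual std::optional<std::string> readCString(uint64_t Addr) const = 0;
};

// Layer 2 (hypothetical): runtime-specific layout knowledge, written once
// per runtime against DataSource rather than once per file format.
class RuntimeLayout {
public:
  virtual ~RuntimeLayout() = default;
  virtual std::optional<std::string> className(const DataSource &DS,
                                               uint64_t ClassAddr) const = 0;
};

// Toy layout, NOT a real runtime: the first word of a class record is
// assumed to point at the class name string.
class ToyRuntime : public RuntimeLayout {
public:
  std::optional<std::string> className(const DataSource &DS,
                                       uint64_t ClassAddr) const override {
    std::optional<uint64_t> NamePtr = DS.readWord(ClassAddr);
    if (!NamePtr)
      return std::nullopt;
    return DS.readCString(*NamePtr);
  }
};

// In-memory stand-in for an object-file backend, used only for illustration.
class InMemorySource : public DataSource {
  std::map<uint64_t, uint64_t> Words;
  std::map<uint64_t, std::string> Strings;

public:
  void addWord(uint64_t Addr, uint64_t Value) { Words[Addr] = Value; }
  void addString(uint64_t Addr, std::string S) { Strings[Addr] = std::move(S); }

  std::optional<uint64_t> readWord(uint64_t Addr) const override {
    auto It = Words.find(Addr);
    if (It == Words.end())
      return std::nullopt;
    return It->second;
  }

  std::optional<std::string> readCString(uint64_t Addr) const override {
    auto It = Strings.find(Addr);
    if (It == Strings.end())
      return std::nullopt;
    return It->second;
  }
};
```

With this split, adding a second runtime means one more `RuntimeLayout` subclass and adding a second container means one more `DataSource` subclass, so the class count grows as runtimes plus formats rather than runtimes times formats.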

I understand the concern. When I said in the RFC that the library has the abstraction to support multiple runtimes and object file formats, I was talking about the interface of the metadata reader. The design choice is not to expose runtime data structures in the library interfaces, so there can be a unified interface no matter which object file format and runtime are used. The prototype implements the Apple runtimes (both legacy and modern) with the Mach-O and LLVM bitcode object file formats. Given how different the data structures really are, there isn't much sharing between implementations. There is potential to add another layer of abstraction over object file formats underneath to eliminate some boilerplate (and that should probably be added to libObject directly). If you have a better idea or design, please let me know.
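To make the interface-level point concrete, here is a minimal sketch of what "not exposing runtime data structures" could look like. The names `ClassInfo`, `MetadataReader`, `FixedReader`, and `classNames` are hypothetical and are not the API in this patch; `FixedReader` stands in for a real Mach-O or bitcode implementation.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical public view of a class: plain values only, with no
// runtime-specific structs leaking through.
struct ClassInfo {
  std::string Name;
  std::string SuperclassName;
  std::vector<std::string> Selectors;
};

// Hypothetical unified interface. Each (runtime, file format) pair would
// provide its own implementation behind it.
class MetadataReader {
public:
  virtual ~MetadataReader() = default;
  virtual std::vector<ClassInfo> classes() const = 0;
};

// Stand-in implementation that returns canned data. A real implementation
// would parse a Mach-O file or an LLVM bitcode module instead.
class FixedReader : public MetadataReader {
  std::vector<ClassInfo> Classes;

public:
  explicit FixedReader(std::vector<ClassInfo> C) : Classes(std::move(C)) {}
  std::vector<ClassInfo> classes() const override { return Classes; }
};

// Client code depends only on the interface, never on the runtime layout.
inline std::vector<std::string> classNames(const MetadataReader &R) {
  std::vector<std::string> Names;
  for (const ClassInfo &C : R.classes())
    Names.push_back(C.Name);
  return Names;
}
```

A client written against `MetadataReader` stays unchanged when a new runtime or container is added. This sketch says nothing about how much code the implementations can share, which is the separate question raised in the review.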

steven_wu updated this revision to Diff 197577. May 1 2019, 9:47 AM

Rebased the patch and did some fixups:

  • Fold the Objective-C metadata reader into Object
  • Update the license header
  • clang-format

There is still a lot of boilerplate, but we can improve that in tree.

Rebased the patch to ToT. Ping again.