This is an archive of the discontinued LLVM Phabricator instance.

Object: Make IRObjectFile own multiple modules and enumerate symbols from all modules.
ClosedPublic

Authored by pcc on Nov 21 2016, 6:54 PM.

Diff Detail

Repository
rL LLVM

Event Timeline

pcc updated this revision to Diff 78819.Nov 21 2016, 6:54 PM
pcc retitled this revision from to Object: Make IRObjectFile own multiple modules and enumerate symbols from all modules..
pcc updated this object.
pcc added a reviewer: mehdi_amini.
pcc added a subscriber: llvm-commits.
pcc planned changes to this revision.Nov 22 2016, 3:26 PM

Upon further reflection, I think we want to go in a slightly different direction here:

  • factor the "create a symbol table from a module" code out of IRObjectFile and into a new ModuleSymbolTable class, which would conceptually be responsible for maintaining the mapping between symbol table entries and GlobalValues. This would be similar to what I proposed in D23132, but at least to begin with it would just be a straight refactoring of the post- D26928 IRObjectFile code
  • use that class from IRObjectFile
  • change lib/LTO to use ModuleSymbolTable directly
  • remove getModule, takeModule, getSymbolGV interfaces from IRObjectFile

When we get around to implementing bitcode symbol tables we can:

  • implement the bitcode symbol table writer in terms of ModuleSymbolTable
  • change IRObjectFile to read from the bitcode symbol table instead of using ModuleSymbolTable directly

I will start working on ModuleSymbolTable.

pcc added a comment.Nov 22 2016, 3:32 PM

An important point which I forgot to mention: the symbol table stored by ModuleSymbolTable would correspond to any number of modules (all of the same target triple).

mehdi_amini edited edge metadata.Nov 22 2016, 3:33 PM
In D26951#603439, @pcc wrote:

An important point which I forgot to mention: the symbol table stored by ModuleSymbolTable would correspond to any number of modules (all of the same target triple).

That seems an arbitrary choice, that is only driven by the current use case of splitting vtables for LTO.

pcc added a comment.Nov 22 2016, 3:55 PM
In D26951#603439, @pcc wrote:

An important point which I forgot to mention: the symbol table stored by ModuleSymbolTable would correspond to any number of modules (all of the same target triple).

That seems an arbitrary choice, that is only driven by the current use case of splitting vtables for LTO.

I think in general there are two possible cases:

  1. where by design the client needs to have multiple conceptual "views" into the input file (e.g. fat binaries, CUDA, OpenMP)
  2. where the client has a single "view" and does not care about which symbol is defined in which module (e.g. regular/thin LTO splitting)

The client's use of ModuleSymbolTable (and the rest of the lib/Object interface in general) needs to be driven by that fundamental design decision of where the split lies. So for the fat binary scenario I would see the client creating one ModuleSymbolTable (and one bitcode symbol table) for each architecture, and the IRObjectFile growing a way to choose the architecture (as we do in MachOObjectFile for example).

The client's use of ModuleSymbolTable (and the rest of the lib/Object interface in general) needs to be driven by that fundamental design decision of where the split lies. So for the fat binary scenario I would see the client creating one ModuleSymbolTable (and one bitcode symbol table) for each architecture, and the IRObjectFile growing a way to choose the architecture (as we do in MachOObjectFile for example).

Right, but the "triple" as a discriminator seems arbitrary to me: what about use cases where we ship a "fat" object file containing bitcode for a non-optimized debug build of the module and an optimized one? Or for building with and without options like freestanding? Or with and without the sanitizers?

I'm fine with being pragmatic and making it work for CFI in LTO, I just want to make sure that the API and the design of the ModuleSymbolTable / IRObjectFile relationship does not make too many assumptions about it.

pcc added a comment.Nov 22 2016, 4:13 PM

Right, but the "triple" as a discriminator seems arbitrary to me: what about use cases where we ship a "fat" object file containing bitcode for a non-optimized debug build of the module and an optimized one? Or for building with and without options like freestanding? Or with and without the sanitizers?

I think we're confusing a couple of things here. I am not saying that the triple would be the discriminator; the discriminator could in principle be anything the client wants (and could be chosen at BitcodeWriter time). The reason I mentioned that modules associated with a single ModuleSymbolTable should have the same triple is to ensure that the eventual object files are compatible and that name mangling happens consistently. I am not precluding having multiple ModuleSymbolTables whose modules happen to have the same target triple.

Sure, this part is fine, I think what tickled me was "the IRObjectFile growing a way to choose the architecture", which I read as "you pass in the architecture and it get all the right modules from the bitcode file and get a ModuleSymbolTable for them".

pcc added a comment.Nov 22 2016, 4:19 PM

Sure, this part is fine, I think what tickled me was "the IRObjectFile growing a way to choose the architecture", which I read as "you pass in the architecture and it get all the right modules from the bitcode file and get a ModuleSymbolTable for them".

Right, "architecture" was just an example here, we can make a better decision about what exactly the discriminator should be when the time comes to implement a feature that depends on it.

mehdi_amini accepted this revision.Nov 23 2016, 9:41 PM
mehdi_amini edited edge metadata.

LGTM.

llvm/include/llvm/Object/IRObjectFile.h
31 ↗(On Diff #79182)

I rather avoid the ultra-contraction, what about: Mods

This revision is now accepted and ready to land.Nov 23 2016, 9:41 PM
pcc updated this revision to Diff 79680.Nov 29 2016, 5:29 PM
pcc marked an inline comment as done.
pcc edited edge metadata.
  • Address review comments; pick up tool rename
This revision was automatically updated to reflect the committed changes.