This implements multi-module support in IRObjectFile.
Repository: rL LLVM
Event Timeline
Upon further reflection, I think we want to go in a slightly different direction here:
- factor the "create a symbol table from a module" code out of IRObjectFile and into a new ModuleSymbolTable class, which would conceptually be responsible for maintaining the mapping between symbol table entries and GlobalValues. This would be similar to what I proposed in D23132, but, at least to begin with, it would just be a straight refactoring of the post-D26928 IRObjectFile code
- use that class from IRObjectFile
- change lib/LTO to use ModuleSymbolTable directly
- remove the getModule, takeModule and getSymbolGV interfaces from IRObjectFile
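A minimal sketch of the proposed split, assuming heavily simplified stand-ins for llvm::Module and llvm::GlobalValue. Both structs are hypothetical, as is every member name, and this is not LLVM's actual interface. ModuleSymbolTable owns the mapping from symbol table entries to the GlobalValues they came from, and IRObjectFile merely holds one. The lookup keeps the getSymbolGV name only to show where that functionality would move.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins; the real llvm::GlobalValue / llvm::Module carry far more state.
struct GlobalValue {
  std::string Name;
  bool IsDeclaration;
};

struct Module {
  std::string TargetTriple;
  std::vector<GlobalValue> Globals;
};

// Sketch of ModuleSymbolTable: entry index -> GlobalValue it was created from.
class ModuleSymbolTable {
public:
  // Records one entry per global. Pointers refer into M, so M must outlive
  // the table and must not have its Globals vector resized afterwards.
  void addModule(Module &M) {
    for (GlobalValue &GV : M.Globals)
      Symbols.push_back(&GV);
  }

  std::size_t size() const { return Symbols.size(); }

  // The lookup that would move here from IRObjectFile.
  GlobalValue *getSymbolGV(std::size_t I) const {
    assert(I < Symbols.size() && "symbol index out of range");
    return Symbols[I];
  }

private:
  std::vector<GlobalValue *> Symbols;
};

// IRObjectFile just uses a ModuleSymbolTable and no longer exposes the module
// or the symbol -> GlobalValue mapping itself.
class IRObjectFile {
public:
  // The caller keeps M alive here; a real IRObjectFile would own its modules.
  explicit IRObjectFile(Module &M) { SymTab.addModule(M); }
  const ModuleSymbolTable &symbolTable() const { return SymTab; }

private:
  ModuleSymbolTable SymTab;
};
```

Under this split, lib/LTO would build and query its own ModuleSymbolTable instead of reaching into IRObjectFile for the module.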
When we get around to implementing bitcode symbol tables we can:
- implement the bitcode symbol table writer in terms of ModuleSymbolTable
- change IRObjectFile to read from the bitcode symbol table instead of using ModuleSymbolTable directly
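To illustrate this later step, here is a sketch of a writer that serializes the entries a ModuleSymbolTable would yield, and a reader that recovers them without ever materializing a Module. The SymbolEntry struct, both function names and the byte layout (a 4-byte little-endian flags word followed by a NUL-terminated name) are invented for illustration and are not LLVM's bitcode symbol table format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical flattened view of one ModuleSymbolTable entry.
struct SymbolEntry {
  std::string Name;    // must not contain '\0' in this toy format
  std::uint32_t Flags; // meaning of the bits is left to the caller
};

// Writer: runs at bitcode-write time, while the Module is still available.
std::string writeSymtab(const std::vector<SymbolEntry> &Syms) {
  std::string Out;
  for (const SymbolEntry &S : Syms) {
    for (int Shift = 0; Shift < 32; Shift += 8) // little-endian flags word
      Out.push_back(static_cast<char>((S.Flags >> Shift) & 0xff));
    Out += S.Name;
    Out.push_back('\0');
  }
  return Out;
}

// Reader: what IRObjectFile could use instead of building a ModuleSymbolTable.
std::vector<SymbolEntry> readSymtab(const std::string &Buf) {
  std::vector<SymbolEntry> Syms;
  std::size_t Pos = 0;
  while (Pos < Buf.size()) {
    assert(Pos + 4 <= Buf.size() && "truncated flags word");
    std::uint32_t Flags = 0;
    for (int I = 0; I < 4; ++I)
      Flags |= std::uint32_t(static_cast<unsigned char>(Buf[Pos + I])) << (8 * I);
    Pos += 4;
    std::size_t End = Buf.find('\0', Pos); // name ends at the next NUL
    assert(End != std::string::npos && "unterminated name");
    Syms.push_back({Buf.substr(Pos, End - Pos), Flags});
    Pos = End + 1;
  }
  return Syms;
}
```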
I will start working on ModuleSymbolTable.
An important point which I forgot to mention: the symbol table stored by ModuleSymbolTable would correspond to any number of modules (all of the same target triple).
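A sketch of a table spanning several modules, assuming a hypothetical Module stand-in that carries only a triple and a list of global names. addModule rejects a module whose triple differs from those already added, and each entry records which module it came from. The class shape and member names are illustrative, not an existing API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal stand-in for llvm::Module.
struct Module {
  std::string TargetTriple;
  std::vector<std::string> GlobalNames;
};

// One table, any number of modules, all with the same triple so that name
// mangling and the eventual object files stay consistent.
class ModuleSymbolTable {
public:
  // Returns false (and adds nothing) if M's triple differs from the modules
  // already in the table. M must outlive the table.
  bool addModule(const Module &M) {
    if (!Mods.empty() && Mods.front()->TargetTriple != M.TargetTriple)
      return false;
    Mods.push_back(&M);
    for (const std::string &Name : M.GlobalNames)
      Symbols.emplace_back(Name, Mods.size() - 1);
    return true;
  }

  // Each entry is (symbol name, index of the defining module in Mods).
  const std::vector<std::pair<std::string, std::size_t>> &symbols() const {
    return Symbols;
  }

private:
  std::vector<const Module *> Mods;
  std::vector<std::pair<std::string, std::size_t>> Symbols;
};
```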
That seems like an arbitrary choice, driven only by the current use case of splitting vtables for LTO.
I think in general there are two possible cases:
- where by design the client needs to have multiple conceptual "views" into the input file (e.g. fat binaries, CUDA, OpenMP)
- where the client has a single "view" and does not care about which symbol is defined in which module (e.g. regular/thin LTO splitting)
The client's use of ModuleSymbolTable (and the rest of the lib/Object interface in general) needs to be driven by that fundamental design decision of where the split lies. So for the fat binary scenario I would see the client creating one ModuleSymbolTable (and one bitcode symbol table) for each architecture, and the IRObjectFile growing a way to choose the architecture (as we do in MachOObjectFile for example).
Right, but the "triple" as a discriminator seems arbitrary to me: what about use cases where we ship a "fat" object file containing bitcode for a non-optimized debug build of the module and an optimized one? Or for building with and without options like freestanding? Or with and without the sanitizers?
I'm fine with being pragmatic and making it work for CFI in LTO; I just want to make sure that the API and the design of the ModuleSymbolTable / IRObjectFile relationship do not make too many assumptions about it.
I think we're confusing a couple of things here. I am not saying that the triple would be the discriminator; the discriminator could in principle be anything the client wants (and could be chosen at BitcodeWriter time). The reason I mentioned that modules associated with a single ModuleSymbolTable should have the same triple is to ensure that the eventual object files are compatible and that name mangling happens consistently. I am not precluding having multiple ModuleSymbolTables whose modules happen to have the same target triple.
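The grouping described above can be sketched as follows, assuming each module carries a hypothetical View string (for example "O0" vs "O2", echoing the debug/optimized case) as the client-chosen discriminator. One table is built per view; modules within a view must share a triple, while different views are free to share one. All names here are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical module stand-in with a client-chosen discriminator.
struct Module {
  std::string TargetTriple;
  std::string View; // e.g. "O0" / "O2"; chosen by the client, not by the triple
  std::vector<std::string> GlobalNames;
};

// Minimal per-view symbol table: the shared triple plus the symbol names.
struct ViewTable {
  std::string Triple;
  std::vector<std::string> Names;
};

// Builds one table per view. Returns false if two modules in the same view
// disagree on the triple; different views may use the same triple.
bool buildViewTables(const std::vector<Module> &Mods,
                     std::map<std::string, ViewTable> &Tables) {
  for (const Module &M : Mods) {
    auto It = Tables.find(M.View);
    if (It == Tables.end())
      It = Tables.emplace(M.View, ViewTable{M.TargetTriple, {}}).first;
    else if (It->second.Triple != M.TargetTriple)
      return false;
    It->second.Names.insert(It->second.Names.end(), M.GlobalNames.begin(),
                            M.GlobalNames.end());
  }
  return true;
}
```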
Sure, this part is fine. I think what tickled me was "the IRObjectFile growing a way to choose the architecture", which I read as "you pass in the architecture and it gets all the right modules from the bitcode file and creates a ModuleSymbolTable for them".
Right, "architecture" was just an example here, we can make a better decision about what exactly the discriminator should be when the time comes to implement a feature that depends on it.
LGTM.
llvm/include/llvm/Object/IRObjectFile.h, line 31 (On Diff #79182):
I'd rather avoid the ultra-contraction; what about Mods?