This is an archive of the discontinued LLVM Phabricator instance.

Object: Extract a ModuleSymbolTable class from IRObjectFile.
ClosedPublic

Authored by pcc on Nov 23 2016, 4:08 PM.

Details

Summary

This class represents a symbol table built from in-memory IR. It provides
access to GlobalValues and should only be used if such access is required
(e.g. in the LTO implementation). We will eventually change IRObjectFile
to read from a bitcode symbol table rather than using ModuleSymbolTable,
so it would not be able to expose the module.

Depends on D26928

Diff Detail

Repository
rL LLVM

Event Timeline

pcc updated this revision to Diff 79165.Nov 23 2016, 4:08 PM
pcc retitled this revision from to Object: Extract a ModuleSymbolTable class from IRObjectFile..
pcc updated this object.
pcc added a reviewer: mehdi_amini.
pcc added subscribers: llvm-commits, rafael.
pcc updated this revision to Diff 79175.Nov 23 2016, 4:22 PM
  • Revert an unintentionally added hunk
mehdi_amini added inline comments.Nov 23 2016, 9:27 PM
llvm/include/llvm/Object/ModuleSymbolTable.h
42 ↗(On Diff #79175)

Could it take a MemoryBufferRef and construct a lazy module that it would own? The fact that a Module is created should be an implementation detail.

That seems closer to the goal of having a symbol table in the bitcode file, and will help designing the client APIs appropriately.

mehdi_amini added inline comments.Nov 23 2016, 9:37 PM
llvm/lib/Object/ModuleSymbolTable.cpp
40 ↗(On Diff #79175)

We should assert that every module added have the same triple as the others.

pcc added inline comments.Nov 28 2016, 11:57 AM
llvm/include/llvm/Object/ModuleSymbolTable.h
42 ↗(On Diff #79175)

So the idea here is that this class would be the low-level interface to be used in the case where you know you're dealing with an actual IR module in memory. That's why the class exposes GlobalValues for example.

When symbol tables are introduced, there would be a separate class that would be the low-level interface to the bitcode symbol table. That class would be used by IRObjectFile and lto::InputFile. lib/LTO would also continue to use ModuleSymbolTable in places like LTO::addRegularLTO where it deals with a loaded module.

I'm sorry if I didn't make this clear enough in the discussion on D26951.

mehdi_amini added inline comments.Nov 28 2016, 12:17 PM
llvm/include/llvm/Object/ModuleSymbolTable.h
42 ↗(On Diff #79175)

If this is the low-level interface, what is the high-level one that abstracts away the difference between the low-level ones?
I wouldn't expect the low-level ones to be used in IRObjectFile or lib/LTO, why is that?

pcc added inline comments.Nov 28 2016, 12:39 PM
llvm/include/llvm/Object/ModuleSymbolTable.h
42 ↗(On Diff #79175)

The next higher level interface would be the one that can provide a "bitcode symbol table" for any module, including modules without symbol tables or with old symbol tables, by creating the symbol table on the fly if necessary. Something like:
Expected<std::unique_ptr<MemoryBuffer>> getSymbolTable(MemoryBufferRef Buf);

where the MemoryBuffer would own the symbol table if it needed to be created, or would point inside Buf if not.

That way, no abstraction is needed between in-memory modules and bitcode symbol tables; clients can always just use a bitcode symbol table.

As long as symbol tables don't exist, we're using ModuleSymbolTable in IRObjectFile and lib/LTO. It should be easy enough to switch both of those classes to using getSymbolTable and the bitcode symbol table reader thanks to the work I've done to hide the Module.

mehdi_amini added inline comments.Nov 28 2016, 12:56 PM
llvm/include/llvm/Object/ModuleSymbolTable.h
42 ↗(On Diff #79175)

About Expected<std::unique_ptr<MemoryBuffer>> getSymbolTable(MemoryBufferRef Buf);, assuming the argument Buf is some bitcode without a symbol table, this function would parse the IR to create a module, create a symbol table out of the loaded IR, and then serialize out the symbol table in bitcode format into a new buffer, and return this buffer?

pcc added inline comments.Nov 28 2016, 12:57 PM
llvm/include/llvm/Object/ModuleSymbolTable.h
42 ↗(On Diff #79175)

Correct.

mehdi_amini edited edge metadata.Nov 30 2016, 8:59 PM

The class BitcodeSymbolTable I extracted in https://reviews.llvm.org/D23132 had:

std::string Triple;
std::string Datalayout;
std::string ModuleAsm;
std::string SourceFilename;

I think Triple/Datalayout/SourceFilename are likely to be useful as part of the bitcode symbol table (and having them now would allow you to add the assertion I mentioned earlier).

For the ModuleASM, it is not clear to me yet: unless we change the IR I think we won't be able to avoid it either.

pcc added a comment.Nov 30 2016, 9:22 PM

Not sure if we'll need accessors. I'd expect that clients which need this information (essentially just the symbol table writer, eventually) would just pull the information out of the module that they presumably have. The assertion seems reasonable though.

I don't think I understand why we'd want all of the module asm. I'd have imagined that we would serialize the AsmSymbols along with the IR symbols. Or is that the IR change you were alluding to?

llvm/lib/Object/ModuleSymbolTable.cpp
40 ↗(On Diff #79175)

Sorry, I missed this, will do.

In D27073#610236, @pcc wrote:

Not sure if we'll need accessors. I'd expect that clients which need this information (essentially just the symbol table writer, eventually) would just pull the information out of the module that they presumably have

The BitcodeModule you mean? Because we're not supposed to have a llvm::Module anymore (the patch I had was removing llvm::Module entirely from LTOModule I believe, and still exposing LTOModule::getTargetTriple.

I don't think I understand why we'd want all of the module asm. I'd have imagined that we would serialize the AsmSymbols along with the IR symbols. Or is that the IR change you were alluding to?

The issue is that we can't make Bitcode serialization dependent on the ASM Parser. So we'd need to change the IR representation for inline ASM. We discussed it here https://llvm.org/bugs/show_bug.cgi?id=28970

pcc added a comment.Nov 30 2016, 9:40 PM
In D27073#610236, @pcc wrote:

Not sure if we'll need accessors. I'd expect that clients which need this information (essentially just the symbol table writer, eventually) would just pull the information out of the module that they presumably have

The BitcodeModule you mean? Because we're not supposed to have a llvm::Module anymore (the patch I had was removing llvm::Module entirely from LTOModule I believe, and still exposing LTOModule::getTargetTriple.

Sorry, I meant that we wouldn't need accessors such as getTargetTriple() on ModuleSymbolTable, because as I mentioned that's the class we'll use for going from IR modules to symbol tables, and if you're starting from an IR module then you obviously already have an IR module that you can call accessors on.

In {LTOModule,lto::InputFile,IRObjectFile} we'd eventually be using something like a BitcodeSymtabReader class on which it seems perfectly reasonable to expose these accessors which could read from the bitcode symbol table.

I don't think I understand why we'd want all of the module asm. I'd have imagined that we would serialize the AsmSymbols along with the IR symbols. Or is that the IR change you were alluding to?

The issue is that we can't make Bitcode serialization dependent on the ASM Parser. So we'd need to change the IR representation for inline ASM. We discussed it here https://llvm.org/bugs/show_bug.cgi?id=28970

Thanks for the pointer, I'll take a look. It looks like this happened while I was on vacation and I must have missed it.

Sorry, I meant that we wouldn't need accessors such as getTargetTriple() on ModuleSymbolTable, because as I mentioned that's the class we'll use for going from IR modules to symbol tables, and if you're starting from an IR module then you obviously already have an IR module that you can call accessors on.

Right, I forgot you plan on another class for the BitcodeSymbolTable.

pcc updated this revision to Diff 79870.Nov 30 2016, 10:48 PM
pcc marked an inline comment as done.
pcc edited edge metadata.
  • Assert that each module added to the symbol table has the same target triple
mehdi_amini accepted this revision.Nov 30 2016, 10:57 PM
mehdi_amini edited edge metadata.

LGTM.

This revision is now accepted and ready to land.Nov 30 2016, 10:57 PM
This revision was automatically updated to reflect the committed changes.