This code now lives in lib/Object. The idea is that it can now be reused by
IRObjectFile among other things.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
It seems odd to me to have the invocation of the module reader as well in irsymtab, which is just a piece of the file. Why not in IRObjectFile.cpp in llvm::object?
The irsymtab interface lives at a lower layer than IRObjectFile, as the irsymtab interface is specifically designed for IR symbol tables as opposed to the generic object file interface exposed by ObjectFile. I see this function as the layer between the raw irsymtab reader interface and clients such as IRObjectFile (and LTO, etc). So it doesn't seem appropriate for this function to live alongside one specific client, and it doesn't really deserve its own file/namespace, so it might as well live alongside irsymtab.
What I don't like is that this is not just parsing the irsymtab structures, but also the modules themselves.
True, but that is only an implementation detail. When we start writing the irsymtab to disk, the modules will only be parsed on the "upgrade" code path.
Ok, I forgot about that. Looking now at the InputFile class, the comments indicate it only holds what is necessary for symbol resolution, so I guess the idea is that the Mods field will go away (except I guess on the upgrade path) when we have a real IR symbol table on disk? But then currently the Mods field is used when we add the module for regular LTO, where we do need the full module.
We will always need references to the actual modules in the InputFile class as part of the implementation (not only for full LTO but for ThinLTO to read the summaries and the modules themselves in the backends). I see this function as returning the conceptual top-level entities in the bitcode file, i.e. the symbol table as well as references to the modules.
Ok, but then I am confused about your earlier response that eventually the modules will only be parsed on the "upgrade" code path? If this outlined function will continue to parse all top level entities in the bitcode file, then I am not sure it belongs in irsymtab.
I see the symbol table becoming the top-level entity in the bitcode file in the long term (if we switch to a new bitcode format). But I may be wrong, and I suppose that from that perspective it doesn't matter where the code lives, because it will all be rewritten anyway. So I'm fine with moving it to IRObjectFile if you still think it sohuld go there.
To me conceptually the module also a top level entity, but I guess it depends on how you look at it. I'd prefer it in IRObjectFile over having it within irsymtab.
I also realised that moving the code to IRObjectFile means that IRObjectFile and irsymtab will need to conspire to use the same producer string to decide whether to upgrade, whereas without the move the logic is isolated to irsymtab. So I am slightly more against the move now. I will upload my change to start writing the irsymtab files to disk, so you can see what I mean.
Can you clarify how this would make the upgrade decision difficult? I.e. I was envisioning having the main entry point to create and return the new "File" struct in IRObjectFile, which would then invoke irsymtab::read to fill in the Symtab and Strtab. In D32061, it looks like the decision to take the upgrade path based on the producer string only involves those two structures. Why would the IRObjectFile need to know about the producer string?
Oh, so you want the code to look like this:
in irsymtab.cpp:
Expected<Reader> readWithoutUpgrading(StringRef Symtab, StringRef Strtab) { // try to read the Symtab if (needs to upgrade) return Reader(); Reader R = {Symtab, Strtab}; return R; }
in irobjectfile.cpp:
Expected<File> read(MemoryBufferRef MBRef) { auto BFC = getBitcodeFileContents(MBRef); Reader R = readWithoutUpgrading(BFC.Symtab, BFC.StrtabForSymtab); if (!R.isValid()) return upgrade(BFC); return File(R); }
I dunno, to me it seems less clear and less maintainable because now people have to look at two files to figure out how upgrading works. But I guess I can tolerate it.
No, but I may be missing something... Why can't there be a single call into irsymtab to a function that takes the BFC, and returns a structure containing the Reader, Strtab, and Symtab to use in the File being created? I.e. using those in the BFC if no upgrade needed, or doing the upgrade if necessary.
in irsymtab.cpp:
Expected<Reader> readWithoutUpgrading(StringRef Symtab, StringRef Strtab) { // try to read the Symtab if (needs to upgrade) return Reader(); Reader R = {Symtab, Strtab}; return R; }in irobjectfile.cpp:
Expected<File> read(MemoryBufferRef MBRef) { auto BFC = getBitcodeFileContents(MBRef); Reader R = readWithoutUpgrading(BFC.Symtab, BFC.StrtabForSymtab); if (!R.isValid()) return upgrade(BFC); return File(R); }I dunno, to me it seems less clear and less maintainable because now people have to look at two files to figure out how upgrading works. But I guess I can tolerate it.
That is basically the same as what irsymtab::read is doing in D32061, right? I think the only difference is that irsymtab::read would be taking a list of BitcodeModules rather than returning a list of BitcodeModules. And we'd be adding a function in IRObjectFile that converts from that interface to the current irsymtab::read interface.
If so, I guess that's also fine with me.
Right - essentially the main interface to the File creation is in IRObjectFile, which handles the creation of the BitcodeModules. To me that seems preferable.
I still don't understand why you think that interface is better, but I have made the changes that you have requested.
llvm/include/llvm/Object/IRObjectFile.h | ||
---|---|---|
67 ↗ | (On Diff #101594) | why have this in the irsymtab namespace and not object? My main motivation for the move here is that irsymtab doesn't seem like the right namespace/layer for having the main entry point reading/building the full bitcode file contents. Or a new namespace (e.g. bitcodefile or bitcodeobject or the like). |
llvm/include/llvm/Object/IRSymtab.h | ||
322 ↗ | (On Diff #101594) | Maybe IRSymtabContents? I.e. this doesn't contain the full contents of the bitcode file. |
328 ↗ | (On Diff #101594) | maybe readIRSymtab? |
llvm/include/llvm/Object/IRObjectFile.h | ||
---|---|---|
67 ↗ | (On Diff #101594) | To me these are still conceptually part of the irsymtab, I moved them here at your request. But I'm not going to pick this particular battle either, so ok, let's move them to llvm::object. Does llvm::object::IRSymtabFile/llvm::object::readIRSymtab sound fine? |
llvm/include/llvm/Object/IRSymtab.h | ||
322 ↗ | (On Diff #101594) | It is already in a namespace called irsymtab. There's no need to repeat that in the name here. That said, I should update the comment to reflect that this just contains the irsymtab contents. |
328 ↗ | (On Diff #101594) | Same. |
llvm/include/llvm/Object/IRObjectFile.h | ||
---|---|---|
67 ↗ | (On Diff #101594) | My belief is that the BitcodeModules are not part of the irsymtab (it seems we disagree on this point though). So I think llvm::object::File and llvm::object::readFile seem better. |
llvm/include/llvm/Object/IRObjectFile.h | ||
---|---|---|
67 ↗ | (On Diff #101594) | Based on our offline discussion, your proposal seems like a reasonable interface. |