This is an archive of the discontinued LLVM Phabricator instance.

Object: Factor out the code for creating the irsymtab for an arbitrary bitcode file.
ClosedPublic

Authored by pcc on Apr 10 2017, 7:58 PM.

Diff Detail

Repository
rL LLVM

Event Timeline

pcc created this revision.Apr 10 2017, 7:58 PM

It seems odd to me to have the invocation of the module reader as well in irsymtab, which is just a piece of the file. Why not in IRObjectFile.cpp in llvm::object?

pcc added a comment.Apr 11 2017, 2:15 PM

The irsymtab interface lives at a lower layer than IRObjectFile, as the irsymtab interface is specifically designed for IR symbol tables as opposed to the generic object file interface exposed by ObjectFile. I see this function as the layer between the raw irsymtab reader interface and clients such as IRObjectFile (and LTO, etc). So it doesn't seem appropriate for this function to live alongside one specific client, and it doesn't really deserve its own file/namespace, so it might as well live alongside irsymtab.

In D31921#724231, @pcc wrote:

The irsymtab interface lives at a lower layer than IRObjectFile, as the irsymtab interface is specifically designed for IR symbol tables as opposed to the generic object file interface exposed by ObjectFile. I see this function as the layer between the raw irsymtab reader interface and clients such as IRObjectFile (and LTO, etc). So it doesn't seem appropriate for this function to live alongside one specific client, and it doesn't really deserve its own file/namespace, so it might as well live alongside irsymtab.

What I don't like is that this is not just parsing the irsymtab structures, but also the modules themselves.

pcc added a comment.Apr 11 2017, 2:43 PM
In D31921#724231, @pcc wrote:

The irsymtab interface lives at a lower layer than IRObjectFile, as the irsymtab interface is specifically designed for IR symbol tables as opposed to the generic object file interface exposed by ObjectFile. I see this function as the layer between the raw irsymtab reader interface and clients such as IRObjectFile (and LTO, etc). So it doesn't seem appropriate for this function to live alongside one specific client, and it doesn't really deserve its own file/namespace, so it might as well live alongside irsymtab.

What I don't like is that this is not just parsing the irsymtab structures, but also the modules themselves.

True, but that is only an implementation detail. When we start writing the irsymtab to disk, the modules will only be parsed on the "upgrade" code path.

In D31921#724280, @pcc wrote:
In D31921#724231, @pcc wrote:

The irsymtab interface lives at a lower layer than IRObjectFile, as the irsymtab interface is specifically designed for IR symbol tables as opposed to the generic object file interface exposed by ObjectFile. I see this function as the layer between the raw irsymtab reader interface and clients such as IRObjectFile (and LTO, etc). So it doesn't seem appropriate for this function to live alongside one specific client, and it doesn't really deserve its own file/namespace, so it might as well live alongside irsymtab.

What I don't like is that this is not just parsing the irsymtab structures, but also the modules themselves.

True, but that is only an implementation detail. When we start writing the irsymtab to disk, the modules will only be parsed on the "upgrade" code path.

Ok, I forgot about that. Looking now at the InputFile class, the comments indicate it only holds what is necessary for symbol resolution, so I guess the idea is that the Mods field will go away (except I guess on the upgrade path) when we have a real IR symbol table on disk? But then currently the Mods field is used when we add the module for regular LTO, where we do need the full module.

pcc added a comment.Apr 12 2017, 11:54 AM
In D31921#724280, @pcc wrote:
In D31921#724231, @pcc wrote:

The irsymtab interface lives at a lower layer than IRObjectFile, as the irsymtab interface is specifically designed for IR symbol tables as opposed to the generic object file interface exposed by ObjectFile. I see this function as the layer between the raw irsymtab reader interface and clients such as IRObjectFile (and LTO, etc). So it doesn't seem appropriate for this function to live alongside one specific client, and it doesn't really deserve its own file/namespace, so it might as well live alongside irsymtab.

What I don't like is that this is not just parsing the irsymtab structures, but also the modules themselves.

True, but that is only an implementation detail. When we start writing the irsymtab to disk, the modules will only be parsed on the "upgrade" code path.

Ok, I forgot about that. Looking now at the InputFile class, the comments indicate it only holds what is necessary for symbol resolution, so I guess the idea is that the Mods field will go away (except I guess on the upgrade path) when we have a real IR symbol table on disk? But then currently the Mods field is used when we add the module for regular LTO, where we do need the full module.

We will always need references to the actual modules in the InputFile class as part of the implementation (not only for full LTO but for ThinLTO to read the summaries and the modules themselves in the backends). I see this function as returning the conceptual top-level entities in the bitcode file, i.e. the symbol table as well as references to the modules.

In D31921#725184, @pcc wrote:
In D31921#724280, @pcc wrote:
In D31921#724231, @pcc wrote:

The irsymtab interface lives at a lower layer than IRObjectFile, as the irsymtab interface is specifically designed for IR symbol tables as opposed to the generic object file interface exposed by ObjectFile. I see this function as the layer between the raw irsymtab reader interface and clients such as IRObjectFile (and LTO, etc). So it doesn't seem appropriate for this function to live alongside one specific client, and it doesn't really deserve its own file/namespace, so it might as well live alongside irsymtab.

What I don't like is that this is not just parsing the irsymtab structures, but also the modules themselves.

True, but that is only an implementation detail. When we start writing the irsymtab to disk, the modules will only be parsed on the "upgrade" code path.

Ok, I forgot about that. Looking now at the InputFile class, the comments indicate it only holds what is necessary for symbol resolution, so I guess the idea is that the Mods field will go away (except I guess on the upgrade path) when we have a real IR symbol table on disk? But then currently the Mods field is used when we add the module for regular LTO, where we do need the full module.

We will always need references to the actual modules in the InputFile class as part of the implementation (not only for full LTO but for ThinLTO to read the summaries and the modules themselves in the backends). I see this function as returning the conceptual top-level entities in the bitcode file, i.e. the symbol table as well as references to the modules.

Ok, but then I am confused about your earlier response that eventually the modules will only be parsed on the "upgrade" code path? If this outlined function will continue to parse all top level entities in the bitcode file, then I am not sure it belongs in irsymtab.

pcc added a comment.Apr 12 2017, 12:38 PM

Ok, but then I am confused about your earlier response that eventually the modules will only be parsed on the "upgrade" code path? If this outlined function will continue to parse all top level entities in the bitcode file, then I am not sure it belongs in irsymtab.

I see the symbol table becoming the top-level entity in the bitcode file in the long term (if we switch to a new bitcode format). But I may be wrong, and I suppose that from that perspective it doesn't matter where the code lives, because it will all be rewritten anyway. So I'm fine with moving it to IRObjectFile if you still think it sohuld go there.

In D31921#725225, @pcc wrote:

Ok, but then I am confused about your earlier response that eventually the modules will only be parsed on the "upgrade" code path? If this outlined function will continue to parse all top level entities in the bitcode file, then I am not sure it belongs in irsymtab.

I see the symbol table becoming the top-level entity in the bitcode file in the long term (if we switch to a new bitcode format). But I may be wrong, and I suppose that from that perspective it doesn't matter where the code lives, because it will all be rewritten anyway. So I'm fine with moving it to IRObjectFile if you still think it sohuld go there.

To me conceptually the module also a top level entity, but I guess it depends on how you look at it. I'd prefer it in IRObjectFile over having it within irsymtab.

pcc added a comment.Apr 12 2017, 1:20 PM
In D31921#725225, @pcc wrote:

Ok, but then I am confused about your earlier response that eventually the modules will only be parsed on the "upgrade" code path? If this outlined function will continue to parse all top level entities in the bitcode file, then I am not sure it belongs in irsymtab.

I see the symbol table becoming the top-level entity in the bitcode file in the long term (if we switch to a new bitcode format). But I may be wrong, and I suppose that from that perspective it doesn't matter where the code lives, because it will all be rewritten anyway. So I'm fine with moving it to IRObjectFile if you still think it sohuld go there.

To me conceptually the module also a top level entity, but I guess it depends on how you look at it. I'd prefer it in IRObjectFile over having it within irsymtab.

I also realised that moving the code to IRObjectFile means that IRObjectFile and irsymtab will need to conspire to use the same producer string to decide whether to upgrade, whereas without the move the logic is isolated to irsymtab. So I am slightly more against the move now. I will upload my change to start writing the irsymtab files to disk, so you can see what I mean.

pcc added a comment.Apr 13 2017, 4:20 PM

D32061 shows how I expect irsymtab::read to evolve. Let me know if that makes sense.

pcc updated this revision to Diff 96630.Apr 25 2017, 2:01 PM

Refresh

In D31921#725260, @pcc wrote:
In D31921#725225, @pcc wrote:

Ok, but then I am confused about your earlier response that eventually the modules will only be parsed on the "upgrade" code path? If this outlined function will continue to parse all top level entities in the bitcode file, then I am not sure it belongs in irsymtab.

I see the symbol table becoming the top-level entity in the bitcode file in the long term (if we switch to a new bitcode format). But I may be wrong, and I suppose that from that perspective it doesn't matter where the code lives, because it will all be rewritten anyway. So I'm fine with moving it to IRObjectFile if you still think it sohuld go there.

To me conceptually the module also a top level entity, but I guess it depends on how you look at it. I'd prefer it in IRObjectFile over having it within irsymtab.

I also realised that moving the code to IRObjectFile means that IRObjectFile and irsymtab will need to conspire to use the same producer string to decide whether to upgrade, whereas without the move the logic is isolated to irsymtab. So I am slightly more against the move now. I will upload my change to start writing the irsymtab files to disk, so you can see what I mean.

Can you clarify how this would make the upgrade decision difficult? I.e. I was envisioning having the main entry point to create and return the new "File" struct in IRObjectFile, which would then invoke irsymtab::read to fill in the Symtab and Strtab. In D32061, it looks like the decision to take the upgrade path based on the producer string only involves those two structures. Why would the IRObjectFile need to know about the producer string?

pcc added a comment.May 31 2017, 1:55 PM
In D31921#725260, @pcc wrote:
In D31921#725225, @pcc wrote:

Ok, but then I am confused about your earlier response that eventually the modules will only be parsed on the "upgrade" code path? If this outlined function will continue to parse all top level entities in the bitcode file, then I am not sure it belongs in irsymtab.

I see the symbol table becoming the top-level entity in the bitcode file in the long term (if we switch to a new bitcode format). But I may be wrong, and I suppose that from that perspective it doesn't matter where the code lives, because it will all be rewritten anyway. So I'm fine with moving it to IRObjectFile if you still think it sohuld go there.

To me conceptually the module also a top level entity, but I guess it depends on how you look at it. I'd prefer it in IRObjectFile over having it within irsymtab.

I also realised that moving the code to IRObjectFile means that IRObjectFile and irsymtab will need to conspire to use the same producer string to decide whether to upgrade, whereas without the move the logic is isolated to irsymtab. So I am slightly more against the move now. I will upload my change to start writing the irsymtab files to disk, so you can see what I mean.

Can you clarify how this would make the upgrade decision difficult? I.e. I was envisioning having the main entry point to create and return the new "File" struct in IRObjectFile, which would then invoke irsymtab::read to fill in the Symtab and Strtab. In D32061, it looks like the decision to take the upgrade path based on the producer string only involves those two structures. Why would the IRObjectFile need to know about the producer string?

Oh, so you want the code to look like this:

in irsymtab.cpp:

Expected<Reader> readWithoutUpgrading(StringRef Symtab, StringRef Strtab) {
  // try to read the Symtab

  if (needs to upgrade)
    return Reader();

  Reader R = {Symtab, Strtab};
  return R;
}

in irobjectfile.cpp:

Expected<File> read(MemoryBufferRef MBRef) {
  auto BFC = getBitcodeFileContents(MBRef);
  Reader R = readWithoutUpgrading(BFC.Symtab, BFC.StrtabForSymtab);
  if (!R.isValid())
    return upgrade(BFC);
  return File(R);
}

I dunno, to me it seems less clear and less maintainable because now people have to look at two files to figure out how upgrading works. But I guess I can tolerate it.

In D31921#769239, @pcc wrote:
In D31921#725260, @pcc wrote:
In D31921#725225, @pcc wrote:

Ok, but then I am confused about your earlier response that eventually the modules will only be parsed on the "upgrade" code path? If this outlined function will continue to parse all top level entities in the bitcode file, then I am not sure it belongs in irsymtab.

I see the symbol table becoming the top-level entity in the bitcode file in the long term (if we switch to a new bitcode format). But I may be wrong, and I suppose that from that perspective it doesn't matter where the code lives, because it will all be rewritten anyway. So I'm fine with moving it to IRObjectFile if you still think it sohuld go there.

To me conceptually the module also a top level entity, but I guess it depends on how you look at it. I'd prefer it in IRObjectFile over having it within irsymtab.

I also realised that moving the code to IRObjectFile means that IRObjectFile and irsymtab will need to conspire to use the same producer string to decide whether to upgrade, whereas without the move the logic is isolated to irsymtab. So I am slightly more against the move now. I will upload my change to start writing the irsymtab files to disk, so you can see what I mean.

Can you clarify how this would make the upgrade decision difficult? I.e. I was envisioning having the main entry point to create and return the new "File" struct in IRObjectFile, which would then invoke irsymtab::read to fill in the Symtab and Strtab. In D32061, it looks like the decision to take the upgrade path based on the producer string only involves those two structures. Why would the IRObjectFile need to know about the producer string?

Oh, so you want the code to look like this:

No, but I may be missing something... Why can't there be a single call into irsymtab to a function that takes the BFC, and returns a structure containing the Reader, Strtab, and Symtab to use in the File being created? I.e. using those in the BFC if no upgrade needed, or doing the upgrade if necessary.

in irsymtab.cpp:

Expected<Reader> readWithoutUpgrading(StringRef Symtab, StringRef Strtab) {
  // try to read the Symtab

  if (needs to upgrade)
    return Reader();

  Reader R = {Symtab, Strtab};
  return R;
}

in irobjectfile.cpp:

Expected<File> read(MemoryBufferRef MBRef) {
  auto BFC = getBitcodeFileContents(MBRef);
  Reader R = readWithoutUpgrading(BFC.Symtab, BFC.StrtabForSymtab);
  if (!R.isValid())
    return upgrade(BFC);
  return File(R);
}

I dunno, to me it seems less clear and less maintainable because now people have to look at two files to figure out how upgrading works. But I guess I can tolerate it.

pcc added a comment.May 31 2017, 2:37 PM
In D31921#769239, @pcc wrote:
In D31921#725260, @pcc wrote:
In D31921#725225, @pcc wrote:

Ok, but then I am confused about your earlier response that eventually the modules will only be parsed on the "upgrade" code path? If this outlined function will continue to parse all top level entities in the bitcode file, then I am not sure it belongs in irsymtab.

I see the symbol table becoming the top-level entity in the bitcode file in the long term (if we switch to a new bitcode format). But I may be wrong, and I suppose that from that perspective it doesn't matter where the code lives, because it will all be rewritten anyway. So I'm fine with moving it to IRObjectFile if you still think it sohuld go there.

To me conceptually the module also a top level entity, but I guess it depends on how you look at it. I'd prefer it in IRObjectFile over having it within irsymtab.

I also realised that moving the code to IRObjectFile means that IRObjectFile and irsymtab will need to conspire to use the same producer string to decide whether to upgrade, whereas without the move the logic is isolated to irsymtab. So I am slightly more against the move now. I will upload my change to start writing the irsymtab files to disk, so you can see what I mean.

Can you clarify how this would make the upgrade decision difficult? I.e. I was envisioning having the main entry point to create and return the new "File" struct in IRObjectFile, which would then invoke irsymtab::read to fill in the Symtab and Strtab. In D32061, it looks like the decision to take the upgrade path based on the producer string only involves those two structures. Why would the IRObjectFile need to know about the producer string?

Oh, so you want the code to look like this:

No, but I may be missing something... Why can't there be a single call into irsymtab to a function that takes the BFC, and returns a structure containing the Reader, Strtab, and Symtab to use in the File being created? I.e. using those in the BFC if no upgrade needed, or doing the upgrade if necessary.

That is basically the same as what irsymtab::read is doing in D32061, right? I think the only difference is that irsymtab::read would be taking a list of BitcodeModules rather than returning a list of BitcodeModules. And we'd be adding a function in IRObjectFile that converts from that interface to the current irsymtab::read interface.

If so, I guess that's also fine with me.

In D31921#769284, @pcc wrote:

No, but I may be missing something... Why can't there be a single call into irsymtab to a function that takes the BFC, and returns a structure containing the Reader, Strtab, and Symtab to use in the File being created? I.e. using those in the BFC if no upgrade needed, or doing the upgrade if necessary.

That is basically the same as what irsymtab::read is doing in D32061, right? I think the only difference is that irsymtab::read would be taking a list of BitcodeModules rather than returning a list of BitcodeModules. And we'd be adding a function in IRObjectFile that converts from that interface to the current irsymtab::read interface.

Right - essentially the main interface to the File creation is in IRObjectFile, which handles the creation of the BitcodeModules. To me that seems preferable.

pcc updated this revision to Diff 101594.Jun 6 2017, 11:26 AM
  • Address review comments
pcc added a comment.Jun 6 2017, 11:26 AM
In D31921#769284, @pcc wrote:

No, but I may be missing something... Why can't there be a single call into irsymtab to a function that takes the BFC, and returns a structure containing the Reader, Strtab, and Symtab to use in the File being created? I.e. using those in the BFC if no upgrade needed, or doing the upgrade if necessary.

That is basically the same as what irsymtab::read is doing in D32061, right? I think the only difference is that irsymtab::read would be taking a list of BitcodeModules rather than returning a list of BitcodeModules. And we'd be adding a function in IRObjectFile that converts from that interface to the current irsymtab::read interface.

Right - essentially the main interface to the File creation is in IRObjectFile, which handles the creation of the BitcodeModules. To me that seems preferable.

I still don't understand why you think that interface is better, but I have made the changes that you have requested.

tejohnson added inline comments.Jun 7 2017, 12:19 PM
llvm/include/llvm/Object/IRObjectFile.h
67 ↗(On Diff #101594)

why have this in the irsymtab namespace and not object? My main motivation for the move here is that irsymtab doesn't seem like the right namespace/layer for having the main entry point reading/building the full bitcode file contents. Or a new namespace (e.g. bitcodefile or bitcodeobject or the like).

llvm/include/llvm/Object/IRSymtab.h
322 ↗(On Diff #101594)

Maybe IRSymtabContents? I.e. this doesn't contain the full contents of the bitcode file.

328 ↗(On Diff #101594)

maybe readIRSymtab?

pcc added inline comments.Jun 7 2017, 12:50 PM
llvm/include/llvm/Object/IRObjectFile.h
67 ↗(On Diff #101594)

To me these are still conceptually part of the irsymtab, I moved them here at your request. But I'm not going to pick this particular battle either, so ok, let's move them to llvm::object. Does llvm::object::IRSymtabFile/llvm::object::readIRSymtab sound fine?

llvm/include/llvm/Object/IRSymtab.h
322 ↗(On Diff #101594)

It is already in a namespace called irsymtab. There's no need to repeat that in the name here.

That said, I should update the comment to reflect that this just contains the irsymtab contents.

328 ↗(On Diff #101594)

Same.

tejohnson added inline comments.Jun 7 2017, 1:53 PM
llvm/include/llvm/Object/IRObjectFile.h
67 ↗(On Diff #101594)

My belief is that the BitcodeModules are not part of the irsymtab (it seems we disagree on this point though). So I think llvm::object::File and llvm::object::readFile seem better.

tejohnson added inline comments.Jun 7 2017, 2:19 PM
llvm/include/llvm/Object/IRObjectFile.h
67 ↗(On Diff #101594)

Based on our offline discussion, your proposal seems like a reasonable interface.

pcc updated this revision to Diff 101825.Jun 7 2017, 3:14 PM
  • Repaint the bikeshed
tejohnson accepted this revision.Jun 7 2017, 3:31 PM

LGTM, thanks

This revision was automatically updated to reflect the committed changes.