This is an archive of the discontinued LLVM Phabricator instance.

Bitcode: Introduce BitcodeWriter interface.
ClosedPublic

Authored by pcc on Oct 31 2016, 8:14 PM.

Details

Summary

This interface allows clients to write multiple modules to a single
bitcode file. Also introduce the llvm-join utility which can be used
to create a bitcode file containing multiple modules.

Diff Detail

Repository
rL LLVM

Event Timeline

pcc updated this revision to Diff 76518.Oct 31 2016, 8:14 PM
pcc retitled this revision from to Bitcode: Introduce BitcodeWriter interface..
pcc updated this object.
pcc added a reviewer: mehdi_amini.
pcc added a subscriber: llvm-commits.
mehdi_amini edited edge metadata.Oct 31 2016, 8:22 PM

It'd be nice to add the opposite feature of llvm-join to llvm-extract: being able to list/extract the modules in a single file.

pcc added a comment.Oct 31 2016, 8:25 PM

Yes, that's blocked on an appropriate reader interface though (waiting on a response to my RFC).

pcc added a comment.Oct 31 2016, 8:27 PM

By which I mean "RFC [Bitcode]: Moving block info block state"

dtzWill added a subscriber: dtzWill.Nov 1 2016, 8:48 AM
pcc updated this revision to Diff 78814.Nov 21 2016, 6:03 PM
pcc edited edge metadata.

Refresh

The name llvm-join is quite close from llvm-link, but I don't have a better suggestion right now.

llvm/test/Bitcode/multi-module.ll
5 ↗(On Diff #78814)

Now that we have D26778 you could update this test I think!

llvm-combine, llvm-fuse, llvm-merge are alternative names I can think about in the same idea, but that does not distinguish from llvm-link either.
What about llvm-append?

silvas added a subscriber: silvas.Nov 23 2016, 8:32 PM

Can we just add an llvm-bcutil and add subcommands to it? This could probably subsume llvm-extract and llvm-bcanalyzer someday too (and we could switch on argv[0] to imitate them if necessary). Maybe we could grow one of those two into llvm-bcutil?

Maybe something like
llvm-bcutil cat (or concat as peter suggested)
llvm-bcutil uncat
llvm-bcutil split-module (current llvm-split?)
llvm-bcutil extract-function (current llvm-extract?)
llvm-bcutil extract-module (for D26778/llvm-modextract?)
llvm-bcutil analyze (current llvm-bcanalyzer?)
llvm-bcutil link-modules (current llvm-link?)
...

We don't have to get there overnight, but having a nice place to put these things with low boilerplate would be good IMO. We shouldn't have significant overhead or boilerplate for adding testing utilities or utilities that are handy for us as developers.

In your list, there are some that are IR level tools, it seems strange to me to put them under a "bcutil" umbrella.

In your list, there are some that are IR level tools, it seems strange to me to put them under a "bcutil" umbrella.

I thought about that, but I think the only one that doesn't actually materialize a Module in memory is llvm-bcanalyzer. So in some sense they are all about more than just bitcode. Also, colloquially, "bc file" is often used when IR is meant ("can you send me a bc file of that so I can debug the issue?"; often, textual IR would be fine in those contexts), so I don't think it's that confusing. I agree it's slightly confusing though.

I'd like llvm-modextract and llvm-join (or llvm-concat) to operate without materializing IR. It is blocked on updating the bitcode to not have absolute reference from the beginning of the file.

I'd like llvm-modextract and llvm-join (or llvm-concat) to operate without materializing IR. It is blocked on updating the bitcode to not have absolute reference from the beginning of the file.

oh, nice!

pcc updated this revision to Diff 79499.Nov 28 2016, 6:47 PM
  • Change bitcode format to correspond with r288098
pcc updated this revision to Diff 79502.Nov 28 2016, 6:55 PM
mehdi_amini added inline comments.Nov 28 2016, 7:13 PM
llvm/include/llvm/Bitcode/BitcodeWriter.h
35 ↗(On Diff #79502)

\brief is only useful if there is more than one sentence.

llvm/tools/llvm-cat/llvm-cat.cpp
46 ↗(On Diff #79502)

So, if I followed correctly, we should be able to do now as well:

{
  SmallVector<char, 0> Buffer;
  BitcodeWriter Writer(Buffer);
}
for (std::string InputFilename : InputFilenames) {
  std::unique_ptr<MemoryBuffer> MB =
      ExitOnErr(errorOrToExpected(MemoryBuffer::getFileOrSTDIN(InputFilename)));
  std::vector<BitcodeModule> Mods = ExitOnErr(getBitcodeModuleList(*MB));
  for (auto &BitcodeMod : Mods)  
    Buffer.insert(Buffer.end(), BitcodeMod.getBuffer().begin(), BitcodeMod.getBuffer().end());
}

Can we put this behind an option in llvm-cat? I'm not sure sure it is worth creating llvm-binarycat for this.

(Also it makes me notice that your current implementation wouldn't handle input files that contains multiple modules)

pcc updated this revision to Diff 79505.Nov 28 2016, 8:00 PM
pcc marked an inline comment as done.
  • Documentation tweak; llvm-cat -b considered harmless (for now)
pcc marked an inline comment as done.Nov 28 2016, 8:00 PM
pcc added inline comments.
llvm/tools/llvm-cat/llvm-cat.cpp
46 ↗(On Diff #79502)

Seems reasonable enough for now. Let's split it out into llvm-bcutil binarycat once we get around to that though.

mehdi_amini accepted this revision.Nov 28 2016, 10:50 PM
mehdi_amini edited edge metadata.

LGTM. Thanks!

This revision is now accepted and ready to land.Nov 28 2016, 10:50 PM
This revision was automatically updated to reflect the committed changes.
pcc marked an inline comment as done.