Page MenuHomePhabricator

Add an (optional) identification block in the bitcode

Authored by mehdi_amini on Oct 12 2015, 12:11 PM.



Processing bitcode from a different LLVM version can lead to unexpected behavior. The LLVM project guarantees autoupdating bitcode from a previous minor revision for the same major, but can't make any promise when reading bitcode generated from a either a non-released LLVM, a vendor toolchain, or a "future" LLVM release.
This patch aims at being more user-friendly and allows a bitcode produce to emit an optional block at the beginning of the bitcode that will contains an opaque string intended to describe the bitcode producer information.
The bitcode reader will dump this information alongside any error it reports.

Diff Detail

Event Timeline

mehdi_amini retitled this revision from to Add an (optional) identification block in the bitcode.
mehdi_amini updated this object.
mehdi_amini added a reviewer: rafael.
mehdi_amini set the repository for this revision to rL LLVM.
mehdi_amini removed rL LLVM as the repository for this revision.

Display the current version along with the producer version on error.

I wonder if this should have two components, one for the epoch and one for the string? I'm worried about the fact that you're using the only "unused" ID here, and it does seem nice to not use two IDs for this kind of data.

vsk added a subscriber: vsk.Oct 13 2015, 1:36 PM

Is it possible to add a test for this, perhaps by forcing bad bitcode into llvm-dis?

$ llvm-as -disable-verify -o - | llvm-dis
define void @f() {
  %1 = alloca i32
  %1 = %1
  ret void
llvm-as: <stdin>:3:8: error: expected instruction opcode

Seems like we could have a check-line here for the message BitcodeReader::error emits.

vsk added a comment.Oct 13 2015, 1:38 PM

Whoops, I think that amounts to just "echo '' | llvm-dis".

The problem is to be able to generate some invalid bitcode that *still contains the identification block*.

I tried with some valid bitcode: echo "randomstring" >> /tmp/bitcode.bc ;
But it is not enough to trigger an error with: llvm-dis /tmp/bitcode.bc


Add the "epoch" field along with the producer string.

mehdi_amini removed a subscriber: chandlerc.
chandlerc edited edge metadata.Oct 14 2015, 1:49 PM

Most of my comments are pretty high-level. I'd imagine Duncan is in a much better position to check the actual bitcode side of it. =]


Stale comment?

Also "Can be used to provide better error messages when we fail to parse a bitcode file." maybe?


Is there a better name here than "identification"? I'm not coming up with one... Any ideas Duncan?


I would start at 0 and define missing == 0. I would also document that property.

I think we should also spell out in a bit more detail that this is rev-ed with the major version of LLVM.

Maybe "... which also accepts N-1."?
Use an anonymous enum rather than a macro?


It also does basic Epoch enforcement...


Can we provide a bit more detail here, much like you do above?

mehdi_amini edited edge metadata.

Update taking Chandler comments into account.

mehdi_amini marked an inline comment as done.

Change the epoch to encode major.minor.patch. The basic compatibility check becomes (for now):


jroelofs added inline comments.


should be:


Fix bug spotted by Jonathan Roelofs

jroelofs added inline comments.Oct 20 2015, 9:11 PM

In irc we talked about how to test this... I'll capture those thoughts here:

9:06:40 PM - jokereph_: jroelofs: thks for spotting it
9:08:57 PM - jroelofs: jokereph_: testcases on that particular behavior is probably a good idea (regardless of which epoch kind you guys end up picking)
9:09:31 PM - jokereph_: jroelofs: any idea on how to write it?
9:10:16 PM - jokereph_: I guess I could write a c++ test for that, but it’s not easy to write a generic bitcode test since we want to test something that affects version X.0 only for instance
9:11:19 PM - jroelofs: what if you pull out the "here's what version I am" part of it, and feed that in to the check
9:12:33 PM - jroelofs: then the unitests test could poke in whatever values it wants for that, and have the check verify against poked-in values in the bitcode itself
9:12:41 PM - jokereph_: jroelofs, and writing a C++ unittests? Not a bitcode reader test right? I requires exposing this function though.
9:13:17 PM * jroelofs honestly isn't all that familiar with how the bitcode stuff works/is tested
9:13:30 PM - jokereph_: jroelofs: we only test coarse grain AFAIK
9:13:45 PM - jokereph_: i.e. llvm-dis for instance
9:14:09 PM - jokereph_: I may be able also to write a C++ unittest that forge a bitcode with a specific version and see what the reader says.
9:14:25 PM - jokereph_: In the end it seems quite complicated for what it is :(
9:14:47 PM - jroelofs: maybe split parseBitcodeVersion into a parser and a verifier
9:14:48 PM - jokereph_: I just tested the “major” manually

10:04:06 PM - jroelofs: jokereph: after thinking about it some more, I think the right thing to do testing-wise is to save off a bitcode file from each time the epoch changes, and then have a lit test that tries to load up each of these files. then check for the expected error message for each file
10:05:32 PM - jroelofs: jokereph: that way if the epoch format gets changed down the line, you've got testcases for all the old formats, and can check that the error code does the right thing for all of them

ovyalov added inline comments.

Android builder is failing because of unsupported std::to_string -
Is it possible to replace this call here with std::ostringstream ?

mehdi_amini added inline comments.Oct 26 2015, 12:53 PM

Sure, but do we know what makes it missing? The log says gcc 4.9.0, what is the C++ library used?
It seems we need to add some documentation here:
(unless I missed something there)

jroelofs added inline comments.Oct 26 2015, 12:57 PM

Twine(Major) would work here.

ovyalov added inline comments.Oct 26 2015, 1:52 PM

Android NDK uses own version of STL that has some missed pieces - for example, in LLDB we emulate to_string in such way:

mehdi_amini added inline comments.Oct 26 2015, 3:29 PM

OK I didn't know about the custom STL (I knew that it used a custom libc).
Two actions seems appropriate to me here:

jroelofs: thanks for the suggestion, I didn't know Twine can do the to_string conversion implicitly. (and thanks ovyalov to have already implemented it).

aizatsky added inline comments.

The revision that was actually submitted is different at least in this location:

The submitted version has a bug and a resulting memory leak.

I'll look into it today, thanks for the report.

mehdi_amini added inline comments.Nov 4 2015, 4:30 PM

Fixed in r252110.

The committed diff was significantly different because this last diff was a proposal for another scheme for the versioning (Major/Minor/Patch instead of Epoch), the committed diff is based on the previous diff for this revision.

mehdi_amini accepted this revision.Nov 4 2015, 4:30 PM
mehdi_amini added a reviewer: mehdi_amini.
This revision is now accepted and ready to land.Nov 4 2015, 4:30 PM
mehdi_amini abandoned this revision.Nov 4 2015, 4:30 PM