This is an archive of the discontinued LLVM Phabricator instance.

[Bitstream] Use alignTo to make code more readable. NFC
ClosedPublic

Authored by craig.topper on Sep 1 2020, 9:41 AM.

Details

Summary

I was recently debugging a similar issue to https://reviews.llvm.org/D86500 only with a large metadata section. Only after I finished debugging it did I discover it was fixed very recently.

My version of the fix was going to alignTo since that uses uint64_t and improves the readability of the code. So I though I would go ahead and share it.

Diff Detail

Event Timeline

craig.topper created this revision.Sep 1 2020, 9:41 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 1 2020, 9:41 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
craig.topper requested review of this revision.Sep 1 2020, 9:41 AM
MaskRay accepted this revision.Sep 1 2020, 10:00 AM

Thanks!

(I forgot alignTo<4>(v) and alignTo(v, a) in the review. Sorry)

This revision is now accepted and ready to land.Sep 1 2020, 10:00 AM

So this does both 64bit cast and alignment. Thank you.

Hi Craig, in your case, will NumElts be actually larger than 2^32? NumElts is read from https://llvm.org/docs/BitCodeFormat.html#enter-subblock-encoding that defines the blocklen to be 32bit. So when it gets larger, the overflow can happen at the writer side (https://llvm.org/doxygen/BitstreamWriter_8h_source.html#l00384).

So this does both 64bit cast and alignment. Thank you.

Hi Craig, in your case, will NumElts be actually larger than 2^32? NumElts is read from https://llvm.org/docs/BitCodeFormat.html#enter-subblock-encoding that defines the blocklen to be 32bit. So when it gets larger, the overflow can happen at the writer side (https://llvm.org/doxygen/BitstreamWriter_8h_source.html#l00384).

My specific case was a blob for metadata strings that was ~1GB in size. The multipy by 8 to convert its size to bits was overflowing. I do worry that it might break again if the blob of metadata strings exceed 4GB.

So this does both 64bit cast and alignment. Thank you.

Hi Craig, in your case, will NumElts be actually larger than 2^32? NumElts is read from https://llvm.org/docs/BitCodeFormat.html#enter-subblock-encoding that defines the blocklen to be 32bit. So when it gets larger, the overflow can happen at the writer side (https://llvm.org/doxygen/BitstreamWriter_8h_source.html#l00384).

My specific case was a blob for metadata strings that was ~1GB in size. The multipy by 8 to convert its size to bits was overflowing. I do worry that it might break again if the blob of metadata strings exceed 4GB.

The case I fixed is similar. One way to address is to extend that blocklen field to 64bit. imo this does not introduce any back-compatibility issue because 32 is not a fixed width, but VBR.

  1. when an old reader reads a bitcode written by a new writer, it works if blocklen is <= 2^32. Although it gets broken if blocklen is > 2^32, this case it does not work anyway.
  2. when a new reader reads a bitcode written by an old writer, it works fine since blocklen is <= 2^32.

So it is possible to extend it to 64bit.

So this does both 64bit cast and alignment. Thank you.

Hi Craig, in your case, will NumElts be actually larger than 2^32? NumElts is read from https://llvm.org/docs/BitCodeFormat.html#enter-subblock-encoding that defines the blocklen to be 32bit. So when it gets larger, the overflow can happen at the writer side (https://llvm.org/doxygen/BitstreamWriter_8h_source.html#l00384).

My specific case was a blob for metadata strings that was ~1GB in size. The multipy by 8 to convert its size to bits was overflowing. I do worry that it might break again if the blob of metadata strings exceed 4GB.

The case I fixed is similar. One way to address is to extend that blocklen field to 64bit. imo this does not introduce any back-compatibility issue because 32 is not a fixed width, but VBR.

  1. when an old reader reads a bitcode written by a new writer, it works if blocklen is <= 2^32. Although it gets broken if blocklen is > 2^32, this case it does not work anyway.
  2. when a new reader reads a bitcode written by an old writer, it works fine since blocklen is <= 2^32.

So it is possible to extend it to 64bit.

The blocklen field in ENTER_SUBBLOCK isn't a VBR from what I could see. Its just a 32 bit value allowing a maximum block size of 16GB. There is a VBR6 to store the size of the blob. That one we could change to use uint64_t to allow blobs larger than 4GB, but we'd still be limited by the 16GB limit.

So this does both 64bit cast and alignment. Thank you.

Hi Craig, in your case, will NumElts be actually larger than 2^32? NumElts is read from https://llvm.org/docs/BitCodeFormat.html#enter-subblock-encoding that defines the blocklen to be 32bit. So when it gets larger, the overflow can happen at the writer side (https://llvm.org/doxygen/BitstreamWriter_8h_source.html#l00384).

My specific case was a blob for metadata strings that was ~1GB in size. The multipy by 8 to convert its size to bits was overflowing. I do worry that it might break again if the blob of metadata strings exceed 4GB.

The case I fixed is similar. One way to address is to extend that blocklen field to 64bit. imo this does not introduce any back-compatibility issue because 32 is not a fixed width, but VBR.

  1. when an old reader reads a bitcode written by a new writer, it works if blocklen is <= 2^32. Although it gets broken if blocklen is > 2^32, this case it does not work anyway.
  2. when a new reader reads a bitcode written by an old writer, it works fine since blocklen is <= 2^32.

So it is possible to extend it to 64bit.

The blocklen field in ENTER_SUBBLOCK isn't a VBR from what I could see. Its just a 32 bit value allowing a maximum block size of 16GB. There is a VBR6 to store the size of the blob. That one we could change to use uint64_t to allow blobs larger than 4GB, but we'd still be limited by the 16GB limit.

Right. blocklen isnt VBR... 16GB is the limit if the bitcode format is not upgraded into a new version.