This is an archive of the discontinued LLVM Phabricator instance.


Differential D86957

[Bitstream] Use alignTo to make code more readable. NFC
ClosedPublic

Authored by craig.topper on Sep 1 2020, 9:41 AM.

Details

Reviewers

MaskRay
stephan.yichao.zhao
void
bkramer

Commits

rG96ae43bad5b8: [Bitstream] Use alignTo to make code more readable. NFC

Summary

I was recently debugging an issue similar to https://reviews.llvm.org/D86500, only with a large metadata section. Only after I finished debugging it did I discover that it had been fixed very recently.

My version of the fix was going to use alignTo, since that uses uint64_t and improves the readability of the code. So I thought I would go ahead and share it.
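To illustrate the point about alignTo and uint64_t, here is a minimal sketch. `alignToSketch` and `roundUpTo4ByHand` are hypothetical names made up for this example; the first is only a stand-in for what llvm::alignTo computes for a power-of-two or other nonzero alignment, not a copy of the LLVM implementation.

```cpp
#include <cstdint>

// Hypothetical stand-in for llvm::alignTo(Value, Align): rounds Value up to
// the next multiple of Align. All arithmetic is done in uint64_t, so a 32-bit
// argument is widened before anything can wrap.
constexpr uint64_t alignToSketch(uint64_t Value, uint64_t Align) {
  return (Value + Align - 1) / Align * Align;
}

// The hand-written form the patch replaces. It is only safe because NumElts
// is cast to uint64_t before adding 3.
constexpr uint64_t roundUpTo4ByHand(uint32_t NumElts) {
  return (static_cast<uint64_t>(NumElts) + 3) & ~3;
}

// Compile-time checks that both forms agree, including at the 32-bit edge.
static_assert(alignToSketch(0, 4) == 0, "zero stays zero");
static_assert(alignToSketch(5, 4) == 8, "5 rounds up to 8");
static_assert(alignToSketch(8, 4) == 8, "multiples are unchanged");
static_assert(alignToSketch(0xFFFFFFFFu, 4) == 0x100000000ull,
              "result does not wrap at 2^32");
static_assert(roundUpTo4ByHand(0xFFFFFFFFu) == alignToSketch(0xFFFFFFFFu, 4),
              "both forms agree");
```

Both forms give the same result; the alignTo spelling just hides the cast and the mask behind one call.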

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

craig.topper created this revision. Sep 1 2020, 9:41 AM

Herald added a project: Restricted Project. Sep 1 2020, 9:41 AM

Herald added a subscriber: hiraditya.

craig.topper requested review of this revision. Sep 1 2020, 9:41 AM

Thanks!

(I forgot alignTo<4>(v) and alignTo(v, a) in the review. Sorry)

This revision is now accepted and ready to land. Sep 1 2020, 10:00 AM

Closed by commit rG96ae43bad5b8: [Bitstream] Use alignTo to make code more readable. NFC (authored by craig.topper). Sep 1 2020, 11:07 AM

This revision was automatically updated to reflect the committed changes.

craig.topper added a commit: rG96ae43bad5b8: [Bitstream] Use alignTo to make code more readable. NFC.

So this does both 64bit cast and alignment. Thank you.

Hi Craig, in your case, will NumElts be actually larger than 2^32? NumElts is read from https://llvm.org/docs/BitCodeFormat.html#enter-subblock-encoding that defines the blocklen to be 32bit. So when it gets larger, the overflow can happen at the writer side (https://llvm.org/doxygen/BitstreamWriter_8h_source.html#l00384).

Replying to @stephan.yichao.zhao (D86957#2250193):

My specific case was a blob for metadata strings that was ~1GB in size. The multiply by 8 to convert its size to bits was overflowing. I do worry that it might break again if the blob of metadata strings exceeds 4GB.
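A small sketch of the overflow described here, assuming the size-to-bits conversion was done entirely in 32-bit arithmetic (the exact pre-fix expression is not shown in this review, so `blobBitsNarrow` is a hypothetical reconstruction, and 1 GiB is just an illustrative size):

```cpp
#include <cstdint>

// Hypothetical narrow version: round the byte count up to a multiple of 4,
// then multiply by 8, all in uint32_t. The product wraps modulo 2^32.
constexpr uint32_t blobBitsNarrow(uint32_t NumBytes) {
  return ((NumBytes + 3u) & ~3u) * 8u;
}

// Widened version: same computation, but in uint64_t.
constexpr uint64_t blobBitsWide(uint32_t NumBytes) {
  return ((static_cast<uint64_t>(NumBytes) + 3) & ~uint64_t(3)) * 8;
}

constexpr uint32_t OneGiB = 1u << 30;

// 2^30 bytes is 2^33 bits, which wraps to 0 in 32 bits.
static_assert(blobBitsNarrow(OneGiB) == 0, "narrow form wraps");
static_assert(blobBitsWide(OneGiB) == (uint64_t(1) << 33), "wide form is exact");
```

With 32-bit arithmetic any blob of 512 MiB or more wraps, since 2^29 bytes is already 2^32 bits. The widened form stays correct for every size that fits in the 32-bit NumElts, which is why the 4GB concern is about the width of NumElts itself.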

Replying to @craig.topper (D86957#2250225):

The case I fixed is similar. One way to address it is to extend the blocklen field to 64 bits. IMO this does not introduce any backward-compatibility issue, because the 32-bit field is not fixed width but VBR.

When an old reader reads bitcode written by a new writer, it works if blocklen is <= 2^32. It breaks if blocklen is > 2^32, but that case does not work today anyway.
When a new reader reads bitcode written by an old writer, it works fine, since blocklen is <= 2^32.

So it is possible to extend it to 64 bits.

Replying to @stephan.yichao.zhao (D86957#2250245):

The blocklen field in ENTER_SUBBLOCK isn't a VBR from what I could see. It's just a 32-bit value, allowing a maximum block size of 16GB. There is a VBR6 that stores the size of the blob. We could change that one to use uint64_t to allow blobs larger than 4GB, but we'd still be bound by the 16GB block limit.
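The two limits mentioned here follow from simple arithmetic. This is a sketch with made-up constant names, assuming (per the bitcode format documentation) that blocklen counts 32-bit words and that the blob byte count is held in a 32-bit integer on the reader side:

```cpp
#include <cstdint>

constexpr uint64_t GiB = uint64_t(1) << 30;

// blocklen is a 32-bit count of 32-bit (4-byte) words.
constexpr uint64_t MaxBlockLenWords = UINT32_MAX;
constexpr uint64_t MaxBlockBytes = MaxBlockLenWords * 4;

// A blob size held in a 32-bit NumElts tops out just under 4 GiB.
constexpr uint64_t MaxBlobBytes32 = UINT32_MAX;

static_assert(MaxBlockBytes == 16 * GiB - 4, "block limit is just under 16 GiB");
static_assert(MaxBlobBytes32 == 4 * GiB - 1, "blob limit is just under 4 GiB");
```

Widening the blob size alone raises the second limit, but a blob still has to fit inside its enclosing block, so the first limit stays in force.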

Replying to @craig.topper (D86957#2250270):

Right, blocklen isn't VBR. 16GB is the limit unless the bitcode format is upgraded to a new version.

Revision Contents

Path: llvm/lib/Bitstream/Reader/BitstreamReader.cpp
Size: 6 lines

Diff 289226

llvm/lib/Bitstream/Reader/BitstreamReader.cpp

@@ ... @@ for (unsigned i = 1, e = Abbv->getNumOperandInfos(); i < e; ++i) {
       // Blob case. Read the number of bytes as a vbr6.
       Expected<uint32_t> MaybeNum = ReadVBR(6);
       if (!MaybeNum)
         return MaybeNum.takeError();
       unsigned NumElts = MaybeNum.get();
       SkipToFourByteBoundary(); // 32-bit alignment

       // Figure out where the end of this blob will be including tail padding.
-      const size_t NewEnd =
-          GetCurrentBitNo() + ((static_cast<uint64_t>(NumElts) + 3) & ~3) * 8;
+      const size_t NewEnd = GetCurrentBitNo() + alignTo(NumElts, 4) * 8;

       // If this would read off the end of the bitcode file, just set the
       // record to empty and return.
       if (!canSkipToPos(NewEnd/8)) {
         skipToEnd();
         break;
       }

@@ ... @@ for (unsigned i = 1, e = Abbv->getNumOperandInfos(); i != e; ++i) {
       Expected<uint32_t> MaybeNumElts = ReadVBR(6);
       if (!MaybeNumElts)
         return MaybeNumElts.takeError();
       uint32_t NumElts = MaybeNumElts.get();
       SkipToFourByteBoundary(); // 32-bit alignment

       // Figure out where the end of this blob will be including tail padding.
       size_t CurBitPos = GetCurrentBitNo();
-      const size_t NewEnd =
-          CurBitPos + ((static_cast<uint64_t>(NumElts) + 3) & ~3) * 8;
+      const size_t NewEnd = CurBitPos + alignTo(NumElts, 4) * 8;

       // If this would read off the end of the bitcode file, just set the
       // record to empty and return.
       if (!canSkipToPos(NewEnd/8)) {
         Vals.append(NumElts, 0);
         skipToEnd();
         break;
       }