This is an archive of the discontinued LLVM Phabricator instance.

Add test showing error in StreamingMemoryObject.setKnownObjectSize().
ClosedPublic

Authored by kschimpf on Apr 9 2015, 11:00 AM.

Details

Summary

The existing code for method StreamingMemoryObject.fetchToPos does not
respect the corresonding call to setKnownObjectSize(). As a result, it
allows the StreamingMemoryObject to read bytes past the object size.

Diff Detail

Event Timeline

kschimpf updated this revision to Diff 23511.Apr 9 2015, 11:00 AM
kschimpf retitled this revision from to Add test showing error in StreamingMemoryObject.setKnownObjectSize()..
kschimpf updated this object.
kschimpf edited the test plan for this revision. (Show Details)
kschimpf added a subscriber: Unknown Object (MLST).
kschimpf updated this revision to Diff 23517.Apr 9 2015, 11:35 AM

Fix StreamingMemoryObject to respect call to setKnownObjectSize().

rafael edited edge metadata.Apr 12 2015, 4:09 AM

sorry, looks like fab didn't sent the comment before. Trying agin.

include/llvm/Support/StreamingMemoryObject.h
84 ↗(On Diff #23517)

I still don't get it.

The comment says

Returns true if Pos can be read.

Now , assume you start reading a bitcode file over http. You get the header of the wrapper saying the size and ObjectSize is set.
After that your network connection goes down and GetBytes will fail to read more data.

fetchToPos will incorrectly say that Pos can be read, no?

kschimpf updated this revision to Diff 23683.Apr 13 2015, 10:08 AM
kschimpf edited edge metadata.

Fix issues raised in diff 23517.

kschimpf added inline comments.Apr 13 2015, 10:09 AM
include/llvm/Support/StreamingMemoryObject.h
84 ↗(On Diff #23517)

Good point. I didn't deal with that case. I only dealt with the case where BytesRead was larger than ObjectSize.

Fixing so that both cases are handled.

jvoung edited edge metadata.Apr 14 2015, 10:29 AM

Mostly thinking out loud, but a few suggestions in the sea of text too.

include/llvm/Support/StreamingMemoryObject.h
78 ↗(On Diff #23683)

Just thinking out loud here to think through what's going on... it is a bit confusing that this has to deal with an ObjectSize, and the fact that it can be different from BytesRead is just more fun =)

If ObjectSize is known, and BytesRead goes above ObjectSize, doesn't that mean that it already read beyond the ObjectSize from the perspective of the Streamer? The amount to read is just kChunkSize. This will adjust the "position" of the Streamer for its next call to GetBytes... if there is a next call. The Streamer doesn't let you seek backward, so you can no longer make the next GetBytes start at exactly the ObjectSize boundary.

So, why is this okay? My impression is that this is okay because the Streamer is owned by this class, and so it should not be shared, so adjusting the position of Streamer beyond the ObjectSize doesn't matter too much.

Okay, so what does matter?

The bounds determined by BytesRead does matter, since that affects what readBytes() will do. readBytes() does not check ObjectSize... and that's what this extra check fixes. On the other hand readBytes() calls fetchToPos, so it is in a way checking ObjectSize (just tucked away here).

81 ↗(On Diff #23683)

Following up on Rafael's example, why not set ObjectSize = BytesRead here unconditionally? At this point, if ObjectSize != 0, then BytesRead <= ObjectSize.

Otherwise, if you call fetchToPos(Pos) afterwards with BytesRead < Pos < ObjectSize, then EOOReached is true and it will say that Pos is readable (since the check on line 71 is Pos < ObjectSize, instead of Pos < BytesRead).

Maybe that earlier condition on line 71 should just be Pos < BytesRead to be consistent with readBytes using "BytesRead" as the boundary at EOF, instead of ObjectSize as the boundary at EOF. It will also be consistent with the "return Pos < BytesRead;" at the end of this function on line 87.

On the other hand, getExtent() and isValidAddress() use ObjectSize, and it will be inconsistent if the stream is ever truncated to be less than the known ObjectSize, but the callers will just have to rely on what readBytes ultimately says.

unittests/Support/StreamingMemoryObject.cpp
22

How about having a second unittest class which sets a limit on how many GetBytes can be pulled. Something like a SizedNullDataStreamer class (or whatever you want to call it).

Then you can have a tests for:

(a) the size isn't known, and the end of object is solely based on what GetBytes() returns vs the requested len.

(b) the one case Rafael brought up -- when the size is "known", but something unexpected happens like the http connection dies before hitting the known size. It could also be that the wrapper lies about the size (maybe someone is trying to trigger a bug).

kschimpf updated this revision to Diff 23741.Apr 14 2015, 1:00 PM
kschimpf edited edge metadata.

Clean up concepts of fetching bytes from ObjectSize.

Rewrote fetchToPos to only worry about reading, and readBytes to check if input is truncated because ObjectSize is set. This simplifies fetchToPos considerably, and only marginally complicates readBytes.

include/llvm/Support/StreamingMemoryObject.h
78 ↗(On Diff #23683)

I agree that the concepts of ObjectSize, NextBytes, and EndOfStream() are confused by the way this code is written. Separating these concepts out (see comment below), so that things are (hopefully) cleaner.

81 ↗(On Diff #23683)

First off, when ObjectSize != 0, BytesRead <= ObjectSize is only guaranteed to be true because setKnownObjectSize would have changed it if BytesRead > ObjectSize. In general, since we don't control when setKnownObjectSize is called, there is no real association between BytesRead and ObjectSize.

The fundamental problem is that we have two notions of EOF (BytesRead and ObjectSize), and their value is independent of each other.

This CL tries to reduce overhead by making them the same if setKnownObjectSize is called.

Alternatively, we can keep these notions separate. Hence, fetchToPos will always do a read and set NextBytes to the end of the read. However, then we need to modify readBytes to check that the requested bytes doesn't exceed ObjectSize.

Updating the CL to match this concept.

Thanks -- looks okay to me, but wait for Rafael to take a look too since he some earlier questions?

include/llvm/Support/StreamingMemoryObject.h
66 ↗(On Diff #23741)

Can you also add a period at the end of the sentence while you are updating this comment?

lib/Support/StreamingMemoryObject.cpp
91 ↗(On Diff #23741)

Maybe add a brief comment about why ObjectSize can be < BytesRead.

111 ↗(On Diff #23741)

Why not keep the reserve?

Rafael,

Jan has reviewed this CL but wants your final say on accepting (since you had questions earlier). Could you please take a look. Thanks

It would be better for both APIs to see the end of data in the same way.

The code organization actually makes that easy now, no?:

Just replace the

return true;

on line 80 with

return !ObjectSize || Pos < ObjectSize;

Considering all the cases:

  • Asked to read until a valid position and we manage to read it: We will return true since we either found the object size or it has to be larger than pos (since pos is valid)
  • Asked to read too much and we actually manage to do it. ObjectSize must be set for it to be legal to put data after a bitcode file, so we will detect this.
  • Asked to read too much, and the GetBytes fails, the check for eof will return false.
include/llvm/Support/StreamingMemoryObject.h
75 ↗(On Diff #23741)

Why only == 0?

The previous logic looks correct If we ask for 100 bytes but only get 50, we are at the end of the stream.

kschimpf updated this revision to Diff 25053.May 6 2015, 9:06 AM
kschimpf updated this revision to Diff 25054.May 6 2015, 9:08 AM
  • Fix order of methods in StreamingMemoryObject.cpp

Please take a look at Diff 25053. Thanks.

include/llvm/Support/StreamingMemoryObject.h
66 ↗(On Diff #23741)

Done.

75 ↗(On Diff #23741)

I used bytes == 0 following how read works. For the case it is hooked up to a pipe, it may return only some of the requested bytes (not necessarily all). This is done when the pipe doesn't have all the bytes requested. In that case, it returns what it does have (and at least one character). It only returns 0 if you reach the pipe has been closed, and you are at the end of the pipe contents.

This matches what we might expect from a data streamer​. Note that many people misuse read and assume that it will always return the number of requested bytes, unless eof has been reached. It is simply not (always) true. I have been burned too many times and that is why I used bytes == 0.

81 ↗(On Diff #23741)

Fixed return value, as suggested by Rafael.

lib/Support/StreamingMemoryObject.cpp
91 ↗(On Diff #23741)

Done.

111 ↗(On Diff #23741)

Added it back. Also set EOFReached if ObjectSize <= BytesRead.

rafael accepted this revision.May 21 2015, 11:36 AM
rafael edited edge metadata.

LGTM

This revision is now accepted and ready to land.May 21 2015, 11:36 AM
This revision was automatically updated to reflect the committed changes.