Modifies the bitcode reader such that the same logic is used for
both memory buffers and data streams. The incremental parsing
was factored into startParse, continueParse, and finishParse.
All parses (incremental or non-incremental) begin with startParse.
Then zero (or more) calls to continueParse incrementally read more
input, picking up from where the last call left off. finishParse
materializes any additional parts, based on the flags passed to startParse.
Details
Diff Detail
Event Timeline
A couple nits for now, but still trying to read through and understand what's happening in the streaming and lazy cases...
lib/Bitcode/Reader/BitcodeReader.cpp | ||
---|---|---|
285–286 | http://llvm.org/docs/CodingStandards.html#doxygen-use-in-documentation-comments says "\brief" instead of @brief | |
285–286 | \returns | |
293 | lowercase first letter of function name -- should probably do it for these new functions since you're touching it (but leave existing functions alone?) | |
404 | lowercase first letter for new function (and I guess update the commit message if you do) | |
408 | same | |
414 | extra space in between updateParseState and ( | |
416 | In the review, I've tried to look for where ParseState gets set, and there are various ways to grep for that... one is ParseState = X... another is updateParseState(Y, ...), or updateParseState(Z); Is there any way you can make the number of variations smaller? | |
431 | This is a bit weird to me... you have a bunch of function called "updateParseState" but some variants modify ParseState and some variants don't. | |
2795 | NextUnreadBit is no longer set -- wanted to check if that is okay now (and why)? | |
2863 | NOte -> Note | |
3300 | I don't quite understand how "ShouldMaterializeAll = false" is supposed to work for the streaming case, if this isn't checked until after: while (ParseState < NoMoreInput) { if (std::error_code EC = ContinueParse()) { return EC; } } How do you delay reading until materialize(GV) for streaming? | |
3304 | This used to be early, in "getLazyBitcodeModuleImpl". This looks like it is now happening late in FinishParse after the loop until NoMoreInput. Why is this okay now? Was the early call to "materializeForwardReferencedFunctions" actually extraneous because of the call in materialize(GV), or what? Make sure to check the lazy case with blockaddresses for computed gotos, if there isn't already a unittest for that. | |
4639 | no need for extra space |
lib/Bitcode/Reader/BitcodeReader.cpp | ||
---|---|---|
251 | be consistent about capitalization in comments one line starts with "parsed input" and another line starts with "Parsed input" =) | |
267–268 | Don't need explicit anymore, though that transition was a while back so not really related to this CL. | |
294 | Variable name is different from comment "MaterializeAll" vs "ShouldMaterializeAll" -- make them the same? I see that in the actual definition you're trying to avoid conflicting with the field name... |
- Merge branch 'master' of http://llvm.org/git/llvm into readfac1
- Working version. Save state.
- Cleanup startParse.
- Cleaned up code.
- Fix nit.
- Merge branch 'master' of http://llvm.org/git/llvm into readfac1
DId all changes except adding test that forward block address references get resolved on lazy loads.
lib/Bitcode/Reader/BitcodeReader.cpp | ||
---|---|---|
251 | Done. | |
267–268 | Done. | |
285–286 | Sorry, my fault. I followed the syntax of ConstantPlaceHolder below. Changing to follow coding standards. | |
285–286 | Done. | |
293 | Good point. Fixing. | |
294 | Fixing the names to be consistent. Fixing name conflict by prefixing assignments of field names with "this->". | |
404 | Done. | |
408 | Done. | |
416 | I guess the first part of the problem is that there are 2 notions of parse state:
I also overloaded the return value with this update. Refactoring to do less and be more clear. | |
431 | The refactoring is a bit better now. Hopefully good enough. | |
2863 | Done. | |
3300 | After talking to Derek, I realized what was the issue I was missing. I'l summarize what I understand:
Based on this, I've modified the code to lift the materializeForwardReferencedFunctions into startParse. | |
3304 | I agree that there should be some type of forward reference unit test to verify we can lazy evaluate these forward-referenced block addresses. I also agree that for the use by llvm-dis, my original code worked because it eventually calls materializeAllPermanently. I think I may have been confused about the full expectation of "streamed" (or lazy) was because of this. | |
4639 | Done. |
include/llvm/Support/StreamingMemoryObject.h | ||
---|---|---|
75 ↗ | (On Diff #23431) | Why do you need this? The streamer will return how many bytes were read and can handle a larger request. Also, why does it need to be part of this patch? It looks like this patch has many independent changes in it. |
88 ↗ | (On Diff #23431) | Why the extra logic? If objectsize is known it is the same as BytesRead, no? |
include/llvm/Support/StreamingMemoryObject.h | ||
---|---|---|
75 ↗ | (On Diff #23431) | This was added to handle the case of when one is parsing a wrapped bitcode file. In such cases, you do not need to do another read (which may block until it succeeds). That was the intent of this change. However, a simpler approach would be to allow the extra read, and then not set ObjectSize (below) if already set. I will remove this change, and add the conditional assignment to ObjectSize below. I changed it in this CL because it didn't cause a problem until I fixed that materializing a module (when streaming) didn't actually read all of the bitcode file. When that change was added, tests failed and this issue was exposed. I will remove the changes StreamingMemoryObject.{h,cpp} and put in a separate CL. |
88 ↗ | (On Diff #23431) | No, they aren't necessarily the same. The problem happens when you have a wrapped bitcode file, and was not exposed until I fixed the case that we weren't reading the entire bitcode when materializing lazily. Then, a bunch of test cases failed. When I looked into it, this is what I discovered:
This is the reason I changed this file as I did. |
lib/Bitcode/Reader/BitcodeReader.cpp | ||
---|---|---|
259 | nit: This is usually 0 or 1, but it seems unexpected for this field to be named "NumModulesParsed", and yet have the type be "bool". Rename or change type? | |
431 | Thanks -- this is better. For a while I was also wondering how many places need to be aware of setting the state to ParseError, but I think it's just continueParse() because most/all searching for bit position, etc. goes through that. | |
3178 | Does this need to be cleanupOnError(EC) also? | |
3216 | This covers two states? InsideModule and AtTopLevel? It might be more clear if you list them out so it's clear what the "break" corresponds to (AtTopLevel). Previously, the Stream.JumpToBit(NextUnreadBit); was only needed when InsideModule... is it now needed for AtTopLevel too? | |
3315 | "NoMorInput" -> "NoMoreInput" It could also be that more states >= NoMoreInput are added as code evolves, but not handled here. Can the compiler accept/handle a "static_assert(ParseState < NoMoreInput, "...") to catch what happens if more states are added after NoMoreInput but not handled by this switch? | |
4687 | Is this necessary at this point? Should that already be covered by the " // Iterate over the module, deserializing any functions that are still on disk" loop? | |
4687–4688 | The "promise" comment from "above" is removed now, so you could update this comment. |
lib/Bitcode/Reader/BitcodeReader.cpp | ||
---|---|---|
259 | Good catch. I did meant to use size_t. Fixing. | |
431 | That is correct and was the intent. State updates (and bit positioning) is intentionally now localized to continueParse. The only exception is in ParseModule, which updates the state to state whether it returned without completing. | |
3178 | Yes. Good catch. Fixing. | |
3216 | The jumpToBit is needed because various "materialize" methods may be called between calls to continueParse. By forcing a jumpToBit to happen at all calls to continueParse, we no longer need to know where the materialize methods leave the bitcursor. While I did not see an example of an error caused by interleaved calls to materialize, I was very suspicious that they could occur, and wanted to make sure that this would not happen. Hence, I made sure that continueParse always resets the position to where it left off. I will fix to not use default, so that a corresponding warning will be generated if a new value is added to the enumeration. | |
3315 | Fixed string. Also removed "default" case and made all states explicit. This will force a warning if a new state is added. | |
4674 | Removing the comment about being after a function body. This is no longer true. A call to materializeMetadata would put us some place else in the bitcode file. | |
4687 | In correct bitcode files, you are right. However, if the function doesn't define any function blocks, but (incorrectly) references function block addresses, this code will cause the error to be generated. However, looking at the following instruction, this is checked anyway. Removing. | |
4687–4688 | Done. |
- Merge branch 'master' into readfac1
- Fix issues raised by merge.
- Merge branch 'master' of http://llvm.org/git/llvm into readfac1
- Fix tests to use old-style parser.
Now that the issues with the streaming memory object has been fixed, I have updated this CL for review.
Note that I added a CL flag "-old-lazy-bitcode-parser". This was done to deal with a bug fixed by this CL. That is, in the old code, when you materialized a module, it didn't check if there was any additional data in the bitcode file. The new code fixes this by calling "finishParse". However, there are a couple of (bitcode binary) tests that were generated with this violation. Hence, the flag was added to fix this problem.
I'm willing to remove this flag in either (1) a later review, or (2) in a later revision. However, for this review I made the issues explicit so that the problem can be seen.
Let me know which ones have bugs. They might be easy-ish to reconstruct (especially since I have some additional practice in fiddling with bc files, now (Can't promise to deal with them very quickly, though).
lib/Bitcode/Reader/BitcodeReader.cpp | ||
---|---|---|
300–301 | Why the empty line? | |
312 | http://llvm.org/docs/CodingStandards.html#doxygen-use-in-documentation-comments Omit \brief if the brief description is just a sentence. (You might want to add the '.' at the end, though) | |
789 | Omit \brief. | |
803 | Omit \brief. | |
3254 | We're already at top level, no? (line 3234) |
- Merge branch 'master' of http://llvm.org/git/llvm into readfac1
- Fix issues raised by filcab.
Fixes based on feedback by Filipe.
lib/Bitcode/Reader/BitcodeReader.cpp | ||
---|---|---|
300–301 | Removed. | |
789 | Done. | |
803 | Done. | |
3254 | I agree that we should be at top level. I also agree that it appears weird that we allow extra (unmathced) EndBlocks. This has been allowed by the bitcode reader/writer for years. I just wasn't willing to make the leap that I should remove this. However, I tried removing it (and making it an error), and no tests failed. Hence, Converting this to an error. | |
4679 | Moved the iterating of functions to before the call to finishParse. This deals with the problems I was having with tests in test/Bitcode/invalid.test (i.e. Inputs/invalid-fwdref-type-mismatch-2.bc and Inputs/invalid-load-ptr-type.bc). These two files had multiple errors (the one they intended which was inside a function body, and the one probably not intended - extraneous stuff at the end of the bitcode file). This removes the need for the command line flag UseOldLazyBitcodeParse, and I have deleted it. |
Hi Karl,
The files came straight from the fuzzer, so it is likely have more than one error. If, in order to support them (where support is: keep the test working and diagnosing what we want), you have to change the code in a convoluted way, I would prefer to change the test.
If the change is minimal and not a problem (doesn't impact legibility or architecture), then keeping the tests as they are is not a problem either. I just want to avoid having worse code just so we don't have to re-do some tests.
Of course, if we started crashing on the tests, that's a problem :-)
Thanks,
Filipe
(Phab butchered the comment in my email. Editing it to get a complete history in Phabricator)
- Fix invalid bitcode tests with more than one error.
- Merge branch 'master' into readfac1
In addition to moving the code back in materializeModule, I fixed three tests. Two of them were fuzz tests where I "truncated" the file to the end of the module block. The other test did not have anything after a bad abbreviation definition, and the code file was incomplete. So I generated a replacement test that was well structured otherwise (i.e. only had that one error in it).
include/llvm/Bitcode/BitstreamReader.h | ||
---|---|---|
328 ↗ | (On Diff #26923) | This code fixes the state of the bit streamer when no more input is found. As a result, method AtEndOfStream now works correctly. |
lib/Bitcode/Reader/BitcodeReader.cpp | ||
3253–3255 | Discovered that the Bitstream::EndBLock was "hiding" a bug int the bitstream reader when processing a data stream. That is, when using a data stream, the size is not set until after the eof is reached. Hence, when Stream.AtEndOfStream() was called above, it would return false even when at the eof. The actual problem was in FillCurWord, which did not set the bit position correctly when there was no more input. The old code worked because the read (at eof) would return zero, and is understood as an end block. By returning success for this value, it would hide this problem. I also improved the error message so that once can see where the reader thought the eof should be, if there is miscellaneous stuff at the end of the bit code file. This makes it easier to know where to cut a test file in such cases. | |
4680 | Now that the eof checking is fixed, I moved this back where it was in an earlier version of this CL. |
Hi Karl,
Really sorry for the delay.
LGTM on my part, as long as you add the test for the error message and do the fix.
Thank you,
Filipe
lib/Bitcode/Reader/BitcodeReader.cpp | ||
---|---|---|
790 | Nit: Put more words on the first line. | |
794 | Nit: If it's for docs, it's probably best to start with an uppercase letter. | |
3258 | Thank you! | |
3262 | errs()? Or StrBuf? |
- Merge branch 'master' of http://llvm.org/git/llvm into readfac1
- Fixes associated with review by Filipe.
Applying the patch locally to take a better look.
lib/Bitcode/Reader/BitcodeReader.cpp | ||
---|---|---|
286–299 | Please commit the pure cleanup bits first: using \ instead of @ | |
3263 | This looks a bit much to be honest. Corrupted files are not that common and it is trivial to set a breakpoint to find the state. | |
3266 | This is always 0 or 1. Use a boolean instead. |
I got the following test failures locally:
LLVM :: Bitcode/invalid.test LLVM :: tools/gold/invalid.ll
This CL has gotten a bit long, and hard to read (to many versions). Moved to a new CL http://reviews.llvm.org/D10518
lib/Bitcode/Reader/BitcodeReader.cpp | ||
---|---|---|
286–299 | I assume this has already been done. The new CL doesn't have suc cases anymore. | |
3263 | Simplified in new CL to do same as before. | |
3266 | This was already fixed in master. |
This CL has gotten a bit long, and hard to read (to many versions). Moved to a new CL http://reviews.llvm.org/D10518
be consistent about capitalization in comments
one line starts with "parsed input" and another line starts with "Parsed input" =)