This is an archive of the discontinued LLVM Phabricator instance.

BitStream reader: propagate errors
ClosedPublic

Authored by jfb on Jun 18 2019, 2:53 PM.

Details

Summary

The bitstream reader handles errors poorly. This has two effects:

  • Bugs in file handling (especially modules) manifest as an "unexpected end of file" crash
  • Users of clang as a library end up aborting because the code unconditionally calls report_fatal_error

The bitstream reader should be more resilient and return Expected / Error as
soon as an error is encountered, not way late like it does now. This patch
starts doing so and adopting the error handling where I think it makes sense.
There's plenty more to do: this patch propagates errors to be minimally useful,
and follow-ups will propagate them further and improve diagnostics.

https://bugs.llvm.org/show_bug.cgi?id=42311
rdar://problem/33159405

Event Timeline

jfb created this revision.Jun 18 2019, 2:53 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptJun 18 2019, 2:53 PM
jfb added a comment.EditedJun 18 2019, 2:54 PM

This is still a work in progress. I still have to debug clang tests that fail, and compile / fix more than clang. Also, clang-format.

Hi JF. Thanks for working on this, nice improvement to error handling!

The overall approach is pretty solid and should prevent a lot of red herring while investigating hard to reproduce crashes in clang, specially when implicit clang modules is involved. Dropping the errors on the floor for previous code that didn't handle errors at all is a fair tradeoff for introducing the functionality. I have omitted any format comments but I noticed several of 80 cols violations. More specific reviews inline.

clang/lib/Frontend/PrecompiledPreamble.cpp
356

Changes like this are so much better!

clang/lib/Frontend/SerializedDiagnosticReader.cpp
67

Can this be simplified as below?

Expected<unsigned> Res = Stream.ReadCode();
if (!Res || Res.get() != llvm::bitc::ENTER_SUBBLOCK)
  return SDError::InvalidDiagnostics; // FIXME propagate the error details.
clang/lib/Serialization/ASTReader.cpp
1158–1170

Not necessarily needed as part of this patch, but I wonder how many of repetitive access patterns (readCode + readRecord, and maybe other patterns) we have that would take advantage of refactoring all these checks out into their own methods.

4335

Very similar to SerializedDiagnosticReader.cpp:44, does it deserve a common helper?

4620

This is a good example of a real issue this patch solves. Sometimes we get signature mismatch problems in implicit modules builds because we read garbage. Having this check and failure here prevents the misleading error message.

clang/lib/Serialization/ASTReaderDecl.cpp
3700

typo on readung

4043

Why? What's a better alternative?

4135

Does this builds fine without assertions?

clang/lib/Serialization/GlobalModuleIndex.cpp
133

Maybe use a similar helper for error checking added in ASTReaderDecl.cpp?

268–275

Very similar to SerializedDiagnosticReader.cpp:44, does it deserve a common helper?

541–547

Very similar to SerializedDiagnosticReader.cpp:44, does it deserve a common helper?

llvm/include/llvm/Bitcode/BitstreamReader.h
234

Alternatively, you can go early return mode for this and other error checking in BitstreamReader.h

Expected<unsigned> Res = Read(NumBits);
if (!Res)
  return Res;
Piece = Res.get();
jfb updated this revision to Diff 205677.Jun 19 2019, 2:37 PM
jfb marked 21 inline comments as done.
  • Address Bruno's comments.
jfb added a comment.Jun 19 2019, 2:38 PM

Hi JF. Thanks for working on this, nice improvement to error handling!

The overall approach is pretty solid and should prevent a lot of red herring while investigating hard to reproduce crashes in clang, specially when implicit clang modules is involved. Dropping the errors on the floor for previous code that didn't handle errors at all is a fair tradeoff for introducing the functionality.

Thanks!

I have omitted any format comments but I noticed several of 80 cols violations. More specific reviews inline.

Yeah I've been avoiding clang-format because it's sometimes easier to manage merge conflicts without reformatting. I'll absolutely run format when the patch is further along.

clang/lib/Frontend/SerializedDiagnosticReader.cpp
67

Yeah, but I think we'll want to propagate errors in a follow-up so we'll end up re-separating them. I'd rather have the right structure here.

clang/lib/Serialization/ASTReader.cpp
1158–1170

Yeah I noticed that too, probably worth adding a helper for. I'll note it in a bug: https://bugs.llvm.org/show_bug.cgi?id=42335

4335

It would be nice, but they don't have the same error handling :(
I'll add a FIXME.

4620

🎉

clang/lib/Serialization/ASTReaderDecl.cpp
3700

lol dung

4043

Anything that can be used as a clang API needs to return errors instead of using report_fatal_error, so that the API can't just abort your process. We should propagate this error, but it's getting tedious here and I think better in a follow-up.

4135

It should, why? The variable is used on both sides of the if.

clang/lib/Serialization/GlobalModuleIndex.cpp
268–275

Left a FIXME in ASTReader.cpp for this.

llvm/include/llvm/Bitcode/BitstreamReader.h
234

Yeah that's the first code I fixed, and later moved to what you suggest. Updated here.

jfb updated this revision to Diff 205861.Jun 20 2019, 10:55 AM
  • Bitstream reader's SkipBLock and EnterSubBlock now return Error instead of Expected<bool>.
jfb updated this revision to Diff 205905.Jun 20 2019, 2:45 PM
  • Handle error in ASTReader::loadGlobalIndex, it's hit in testing.
  • Handle error in CompilerInstance::ExecuteAction, it's hit in testing.
jfb updated this revision to Diff 205928.Jun 20 2019, 5:19 PM
  • Fix a few more places that need to consume errors for now.
jfb updated this revision to Diff 206269.Jun 24 2019, 11:18 AM
  • Remove on error propagation, which a test relies on. We should fix it in a separate patch.
  • pch-from-libclang.c isn't cleaning its temp files before running, which makes reproducing issues that much more painful.
jfb added a comment.Jun 24 2019, 11:18 AM

check-clang now passes all tests, so the patch is pretty much ready to review. I'll get started on the other parts of LLVM that use this API.

jfb updated this revision to Diff 206290.Jun 24 2019, 1:21 PM
  • Update BitstreamReaderTest.cpp to use the updated API.
jfb updated this revision to Diff 206292.Jun 24 2019, 1:32 PM
  • Update BitstreamReaderTest.cpp to use the updated API (now passing all its tests).
jfb updated this revision to Diff 206294.Jun 24 2019, 1:37 PM
  • Update expected error output for invalid bitcode in Bitcode/invalid.test
jfb updated this revision to Diff 206298.Jun 24 2019, 2:14 PM
jfb added a comment.Jun 24 2019, 2:14 PM
  • Ignore another error for now, this one is probably a sign that there's an underlying bug somewhere.
jfb updated this revision to Diff 206299.Jun 24 2019, 2:22 PM
  • Update error message string in tools/llvm-lto/error.ll
jfb added a comment.Jun 24 2019, 2:41 PM

All of check-llvm now passes. I'll look at other users of this stuff in the monorepo, compile and run their tests.

jfb updated this revision to Diff 206310.Jun 24 2019, 3:21 PM
  • Fix clang-tools-extra, check-clang-tools now passes.
jfb updated this revision to Diff 206335.Jun 24 2019, 5:09 PM
  • Fix a few more uses of APIs in clang-tools-extra. All of the monorepo now builds.
jfb added a subscriber: lhames.Jun 24 2019, 5:09 PM
jfb retitled this revision from WIP BitStream reader: propagate errors to BitStream reader: propagate errors.Jun 24 2019, 9:42 PM
jfb added a comment.Jun 24 2019, 9:44 PM

I did a build of all LLVM monorepo projects, and only LLVM / clang / clang-tools-extra are impacted. I therefore ran tests as follows, and they all pass:

rm -rf debug ; mkdir debug && (cd debug && cmake -G Ninja ../llvm -DLLVM_ENABLE_ASSERTIONS=ON -DCMAKE_BUILD_TYPE=Debug -DLLVM_ENABLE_PROJECTS="clang;clang-tools-extra" && ninja check-all )

I haven't run clang-format yet so ignore formatting. Otherwise, this patch is ready for review and getting committed.

jfb updated this revision to Diff 206483.Jun 25 2019, 10:34 AM
  • Rebase
jfb updated this revision to Diff 206520.Jun 25 2019, 1:14 PM
  • Remove attribute deprecated, it was there to help catch API uses that needed updating.
bruno accepted this revision.Jun 25 2019, 1:17 PM

LGTM

This revision is now accepted and ready to land.Jun 25 2019, 1:17 PM
jfb updated this revision to Diff 206529.Jun 25 2019, 1:56 PM
  • clang-format all the code. If it's ugly, don't complain to me
thegameg added inline comments.Jun 25 2019, 2:03 PM
llvm/include/llvm/Bitcode/BitstreamReader.h
445–446

llvm:: seems unnecessary here.

494

Any reason why this doesn't return Error?

llvm/lib/Bitcode/Reader/BitstreamReader.cpp
149–150

return createStringError here too?

Bigcheese added inline comments.Jun 25 2019, 2:17 PM
clang-tools-extra/clang-doc/BitcodeReader.cpp
527–530

This is a pretty big behavior change. Is clang-doc expecting to get files with unknown blocks?

611–617

Does consumeError return? MaybeCode.get() will crash if !MaybeCode.

I haven't had a chance to audit the whole patch yet, but in general the error suppression idioms are unsafe (though maybe no more so than the existing code?).

I would be inclined to audit all those FIXMEs and replace them with cantFails or consumeErrors. consumeError will generally match existing behavior in places where you were ignoring errors. cantFail will get you aggressive crashes, which can be handy for debugging but not so fun in release code.

Also, if this patch passes the regression tests we need more failure tests. :)

clang-tools-extra/clang-doc/BitcodeReader.cpp
527–530

The inner test here is unsafe, as it will discard the outer error. You need:

if (llvm::Error Err = readSubBlock(BlockOrCode, I)) {
  if (llvm::Error Skipped = Stream.SkipBlock()) {
    consumeError(std::move(Err));
    return Skipped;
  }
  return Err;
}

or:

if (llvm::Error Err = readSubBlock(BlockOrCode, I)) {
  if (llvm::Error Skipped = Stream.SkipBlock()) {
    return joinErrors(std::move(Err), std::move(Skipped));
  return Err;
}
611–617

Yes: consumeError returns. This should probably be:

if (!MaybeCode)
  return consumeError(MaybeCode.takeError()), Cursor::Badness;
clang/lib/Frontend/CompilerInstance.cpp
2051–2053

This will crash if writeIndex ever generates an error. For that reason I would suggest writing this as:

// FIXME: Can actually fail! <fixme goes here>
cantFail(GlobalModuleIndex::writeIndex(...));

It has the same effect, but without the control flow.

2085

Another cantFail candidate.

clang/lib/Frontend/FrontendAction.cpp
945–948

I'd suggest cantFail here, if not for that fixme comment. Yikes.

clang/lib/Frontend/Rewrite/FrontendActions.cpp
132–137

You need a consumeError here or you'll get a runtime crash if an error is generated.

clang/lib/Frontend/SerializedDiagnosticReader.cpp
101–103

This needs a consumeError or it will crash if an error is generated. How about:

if (llvm::Error Err = Stream.SkipBlock())

return consumeError(std::move(Err)), SDError::MalformedTopLevelBlock;
164–166

Needs a consumeError.

182–184

Needs a consumeError.

213–215

Needs a consumeError.

236–238

Needs a consumeError.

clang/lib/Frontend/TestModuleFileExtension.cpp
53–54

Needs a consumeError or cantFail.

Fun fact: cantFail will auto-unwrap Expecteds:

Expected<T> foo();

T v = cantFail(foo());
clang/lib/Serialization/ASTReader.cpp
1620

Unsafe on error. Needs cantFail or consumeError.

1629

Unsafe on error. Needs cantFail or consumeError.

1638

Unsafe on error. Needs cantFail or consumeError.

1645

Unsafe on error. Needs cantFail or consumeError.

1673

Unsafe on error. Needs cantFail or consumeError.

2214

Unsafe on error. Needs cantFail or consumeError.

2225

Unsafe on error. Needs cantFail or consumeError.

2258

Unsafe on error. Needs cantFail or consumeError.

2410

Unsafe on error. Needs cantFail or consumeError.

2421

Unsafe on error. Needs cantFail or consumeError.

2444

Unsafe on error. Needs cantFail or consumeError.

jfb marked 8 inline comments as done.Jun 25 2019, 4:18 PM
jfb added inline comments.
clang-tools-extra/clang-doc/BitcodeReader.cpp
527–530

I don't think it expects this, and it looks (from the above code) like the intent is to handle errors when parsing (so this would be a bug that we'd want fixed).

611–617

Good point, fixed.

llvm/include/llvm/Bitcode/BitstreamReader.h
494

I'm not sure it's really an error: it goes to the end of the block either because it's empty, or because it pops the scope (which doesn't error either). It might be erroneously used, but I'm not sure we should make it an error right now. WDYT?

jfb updated this revision to Diff 206546.Jun 25 2019, 4:18 PM
jfb marked an inline comment as done.
  • Address Francis' and Michael's comments.
jfb added a comment.Jun 25 2019, 4:20 PM

I haven't had a chance to audit the whole patch yet, but in general the error suppression idioms are unsafe (though maybe no more so than the existing code?).

I would be inclined to audit all those FIXMEs and replace them with cantFails or consumeErrors. consumeError will generally match existing behavior in places where you were ignoring errors. cantFail will get you aggressive crashes, which can be handy for debugging but not so fun in release code.

I haven't addressed the comments yet, but wanted to respond: yes, it's unsafe and I was wondering what folks would rather see, so thanks for bringing it up! I indeed only consumed errors that we encountered, and in that sense the code isn't bug-compatible with what we had before. Seems like you'd want consumeError in most places, which I can certainly do.

Also, if this patch passes the regression tests we need more failure tests. :)

Indeed! That's a pretty terrifying thing... but I'm not signing up to address *that* particular issue :)

thegameg added inline comments.Jun 25 2019, 4:41 PM
llvm/include/llvm/Bitcode/BitstreamReader.h
494

Yeah, I'm not sure either... BitstreamCursor::advance seems to return BitstreamEntry::getError(); if this function fails after it encountered an END_BLOCK, and the other users seem to return things like SDError::InvalidDiagnostics or Cursor::BadBlock.

I guess for now, it's fine as it is.

jfb updated this revision to Diff 206551.Jun 25 2019, 5:05 PM
jfb marked 25 inline comments as done.
  • Address Lang's comments.
  • format
clang-tools-extra/clang-doc/BitcodeReader.cpp
611–617

You tempt me so with this comma operator... but no, I must not!

clang/lib/Frontend/CompilerInstance.cpp
2051–2053

I think this one shouldn't crash, so I've made it consume the error instead (and it still needs to get fixed).

jfb updated this revision to Diff 206552.Jun 25 2019, 5:09 PM
  • Improve one fatal error message.
jfb added a comment.Jun 25 2019, 5:11 PM

I've addressed Lang's comments and the re-audited all the new FIXME instances. This patch now actually drops errors on the floor instead of implicitly erroring out unless the error path is tested (which is what I had before). This hides bugs, but I left FIXMEs everywhere and it means the patch will be less disruptive because it's bug-compatible in the areas that aren't tested.

lhames accepted this revision.Jun 26 2019, 9:16 AM

Indeed! That's a pretty terrifying thing... but I'm not signing up to address *that* particular issue :)

Yep. I don't think you need to address that in this patch.

LGTM!

clang-tools-extra/clang-doc/BitcodeReader.cpp
611–617

;)

jfb marked 2 inline comments as done.Jun 26 2019, 9:31 AM
jfb added a comment.Jun 26 2019, 10:08 AM

I think this is ready to go! I rebased and ran check-all for LLVM / clang / clang-tools-extras and everything passes.

This revision was automatically updated to reflect the committed changes.
abrachet added inline comments.
llvm/trunk/lib/Bitcode/Reader/BitcodeReader.cpp
205 ↗(On Diff #206727)

This and an identical switch on line 5367 cause an unused variable warning from this commit. I don't know if the build bots report on this, or the proper way to tell you about this but hopefully you will see it :)

jfb marked 2 inline comments as done.Jun 27 2019, 9:25 AM
jfb added a subscriber: hans.
jfb added inline comments.
llvm/trunk/lib/Bitcode/Reader/BitcodeReader.cpp
205 ↗(On Diff #206727)

Thanks, looks like @hans fixed it in r364505. Odd that it wasn't warning locally for me.

bjope added a subscriber: bjope.Jul 5 2019, 8:26 AM
bjope added inline comments.
clang-tools-extra/trunk/clang-doc/BitcodeReader.cpp
619 ↗(On Diff #206727)

This has caused problem for our sanitizer tests the last couple of days:

FAIL: Extra Tools Unit Tests :: clang-doc/./ClangDocTests/BitcodeTest.emitMethodInfoBitcode (20077 of 48746)
******************** TEST 'Extra Tools Unit Tests :: clang-doc/./ClangDocTests/BitcodeTest.emitMethodInfoBitcode' FAILED ********************
Note: Google Test filter = BitcodeTest.emitMethodInfoBitcode
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from BitcodeTest
[ RUN      ] BitcodeTest.emitMethodInfoBitcode
/local/repo/bbiswjenk/fem023-eiffel003/workspace/llvm/llvm-master-sanitize/clang-tools-extra/clang-doc/BitcodeReader.cpp:621:13: runtime error: load of value 9, which is not a valid value for type 'llvm::bitc::FixedAbbrevIDs'

This was seen when building trunk with clang 6.0.0 and LVM_USE_SANITIZER=Undefined

cmake -G Ninja '-DLLVM_ENABLE_PROJECTS='\''clang;clang-tools-extra'\''' -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=ON '-DCMAKE_CXX_FLAGS='\''-stdlib=libc++'\''' -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DLLVM_USE_LINKER=gold -DLLVM_USE_SANITIZER=Undefined ../.
-- The C compiler identification is Clang 6.0.0
-- The CXX compiler identification is Clang 6.0.0

Afaict we can't cast the read value to FixedAbbrevIDs as we do not know yet if it matches one of the values defined in the enum, or if we will take the default case.

A similar switch exist at cfe/trunk/lib/Frontend/SerializedDiagnosticReader.cpp:127, but it is using a slightly different pattern:

unsigned Code;
Code = Res.get();
switch ((llvm::bitc::FixedAbbrevIDs)Code)

I haven't seen any failures for SerializedDiagnosticReader. So either we lack test coverage for that function, or the sanitizer only introduce the check when using the static_cast (and declaring Code as an enum) as done here.

jfb marked 2 inline comments as done.Jul 5 2019, 11:02 AM
jfb added inline comments.
clang-tools-extra/trunk/clang-doc/BitcodeReader.cpp
619 ↗(On Diff #206727)

That sounds like a pre-existing bug. We should check that the value is in range before casting. Can you send patches to fix both code locations, and add test coverage? This code is indeed poorly tested.

Why do the sanitizers catch static_cast but not C-style casts?

jfb marked an inline comment as done.Jul 5 2019, 11:03 AM
jfb added inline comments.
clang-tools-extra/trunk/clang-doc/BitcodeReader.cpp
619 ↗(On Diff #206727)

To be clear: relying on the default case is still UB because there's a cast to the enum type before it occurs.

bjope added inline comments.Jul 5 2019, 1:11 PM
clang-tools-extra/trunk/clang-doc/BitcodeReader.cpp
619 ↗(On Diff #206727)

I made a patch here (assuming the goal would be to keep the cast to the enum, and to let the switch cover all enum values): https://reviews.llvm.org/D64262