This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Sanity-check array sizes read from disk before allocating them.
ClosedPublic

Authored by sammccall on Nov 11 2020, 6:54 AM.

Details

Summary

Previously a corrupted index shard could cause us to resize arrays to an
arbitrary int32. This tends to be a huge number, and can render the
system unresponsive.

Instead, cap this at the amount of data that might reasonably be read
(e.g. the #bytes in the file). If the specified length is more than that,
assume the data is corrupt.

Diff Detail

Event Timeline

sammccall created this revision.Nov 11 2020, 6:54 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 11 2020, 6:54 AM
sammccall requested review of this revision.Nov 11 2020, 6:54 AM
kadircet added inline comments.Nov 11 2020, 7:57 AM
clang-tools-extra/clangd/index/Serialization.cpp
121

regarding minsizes, i suppose the idea was to pass ElementSizeInBytes for containers ? I am OK with the overshooting if you don't want to make this more detailed, but can we drop the param?

211

in theory, this could also trip us over. zlib::uncompress does a reserve using this value. it is harder to come up with an upper bound here, but https://zlib.net/zlib_tech.html#:~:text=Maximum%20Compression%20Factor&text=(The%20test%20case%20was%20a,sources)%20is%201032%3A1. says the theoretical limit is 1032:1.
Maybe enforce that?

Also to make this more similar to others, it looks like zlib::uncompress also has an overload taking a const char *.

520–521

unrelated to this patch, but it feels like we are checking for the error at the wrong place ?

clang-tools-extra/clangd/unittests/SerializationTests.cpp
320

should we rather ADD_FAILURE ?

345

why not ASSERT_TRUE(readIndexFile(..)) ? (or llvm::cantFail)

364

can we use In.Sources.begin()->Digest instead?

sammccall marked 2 inline comments as done.Nov 11 2020, 10:21 AM
sammccall added inline comments.
clang-tools-extra/clangd/index/Serialization.cpp
121

Yeah fair enough - I thought having the param made the logic easier to follow but actually passing it made the code too fragile.

Just inlined badSize() and hardcoded MinSize=1, we can always extract it again later.

520–521

agree. Will fix in a different commit

clang-tools-extra/clangd/unittests/SerializationTests.cpp
320

No, because rlimit can legitimately fail (e.g. we're already using more ram than the limit we tried to set).

If rlimit fails then we may not be able to perform the test meaningfully, so we could skip it in that case (i.e. decide to *pass* immediately).

345

Whoops, this was leftover experimentation code.

364

Hmm, we could, but it has the wrong data type (uint8_t vs char) so we need a reinterpret_cast... not sure it's clearer.

sammccall marked 2 inline comments as done.

Address comments

kadircet accepted this revision.Nov 11 2020, 1:46 PM

thanks, lgtm!

This revision is now accepted and ready to land.Nov 11 2020, 1:46 PM
eugenis added a subscriber: eugenis.EditedNov 12 2020, 2:59 PM

Hi Sam,

this patch is failing on the ubsan bot with:

[ RUN      ] SerializationTest.NoCrashOnBadArraySize
/b/sanitizer-x86_64-linux-fast/build/llvm-project/clang-tools-extra/clangd/index/Serialization.cpp:90:26: runtime error: left shift of 127 by 28 places cannot be represented in type 'int'
    #0 0x392dd57 in clang::clangd::(anonymous namespace)::Reader::consumeVar() /b/sanitizer-x86_64-linux-fast/build/llvm-project/clang-tools-extra/clangd/index/Serialization.cpp:90:26
    #1 0x392dc68 in bool clang::clangd::(anonymous namespace)::Reader::consumeSize<std::__1::vector<llvm::StringRef, std::__1::allocator<llvm::StringRef> > >(std::__1::vector<llvm::StringRef, std::__1::allocator<llvm::StringRef> >&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/clang-tools-extra/clangd/index/Serialization.cpp:113:17
    #2 0x3926d6a in readIncludeGraphNode /b/sanitizer-x86_64-linux-fast/build/llvm-project/clang-tools-extra/clangd/index/Serialization.cpp:275:13
    #3 0x3926d6a in clang::clangd::(anonymous namespace)::readRIFF(llvm::StringRef) /b/sanitizer-x86_64-linux-fast/build/llvm-project/clang-tools-extra/clangd/index/Serialization.cpp:474:18
    #4 0x3926111 in clang::clangd::readIndexFile(llvm::StringRef) /b/sanitizer-x86_64-linux-fast/build/llvm-project/clang-tools-extra/clangd/index/Serialization.cpp:676:12
    #5 0x1e5b77c in clang::clangd::(anonymous namespace)::SerializationTest_NoCrashOnBadArraySize_Test::TestBody() /b/sanitizer-x86_64-linux-fast/build/llvm-project/clang-tools-extra/clangd/unittests/SerializationTests.cpp:380:24
    #6 0x20589f9 in testing::Test::Run() /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/unittest/googletest/src/gtest.cc:2474:5
    #7 0x205993b in testing::TestInfo::Run() /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/unittest/googletest/src/gtest.cc:2656:11
    #8 0x205a4e2 in testing::TestCase::Run() /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/unittest/googletest/src/gtest.cc:2774:28
    #9 0x20619b2 in testing::internal::UnitTestImpl::RunAllTests() /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/unittest/googletest/src/gtest.cc:4649:43
    #10 0x20613c9 in testing::UnitTest::Run() /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/unittest/googletest/src/gtest.cc:4257:10
    #11 0x2051273 in RUN_ALL_TESTS /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/unittest/googletest/include/gtest/gtest.h:2233:46
    #12 0x2051273 in main /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/unittest/UnitTestMain/TestMain.cpp:50:10
    #13 0x7fd11479d09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)
    #14 0x1a330d9 in _start (/b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/tools/clang/tools/extra/clangd/unittests/ClangdTests+0x1a330d9)

It looks like the corrupt input in your test case is triggering a preexisting bug in clangd.

oops, this is actually a problem of the test. it is providing an invalid encoding :/ sent out https://reviews.llvm.org/D91405 to fix the issue.

Hi Sam,

this patch is failing on the ubsan bot with:

[ RUN      ] SerializationTest.NoCrashOnBadArraySize
/b/sanitizer-x86_64-linux-fast/build/llvm-project/clang-tools-extra/clangd/index/Serialization.cpp:90:26: runtime error: left shift of 127 by 28 places cannot be represented in type 'int'
    #0 0x392dd57 in clang::clangd::(anonymous namespace)::Reader::consumeVar() /b/sanitizer-x86_64-linux-fast/build/llvm-project/clang-tools-extra/clangd/index/Serialization.cpp:90:26

Thanks Evgenii! (It's really surprising to me this is using signed arithmetic!)

Is the sanitizer strictly correct here that this is UB? From my reading only C requires the result of signed shifts to be representable, not C++. https://eel.is/c++draft/expr.shift
So this should be well-defined with the same result as the unsigned arithmetic (assuming 2s complement, I guess).

(We'll fix this to use unsigned arithmetic in any case)