This patch fixes a few related out-of-bounds read bugs in the
string data formatters. These issues have to do with mishandling of un-
initialized strings. These manifest as ASan exceptions when debugging a
clang binary.
The first issue was that the std::string formatter treated strings in
"short mode" with length greater than the size of the inline buffer as
valid.
The second issue was that the StringPrinter facility did not check that
a full utf8 codepoint sequence can be read from the buffer (i.e. there
are some missing range checks). I took the opportunity here to delete
some untested code that was meant to deal with invalid input and replace
it with fail-on-invalid logic ([1][2][3]). This means we'll give up on
formatting an invalid string instead of guessing our way through it.
The third issue is that StringPrinter did not check that a utf8 sequence
could actually be fully read from the string payload. This one is especially
tricky as we may overflow the buffer pointer while reading the sequence.
I also noticed that the std::string formatter would spew the raw version of
the underlying ValueObject when garbage is detected. I've changed this to
just print "Summary Unavailable" instead, as we do elsewhere.
I've added regression tests for these issues to
test/functionalities/data-formatter/data-formatter-stl/libcxx/string.
[1]
http://lab.llvm.org:8080/coverage/coverage-reports/coverage/Users/buildslave/jenkins/workspace/coverage/llvm-project/lldb/source/DataFormatters/StringPrinter.cpp.html#L136
[2]
http://lab.llvm.org:8080/coverage/coverage-reports/coverage/Users/buildslave/jenkins/workspace/coverage/llvm-project/lldb/source/DataFormatters/StringPrinter.cpp.html#L163
[3]
http://lab.llvm.org:8080/coverage/coverage-reports/coverage/Users/buildslave/jenkins/workspace/coverage/llvm-project/lldb/source/DataFormatters/StringPrinter.cpp.html#L357
rdar://59080026
This is much nicer then the previous raw arrays and self-documents with the fields, nice!