This is an archive of the discontinued LLVM Phabricator instance.

[lldb/StringPrinter] Avoid reading garbage in uninitialized strings
ClosedPublic

Authored by vsk on Feb 2 2020, 9:04 PM.

Details

Summary

This patch fixes a few related out-of-bounds read bugs in the
string data formatters. These issues have to do with mishandling of un-
initialized strings. These manifest as ASan exceptions when debugging a
clang binary.

The first issue was that the std::string formatter treated strings in
"short mode" with length greater than the size of the inline buffer as
valid.

The second issue was that the StringPrinter facility did not check that
a full utf8 codepoint sequence can be read from the buffer (i.e. there
are some missing range checks). I took the opportunity here to delete
some untested code that was meant to deal with invalid input and replace
it with fail-on-invalid logic ([1][2][3]). This means we'll give up on
formatting an invalid string instead of guessing our way through it.

The third issue is that StringPrinter did not check that a utf8 sequence
could actually be fully read from the string payload. This one is especially
tricky as we may overflow the buffer pointer while reading the sequence.

I also noticed that the std::string formatter would spew the raw version of
the underlying ValueObject when garbage is detected. I've changed this to
just print "Summary Unavailable" instead, as we do elsewhere.

I've added regression tests for these issues to
test/functionalities/data-formatter/data-formatter-stl/libcxx/string.

[1]
http://lab.llvm.org:8080/coverage/coverage-reports/coverage/Users/buildslave/jenkins/workspace/coverage/llvm-project/lldb/source/DataFormatters/StringPrinter.cpp.html#L136
[2]
http://lab.llvm.org:8080/coverage/coverage-reports/coverage/Users/buildslave/jenkins/workspace/coverage/llvm-project/lldb/source/DataFormatters/StringPrinter.cpp.html#L163
[3]
http://lab.llvm.org:8080/coverage/coverage-reports/coverage/Users/buildslave/jenkins/workspace/coverage/llvm-project/lldb/source/DataFormatters/StringPrinter.cpp.html#L357

rdar://59080026

Diff Detail

Event Timeline

vsk created this revision.Feb 2 2020, 9:04 PM
shafik added inline comments.Feb 3 2020, 3:30 PM
lldb/packages/Python/lldbsuite/test/functionalities/data-formatter/data-formatter-stl/libcxx/string/main.cpp
29 ↗(On Diff #241959)

While I get what you are doing here, we know he structure of libc++ SSO implementation and we are manually building a corrupt one, this is fragile to changes in the implementation.

I don't have an immediate suggestion for an alternative approach but if we stick with this we should stick a big comment explaining this, perhaps laying out the assumptions of the internal layout we are assuming and maybe some sanity checks maybe using offsetof to verify fields exist and are where we expect them to be.

lldb/source/DataFormatters/StringPrinter.cpp
64

can we use reinterpret_cast as opposed to what is basically a C-style cast. This also has the advantage of pointing out potentially dangerous code for future persons refactoring this code.

shafik added a comment.Feb 3 2020, 3:32 PM

Is this missing additions to TestDataFormatterLibcxxString.py?

vsk added inline comments.Feb 3 2020, 4:20 PM
lldb/packages/Python/lldbsuite/test/functionalities/data-formatter/data-formatter-stl/libcxx/string/main.cpp
29 ↗(On Diff #241959)

I don't see how this is fragile. The structure of libc++'s SSO implementation is ABI, and is unlikely to change (esp. not in a way that turns either one of the garbage strings into a valid string). I've left comments explaining what's wrong with both of the garbage strings, but can leave a pointer to https://joellaity.com/2020/01/31/string.html for more info?

lldb/source/DataFormatters/StringPrinter.cpp
64

As written, I think the casts to uintptr_t convey the assumptions being made here.

vsk planned changes to this revision.Feb 3 2020, 4:36 PM

I'll update this tomorrow with checks in TestDataFormatterLibcxxString.py for how lldb formats garbage{1,2}.

shafik added inline comments.Feb 3 2020, 7:05 PM
lldb/packages/Python/lldbsuite/test/functionalities/data-formatter/data-formatter-stl/libcxx/string/main.cpp
29 ↗(On Diff #241959)

Sure, that note would be fine.

lldb/source/DataFormatters/StringPrinter.cpp
64

Perhaps but I can not easily search for C-style casts but I can easily audit for reinterpret_cast and reinterpret_cast sometimes has stronger semantics than C-style casts e.g. this example

So while this does not apply to this case it is best to get used to using reinterpret_cast instead saving some character by using a C-style cast.

teemperor requested changes to this revision.Feb 4 2020, 1:29 AM
teemperor added inline comments.
lldb/packages/Python/lldbsuite/test/functionalities/data-formatter/data-formatter-stl/libcxx/string/main.cpp
8 ↗(On Diff #241959)

I think those byte arrays need a quick comment about which elements mean what (or how they trigger the respective code paths). Just pointing out which bytes are supposed to overwrite which std::string members is good enough. Something like a macro maybe? #define STD_STRING_BYTES(cap, size, length) {cap, size, length}

29 ↗(On Diff #241959)

Can you instead do a #if _LIBCPP_ABI_VERSION == 1 and have the #else as an #error that this test needs updating. We don't support any other libc++ ABI beside 1 in LLDB but if we ever do then this should not silently pass.

lldb/source/DataFormatters/StringPrinter.cpp
67

Isn't this just assert(buffer<buffer_end)? That's less confusing IMHO (and I think in general this check can be in GetPrintable as this should always be true for all GetPrintableImpl).

132

Same as above.

139

Isnt' !isInHalfOpenRange(buffer + (utf8_encoded_len - 1), buffer, buffer_end)) just buffer + (utf8_encoded_len - 1U) < buffer_end? utf8_encoded_len is always positive so the check if it adding it to buffer makes it smaller than buffer can only happen with an integer overflow IIUC (which we probably should check against more explicitly then).

vsk updated this revision to Diff 242417.Feb 4 2020, 1:26 PM
vsk marked 5 inline comments as done.

Address review feedback.

vsk added inline comments.Feb 4 2020, 1:29 PM
lldb/packages/Python/lldbsuite/test/functionalities/data-formatter/data-formatter-stl/libcxx/string/main.cpp
29 ↗(On Diff #241959)

Sure, but the size check is not primarily about the ABI. The garbage examples presuppose 64-bit pointer & size types, which is not true on some watches.

lldb/source/DataFormatters/StringPrinter.cpp
139

I've lifted the buffer < buffer_end check into GetPrintable, and made the overflow check here explicit.

vsk updated this revision to Diff 242461.Feb 4 2020, 4:13 PM
vsk added a reviewer: Restricted Project.
  • Enhance test coverage with several more examples of garbage long-mode std::strings.
  • In several cases, when we detected an invalid string, we would fall back to spewing the "raw mode" representation of the ValueObject to the user. Use a ScopeExit to make sure we never do that.
  • Detect when the GetPointeeData call for the long-mode payload fails, and do not attempt to read the full payload when this happens.
vsk edited the summary of this revision. (Show Details)Feb 4 2020, 4:22 PM
vsk updated this revision to Diff 242479.Feb 4 2020, 5:30 PM

Per offline feedback from Jonas, label variables const where applicable and get rid of the ScopeExits.

JDevlieghere added inline comments.
lldb/source/Plugins/Language/CPlusPlus/LibCxx.cpp
474

Definitely something for another patch, but this is really bad, looking at the signature I have no idea what's in or output. This should return an optional<struct/pair/tuple>.

612

This is so much better.

657–659

Can we move the extractor closer to where it's used, so basically after the capping check?

664

Not code you touched, but...

options.SetPrefixToken(prefix_token.empty() ? nullptr : prefix_token);

might prevent this visual break in setting the options.

680

const bool

vsk updated this revision to Diff 242495.Feb 4 2020, 6:40 PM
vsk marked an inline comment as done.
JDevlieghere accepted this revision as: JDevlieghere.Feb 4 2020, 6:54 PM

LGTM, but Raphael should have a chance to take another look.

shafik added inline comments.Feb 4 2020, 10:58 PM
lldb/packages/Python/lldbsuite/test/functionalities/data-formatter/data-formatter-stl/libcxx/string/main.cpp
23 ↗(On Diff #242495)

This is much nicer then the previous raw arrays and self-documents with the fields, nice!

lldb/source/DataFormatters/StringPrinter.cpp
143

Wouldn't we want checkedAddUnsigned? This would also mean casting to uintptr_t.

469

Not your code but these auto seem unnecessary. Especially printable is that just a bool? Same comment in similar code above.

lldb/source/Plugins/Language/CPlusPlus/LibCxx.cpp
530

So short_mode means SSO?

555

What does cap represent? It is not obvious in this context.

vsk updated this revision to Diff 242661.Feb 5 2020, 9:34 AM
vsk marked 4 inline comments as done.
  • s/cap/capacity/
  • Remove the checkedAdd pointer overflow check as it's not necessary.
lldb/source/DataFormatters/StringPrinter.cpp
143

Not sure about that, as a change in the MSB seems significant to me.

Stepping back a bit though, it seems like this overflow check isn't useful. The data formatter must have called GetPointeeData to transfer the string payload into a host buffer, and we validate the number of bytes read against the size of the payload. So, if pointer overflow occurs, the bytes_read validation would catch it.

I'll just delete this.

469

I'll leave this as a follow-up. printable is a StringPrinter::StringPrinterBufferPointer.

lldb/source/Plugins/Language/CPlusPlus/LibCxx.cpp
530

Yes.

555

"capacity". I'm using the same jargon as the comment above LibcxxStringLayoutMode does, but it doesn't hurt to write this out.

vsk added a comment.Feb 12 2020, 10:45 AM

Ping, @teemperor do you have any concerns about this one?

teemperor accepted this revision.Feb 12 2020, 10:53 AM
In D73860#1872746, @vsk wrote:

Ping, @teemperor do you have any concerns about this one?

Didn't see Jonas comment when he accepted, sorry. LGTM.

lldb/packages/Python/lldbsuite/test/functionalities/data-formatter/data-formatter-stl/libcxx/string/main.cpp
29 ↗(On Diff #241959)

Actually the #if *and* a static_assert comparing the size would be best IMHO.

This revision is now accepted and ready to land.Feb 12 2020, 10:53 AM
vsk marked an inline comment as done.Feb 12 2020, 11:20 AM

Thanks everyone for the reviews!

lldb/packages/Python/lldbsuite/test/functionalities/data-formatter/data-formatter-stl/libcxx/string/main.cpp
29 ↗(On Diff #241959)

I'm not sure how to write a static assert that isn't a little brittle. Maybe static_assert(sizeof(void *) != 8 || sizeof(std::string) == 24, "unknown std::string layout")?

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptFeb 12 2020, 11:29 AM