Page MenuHomePhabricator

[lldb/DataFormatters] Delete GetStringPrinterEscapingHelper

Authored by vsk on Apr 9 2020, 6:02 PM.



Languages can have different ways of formatting special characters.
E.g. when debugging C++ code a string might look like "\b", but when
debugging Swift code the same string would look like "\u{8}".

To make this work, plugins override GetStringPrinterEscapingHelper.
However, because there's a large amount of subtly divergent work done in
each override, we end up with large amounts of duplicated code. And all
the memory smashers fixed in one copy of the logic (see D73860) don't
get fixed in the others.

IMO the GetStringPrinterEscapingHelper is overly general and hard to
use. I propose deleting it and replacing it with an EscapeStyle enum,
which can be set as needed by each plugin.

A fix for some swift-lldb memory smashers falls out fairly naturally
from this deletion ( As
the swift logic becomes really tiny, I propose moving it upstream as
part of this change. I've added unit tests to cover it.


Diff Detail

Event Timeline

vsk created this revision.Apr 9 2020, 6:02 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 9 2020, 6:02 PM
Herald added a subscriber: mgorny. · View Herald Transcript

+ Raphael, I suppose this is in a similar direction as D68691.

aprantl added inline comments.Apr 13 2020, 12:28 PM

We have a lot of implementations of this function all over our projects (git grep '\\\\r' llvm/lib) that we might want to factor this into a support function.

vsk marked an inline comment as done.Apr 13 2020, 1:35 PM
vsk added inline comments.

I do see another utf decoder implementation in YAMLParser, but it's different enough from lldb's that I'd prefer to hold off on attempting to unify these for some later patch.

This looks great, especially with the extra unit tests. Thanks Vedant.

Have you asked the foundation people about the ASCII support in NSString?

shafik added a subscriber: shafik.Apr 13 2020, 2:23 PM
shafik added inline comments.







Maybe I am missing something but I don't think the sprintf below can go up to 14 chars.

shafik added inline comments.Apr 13 2020, 3:29 PM

We have raw string literals since C++11:

EXPECT_TRUE(matches("\a", R"("\a")"));

I haven't looked at the code in detail, but it seems ok to me.

What does not seem ok is the hand-rolled matching in the tests. A more idiomatic approach would be to replace isFormatCorrect with a wrapper function which can be called from a test macro, and let gtest do the matching:
Say, something like this:

/// Return the formatted string, or None if formatting failed.
Optional<std::string> format(StringRef input, ...) {
  if (success)
    return out.GetString();
  return None;
EXPECT_EQ(format("foo", ...), std::string("foo")); // std::string is needed because Optional<string> and const char * are not comparable -- may want to make an alias/lambda for that
vsk planned changes to this revision.Apr 16 2020, 5:27 PM

Sorry for the delay, I plan to have an update for this soon.

Have you asked the foundation people about the ASCII support in NSString?

No, I'm not sure what to ask. Having thought things over, I think my 'FIXME' about 'In practice, do we ever prefer ASCII-only formatting over UTF8 formatting?' was premature, so I'll delete it.

I did some experiments with forcing invalid data into an NSString and seeing what comes out of NSLog. Both with dataUsingEncoding:NSASCIIStringEncoding, or with just typing in invalid bytes, NSLog just prints an empty string (in the latter case -WCFString-literal even warns that input conversion is stopped after an invalid byte is seen). I don't think it'd be helpful for lldb to match that behavior? At least, we don't match that behavior today, we try to escape and print out the invalid bytes.

If we want to continue to not match NSLog's behavior, then I think we should we treat invalid bytes under ASCII vs. utf differently (we can print the actual character in the utf case, and a sequence of '\x..\x..' otherwise).


You're right, I got %x mixed up with %u.

aprantl added inline comments.Apr 17 2020, 9:54 AM

drop the redundant C++ marker

vsk marked 8 inline comments as done.Apr 17 2020, 1:45 PM
vsk added inline comments.

Cool, I learned that only 'constexpr' guarantees that you have a compile-time constant. Fixed here and below.


I personally find the current version much clearer, but can change this if others feel strongly.

vsk updated this revision to Diff 258419.Apr 17 2020, 1:48 PM
vsk marked an inline comment as done.

Address review feedback:

  • Use standard gtest operators (EXPECT_EQ)
  • constexpr/std::move cleanup, comment cleanups, sprintf length fix

Also, I took the opportunity to further unify the ASCII and UTF8 handling --
this lets us delete a bunch of duplicated code. PTAL, thanks.

vsk marked an inline comment as done.Apr 17 2020, 1:51 PM
vsk added inline comments.

This is what I mean. The whole ReadStringAndDumpToStream<ASCII> specialization can be deleted and replaced with a call to DumpEncodedBufferToStream, as the UTF & ASCII code paths are really similar.

The test stuff looks good to me.


it looks like escape_style could be a regular argument instead of a template.


I think the raw strings are cleaner, particularly when it comes to strings like "\\\"" and "\\\\"

vsk updated this revision to Diff 258825.Apr 20 2020, 1:15 PM

Address review feedback:

  • Make 'escape_style' a regular parameter, use raw string literals
vsk added a comment.Apr 28 2020, 10:46 AM

Friendly ping.

shafik added inline comments.Apr 28 2020, 3:12 PM

I really wish we could get ride of the naked new. It seems possible.

  • We know the buffer size
  • We know the "expected" escaped_len
  • We could write something like a make_StringPrinterBufferPointer in the same spirit as make_unique

The problem we want to avoid w/ the naked new is that the code becomes more complicated over time and somehow a later change disconnects the new w/ the creation of the StringPrinterBufferPointer which manages the lifetime.

This comment applies to the code later on as well.

vsk updated this revision to Diff 261387.Apr 30 2020, 4:25 PM
  • Avoid heap memory management in "StringPrinterBufferPointer" entirely.
  • Rename "StringPrinterBufferPointer" to "DecodedCharBuffer", as that seems to more accurately reflect what the class is for.
shafik accepted this revision.May 1 2020, 2:19 PM

LGTM with minor comments, this is a big set of changes, I would prefer if one of the other reviewers also gave it a second look.


just m_data it will decay to a pointer.


It feels like the two lambdas here could be one by also capturing elem_type, it would make the lambda larger but is that a concern? Not a big deal just feels like needlessly duplicate code.


Super nitpick, can we switch order of the UTF8 and the ASCII specialization so that they appear in the same order as the ReadStringAndDumpToStream above.

This revision is now accepted and ready to land.May 1 2020, 2:19 PM
vsk marked 4 inline comments as done.May 1 2020, 3:07 PM
vsk added inline comments.

Sure, these can be folded together.


No, because the ASCII specialization calls the UTF8 specialization, and the compiler complains that the use occurs before the definition. I'll just reorder the specializations of ReadStringAndDumpToStream so that things match.

vsk updated this revision to Diff 261558.May 1 2020, 3:08 PM
vsk marked an inline comment as done.

Address review feedback

labath accepted this revision.May 4 2020, 2:28 AM

Yeah, I think this looks good too.


What's up with all the reinterpret_casting? T* is implicitly convertible to a void*...

vsk marked an inline comment as done.May 4 2020, 2:05 PM
vsk added inline comments.

Ah thanks for catching this, I'll get rid of the redundant casts before committing.

This revision was automatically updated to reflect the committed changes.