This is an archive of the discontinued LLVM Phabricator instance.

[lldb/DataFormatters] Delete GetStringPrinterEscapingHelper
ClosedPublic

Authored by vsk on Apr 9 2020, 6:02 PM.

Details

Summary

Languages can have different ways of formatting special characters.
E.g. when debugging C++ code a string might look like "\b", but when
debugging Swift code the same string would look like "\u{8}".

To make this work, plugins override GetStringPrinterEscapingHelper.
However, because there's a large amount of subtly divergent work done in
each override, we end up with large amounts of duplicated code. And all
the memory smashers fixed in one copy of the logic (see D73860) don't
get fixed in the others.

IMO the GetStringPrinterEscapingHelper is overly general and hard to
use. I propose deleting it and replacing it with an EscapeStyle enum,
which can be set as needed by each plugin.

A fix for some swift-lldb memory smashers falls out fairly naturally
from this deletion (https://github.com/apple/llvm-project/pull/1046). As
the swift logic becomes really tiny, I propose moving it upstream as
part of this change. I've added unit tests to cover it.

rdar://61419673

Diff Detail

Event Timeline

vsk created this revision.Apr 9 2020, 6:02 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 9 2020, 6:02 PM
Herald added a subscriber: mgorny. · View Herald Transcript

+ Raphael, I suppose this is in a similar direction as D68691.

aprantl added inline comments.Apr 13 2020, 12:28 PM
lldb/source/DataFormatters/StringPrinter.cpp
126

We have a lot of implementations of this function all over our projects (git grep '\\\\r' llvm/lib) that we might want to factor this into a support function.

vsk marked an inline comment as done.Apr 13 2020, 1:35 PM
vsk added inline comments.
lldb/source/DataFormatters/StringPrinter.cpp
126

I do see another utf decoder implementation in YAMLParser, but it's different enough from lldb's that I'd prefer to hold off on attempting to unify these for some later patch.

This looks great, especially with the extra unit tests. Thanks Vedant.

Have you asked the foundation people about the ASCII support in NSString?

shafik added a subscriber: shafik.Apr 13 2020, 2:23 PM
shafik added inline comments.
lldb/source/DataFormatters/StringPrinter.cpp
50

m_deleter(std::move(rhs.m_deleter))

69

std::move(rhs.m_deleter)

173

constexpr

259

Maybe I am missing something but I don't think the sprintf below can go up to 14 chars.

shafik added inline comments.Apr 13 2020, 3:29 PM
lldb/unittests/DataFormatter/StringPrinterTests.cpp
109

We have raw string literals since C++11:

EXPECT_TRUE(matches("\a", R"("\a")"));

I haven't looked at the code in detail, but it seems ok to me.

What does not seem ok is the hand-rolled matching in the tests. A more idiomatic approach would be to replace isFormatCorrect with a wrapper function which can be called from a test macro, and let gtest do the matching:
Say, something like this:

/// Return the formatted string, or None if formatting failed.
Optional<std::string> format(StringRef input, ...) {
  ...
  if (success)
    return out.GetString();
  return None;
}
...
EXPECT_EQ(format("foo", ...), std::string("foo")); // std::string is needed because Optional<string> and const char * are not comparable -- may want to make an alias/lambda for that
vsk planned changes to this revision.Apr 16 2020, 5:27 PM

Sorry for the delay, I plan to have an update for this soon.

Have you asked the foundation people about the ASCII support in NSString?

No, I'm not sure what to ask. Having thought things over, I think my 'FIXME' about 'In practice, do we ever prefer ASCII-only formatting over UTF8 formatting?' was premature, so I'll delete it.

I did some experiments with forcing invalid data into an NSString and seeing what comes out of NSLog. Both with dataUsingEncoding:NSASCIIStringEncoding, or with just typing in invalid bytes, NSLog just prints an empty string (in the latter case -WCFString-literal even warns that input conversion is stopped after an invalid byte is seen). I don't think it'd be helpful for lldb to match that behavior? At least, we don't match that behavior today, we try to escape and print out the invalid bytes.

If we want to continue to not match NSLog's behavior, then I think we should we treat invalid bytes under ASCII vs. utf differently (we can print the actual character in the utf case, and a sequence of '\x..\x..' otherwise).

lldb/source/DataFormatters/StringPrinter.cpp
259

You're right, I got %x mixed up with %u.

aprantl added inline comments.Apr 17 2020, 9:54 AM
lldb/unittests/DataFormatter/StringPrinterTests.cpp
1

drop the redundant C++ marker

vsk marked 8 inline comments as done.Apr 17 2020, 1:45 PM
vsk added inline comments.
lldb/source/DataFormatters/StringPrinter.cpp
173

Cool, I learned that only 'constexpr' guarantees that you have a compile-time constant. Fixed here and below.

lldb/unittests/DataFormatter/StringPrinterTests.cpp
109

I personally find the current version much clearer, but can change this if others feel strongly.

vsk updated this revision to Diff 258419.Apr 17 2020, 1:48 PM
vsk marked an inline comment as done.

Address review feedback:

  • Use standard gtest operators (EXPECT_EQ)
  • constexpr/std::move cleanup, comment cleanups, sprintf length fix

Also, I took the opportunity to further unify the ASCII and UTF8 handling --
this lets us delete a bunch of duplicated code. PTAL, thanks.

vsk marked an inline comment as done.Apr 17 2020, 1:51 PM
vsk added inline comments.
lldb/source/DataFormatters/StringPrinter.cpp
405

This is what I mean. The whole ReadStringAndDumpToStream<ASCII> specialization can be deleted and replaced with a call to DumpEncodedBufferToStream, as the UTF & ASCII code paths are really similar.

The test stuff looks good to me.

lldb/unittests/DataFormatter/StringPrinterTests.cpp
32

it looks like escape_style could be a regular argument instead of a template.

109

I think the raw strings are cleaner, particularly when it comes to strings like "\\\"" and "\\\\"

vsk updated this revision to Diff 258825.Apr 20 2020, 1:15 PM

Address review feedback:

  • Make 'escape_style' a regular parameter, use raw string literals
vsk added a comment.Apr 28 2020, 10:46 AM

Friendly ping.

shafik added inline comments.Apr 28 2020, 3:12 PM
lldb/source/DataFormatters/StringPrinter.cpp
174

I really wish we could get ride of the naked new. It seems possible.

  • We know the buffer size
  • We know the "expected" escaped_len
  • We could write something like a make_StringPrinterBufferPointer in the same spirit as make_unique

The problem we want to avoid w/ the naked new is that the code becomes more complicated over time and somehow a later change disconnects the new w/ the creation of the StringPrinterBufferPointer which manages the lifetime.

This comment applies to the code later on as well.

vsk updated this revision to Diff 261387.Apr 30 2020, 4:25 PM
  • Avoid heap memory management in "StringPrinterBufferPointer" entirely.
  • Rename "StringPrinterBufferPointer" to "DecodedCharBuffer", as that seems to more accurately reflect what the class is for.
shafik accepted this revision.May 1 2020, 2:19 PM

LGTM with minor comments, this is a big set of changes, I would prefer if one of the other reviewers also gave it a second look.

lldb/source/DataFormatters/StringPrinter.cpp
49

just m_data it will decay to a pointer.

293–296

It feels like the two lambdas here could be one by also capturing elem_type, it would make the lambda larger but is that a concern? Not a big deal just feels like needlessly duplicate code.

648

Super nitpick, can we switch order of the UTF8 and the ASCII specialization so that they appear in the same order as the ReadStringAndDumpToStream above.

This revision is now accepted and ready to land.May 1 2020, 2:19 PM
vsk marked 4 inline comments as done.May 1 2020, 3:07 PM
vsk added inline comments.
lldb/source/DataFormatters/StringPrinter.cpp
293–296

Sure, these can be folded together.

648

No, because the ASCII specialization calls the UTF8 specialization, and the compiler complains that the use occurs before the definition. I'll just reorder the specializations of ReadStringAndDumpToStream so that things match.

vsk updated this revision to Diff 261558.May 1 2020, 3:08 PM
vsk marked an inline comment as done.

Address review feedback

labath accepted this revision.May 4 2020, 2:28 AM

Yeah, I think this looks good too.

lldb/source/DataFormatters/StringPrinter.cpp
42–43

What's up with all the reinterpret_casting? T* is implicitly convertible to a void*...

vsk marked an inline comment as done.May 4 2020, 2:05 PM
vsk added inline comments.
lldb/source/DataFormatters/StringPrinter.cpp
42–43

Ah thanks for catching this, I'll get rid of the redundant casts before committing.

This revision was automatically updated to reflect the committed changes.