Page MenuHomePhabricator

LLDB-MI: Bug when evaluating strings containing characters from non-ascii range
ClosedPublic

Authored by evgeny777 on Sep 22 2015, 9:58 AM.

Details

Summary

In case string contains characters out of ASCII range, lldb-mi prints them as hexadecimal codes
The correct behaviour should be converting to UTF-8 instead. This patch fixes this making lldb-mi use
registered type summary providers, when they are available

Another problem is incorrect evaluation of some composite types, like std::string, which is also fixed in a way described above (using type summary provider)

Diff Detail

Repository
rL LLVM

Event Timeline

evgeny777 updated this revision to Diff 35379.Sep 22 2015, 9:58 AM
evgeny777 retitled this revision from to LLDB-MI: Bug when evaluating strings containing characters from non-ascii range.
evgeny777 updated this object.
evgeny777 added reviewers: ki.stfu, abidh.
evgeny777 added subscribers: dawn, KLapshin.
ki.stfu edited edge metadata.
ki.stfu added a subscriber: lldb-commits.

Please always add lldb-commits to subscribers

clayborg requested changes to this revision.Sep 22 2015, 11:07 AM
clayborg edited edge metadata.

You can't use std::string in the public API. Use lldb::SBStream as noted in inlined comments.

include/lldb/API/SBValue.h
132–133 ↗(On Diff #35379)

You can't use std::string in the public API. You should use a SBStream instead. It can be used here:

bool
GetValueSummary (SBStream &strm);

Then you can get the data and length:

const char *data = strm.GetData();
const size_t data_len = strm.GetSize();
source/API/SBValue.cpp
711 ↗(On Diff #35379)

Use SBStream

This revision now requires changes to proceed.Sep 22 2015, 11:07 AM
granata.enrico requested changes to this revision.Sep 22 2015, 11:16 AM
granata.enrico added a reviewer: granata.enrico.
granata.enrico added a subscriber: granata.enrico.

Is there a reason to explicitly add an API to get the concatenation of value and summary?

We already have APIs to retrieve the pieces:

const char *
GetValue ();

const char *
GetSummary ();

We even already have an API to retrieve the summary in an SBStream:

const char *
GetSummary (lldb::SBStream& stream,
            lldb::SBTypeSummaryOptions& options);

Is there a reason why the MI can't just call these APIs and then process the results as it sees fit?

As Enrico stated. there is already a SBStream based way to get the summary in "const char * SBValueGetSummary (lldb::SBStream& stream, lldb::SBTypeSummaryOptions& options);" so no need to add anything as I had suggested.

Yes I know about GetValue() and stuff
Actually I implemented new method, because I needed TypeSummaryImpl::DoesPrintValue().
In lldb when you evaluate char*, you get both value and summary, for example:

0xdeadbeef "hello"

But when you evaluate char[] array, you get just "hello". Value is not printed, because DoesPrintValue() method returns false
The same is for UTF-16 characters (char16_t). This type has both value and summary, but only summary is printed by lldb

I wanted to replicate the same behavior in lldb-mi. But if adding methods to SBValue should be avoided, I'll think of some workaround inside lldb-mi.
Your opinions are welcome, of course

Gotcha!

My suggestion would be to add the DoesPrintValue() logic to SBTypeSummary. I can't recall if it's there already - but if not, it would be a fine thing to add. Then the MI could use SBValue and SBTypeSummary to make that determination as it sees fit.

ki.stfu requested changes to this revision.Sep 23 2015, 12:48 AM
ki.stfu edited edge metadata.

Please, next time make a patch with full context:

svn diff --diff-cmd diff -x "-U9999" > mypatch.txt

This will help us to reduce the review time.

test/tools/lldb-mi/variable/TestMiGdbSetShowPrint.py
14 ↗(On Diff #35379)

Probably it should be named like eval_and_check_array()

16–19 ↗(On Diff #35379)

Please use '%' operator to format the string.

21 ↗(On Diff #35379)

This comment deceives us because you use eval_ap on the line #56 at the time when char-array-as-string is "off".

22 ↗(On Diff #35379)

What is ap stands for?

26 ↗(On Diff #35379)

This looks like a magic. \t is a part of the variable's value, so please pass it via arguments too.

BTW: Use add_slashes() from MiLibraryLoadedTestCase.test_lldbmi_library_loaded test case to reduce amount of '\' characters.

62 ↗(On Diff #35379)

'u' belongs to "hello". Please pass them together.

85 ↗(On Diff #35379)

Why the size is a regexp (it was 10)?

100 ↗(On Diff #35379)

Please don't use your company name here.

109 ↗(On Diff #35379)

Where is it from? It must be removed before merging.

121–143 ↗(On Diff #35379)

This test case doesn't relate to gdb-set/gdb-show commands, so it should be in test/tools/lldb-mi/variable/TestMiVar.py

125 ↗(On Diff #35379)

Rename to something more specific (test_lldbmi_var_create_for_stl_types, for ex)

143 ↗(On Diff #35379)

Looks like "std::__1::string" is a platform-dependent name. Don't use it.

test/tools/lldb-mi/variable/main.cpp
13 ↗(On Diff #35379)

Don't use entire namespaces where they aren't needed. Just use std::string on the line #83

70–73 ↗(On Diff #35379)

Please don't use your company name here.

83 ↗(On Diff #35379)

just use std::string here, and rename please:

std::string std_string = "\t\"hello\"\n";
tools/lldb-mi/MICmnLLDBUtilSBValue.cpp
20–24 ↗(On Diff #35379)

What is it?

evgeny777 updated this revision to Diff 35607.Sep 24 2015, 4:54 AM
evgeny777 edited edge metadata.

Revised according to suggestions from granta.enrico and ki.stfu

evgeny777 updated this revision to Diff 35618.Sep 24 2015, 6:29 AM
evgeny777 edited edge metadata.

Revised, removed dependencies from D13094

ki.stfu requested changes to this revision.Sep 25 2015, 12:40 AM
ki.stfu edited edge metadata.
ki.stfu added inline comments.
include/lldb/API/SBTypeSummary.h
125–126 ↗(On Diff #35618)

You can use clang-format to follow the LLDB coding style, or just do the following:

bool
DoesPrintValue (const SBValue &value);
source/API/SBTypeSummary.cpp
290 ↗(On Diff #35618)

ditto

test/tools/lldb-mi/variable/TestMiVar.py
355 ↗(On Diff #35618)

Use lazy regex:

self.expect('\^done,name="var\d+",numchild="[0-9]+",value="\\\\"hello\\\\"",type="std::[\S]*?string",thread-id="1",has_more="0"')
tools/lldb-mi/MICmnLLDBUtilSBValue.cpp
173–174 ↗(On Diff #35618)
vwrValue = GetValueSummary();
if (!vwrValue.empty)
  return MIstatus::success;
191–193 ↗(On Diff #35618)
const CMIUtilString summary = GetValueSummary();
if (!summary.empty())
    return summary;
242–244 ↗(On Diff #35618)
const CMIUtilString summary = GetValueSummary();
if (!summary.empty())
    return summary;
285–287 ↗(On Diff #35618)
const CMIUtilString summary = GetValueSummary();
if (!summary.empty())
    return summary;
tools/lldb-mi/MICmnLLDBUtilSBValue.h
58 ↗(On Diff #35618)

It is better:

CMIUtilString GetValueSummary(CMIUtilString &vrValue) const
This revision now requires changes to proceed.Sep 25 2015, 12:40 AM
evgeny777 added inline comments.Sep 25 2015, 12:49 AM
tools/lldb-mi/MICmnLLDBUtilSBValue.h
58 ↗(On Diff #35618)

Really?

Did you mean
const CMIUtilString& GetValueSummary(void) const ?

evgeny777 added inline comments.Sep 25 2015, 12:50 AM
tools/lldb-mi/MICmnLLDBUtilSBValue.cpp
191–193 ↗(On Diff #35618)

const CMIUtilString& summary = GetValueSummary(); ???

ki.stfu added inline comments.Sep 25 2015, 1:38 AM
tools/lldb-mi/MICmnLLDBUtilSBValue.cpp
191–193 ↗(On Diff #35618)

Yes, it's ok.

tools/lldb-mi/MICmnLLDBUtilSBValue.h
58 ↗(On Diff #35618)

Sorry, I mistyped.

CMIUtilString GetValueSummary() const
evgeny777 added inline comments.Sep 25 2015, 7:58 AM
source/API/SBTypeSummary.cpp
290 ↗(On Diff #35618)

I used clang-format with style file taken from lldb directory here, but formatting didn't change. So I'm curious what's the problem is

evgeny777 updated this revision to Diff 35725.Sep 25 2015, 8:01 AM
evgeny777 edited edge metadata.

Revised with changes requested by ki.stfu

ki.stfu accepted this revision.Sep 25 2015, 11:25 PM
ki.stfu edited edge metadata.

lgtm

But you still should get lgtm from @clayborg and @granata.enrico before landing.

NB: Greg is out for the next nine days, Enrico for the next five days.

granata.enrico requested changes to this revision.Oct 1 2015, 2:40 PM
granata.enrico edited edge metadata.

Sorry, I didn't realize you were still waiting on me to land this patch.

I have a few extra comments before this is ready to land.

include/lldb/API/SBTypeSummary.h
125 ↗(On Diff #35725)

Can you please change this to

bool
DoesPrintValue (lldb::SBValue value);

There is no need to pass the SBValue by reference since you're not actually altering it - and since SBValue wraps a (glorified) pointer to a ValueObject you're not really copying much data over. I don't see the need to classify the function as const (only IsValid() seems to be marked thusly in that file), and definitely no need for the SBValue to be a const

source/API/SBTypeSummary.cpp
289 ↗(On Diff #35725)

Of course change this in the C++ file as well

tools/lldb-mi/MICmnLLDBUtilSBValue.cpp
372 ↗(On Diff #35725)

I might be missing something, but how is this going to work when an SBValue has a value but no summary (something as simple as int x = 1;)?

GetSummary() is free to return NULL, and I only see you fetching the value here when there is a summary. Your logic should be more like

GetSummary()
if (!summary || summary && DoesPrintValue())
append value

A very long-term goal of mine is for all of this logic to exist in the ValueObjectPrinter and clients to simply be able to ask it to print a ValueObject according to some specification of theirs, in the middle of an existing stream. But that is way beyond the scope of your patch, so if you just clean up the logic here to do the right thing for values without summaries, I am happy.

This revision now requires changes to proceed.Oct 1 2015, 2:40 PM
evgeny777 added inline comments.Oct 2 2015, 4:08 AM
tools/lldb-mi/MICmnLLDBUtilSBValue.cpp
372 ↗(On Diff #35725)

Short answer:
Yes types without summary work, please look at CMICmnLLDBUtilSBValue::GetSimpleValue() function

Longer anser:
This patch only addresses specific issues

  • Correct UTF-16/32 handling for characters and strings
  • Correct handling of composite types, like std::string

The patch logic is: if summary is available use it (optionally combined with value). If it is not then fall back to MI default formatting (see ConvertToPrintableASCII for example).
For other simple types, like int and float GetValueSummary() is never called (MI calls GetValue() instead)

Probably refactoring MI will provide better code, but this looks like to be out of this patch scope

evgeny777 added inline comments.Oct 2 2015, 4:48 AM
include/lldb/API/SBTypeSummary.h
125 ↗(On Diff #35725)

Ok.
Just curious - why const is bad?

evgeny777 updated this revision to Diff 36341.Oct 2 2015, 4:49 AM
evgeny777 edited edge metadata.

Changed DoesPrintValue() function prototype

granata.enrico accepted this revision.Oct 2 2015, 10:32 AM
granata.enrico edited edge metadata.

I am not deeply involved in MI internals, so I am going to assume you ran the test suite, and that you actually verified that types without summaries still work

In general, once you opt into data formatters, you get a lot more than simply UTF handling. I understand that this is your rationale for the change, but you may want to make sure that MI still does the right thing if users define formatters for types that have nothing to do with strings (e.g. what if somebody does define a summary for int or float?)

With that said, this looks reasonable, and an MI refactoring would be outside of my realm anyway.

dawn added a comment.Oct 2 2015, 1:08 PM

You can use clang-format to follow the LLDB coding style, or just do the following:

So we've been telling folks to use clang-format, but it's not working correctly (my version is clang-format version 3.6.0 (217927)). It's turns:

bool
DoesPrintValue (lldb::SBValue value);

into:

bool DoesPrintValue (lldb::SBValue value);

We have AlwaysBreakAfterDefinitionReturnType set to true, but that only works on function definitions. I see there is a patch for AlwaysBreakAfterDeclarationReturnType at http://reviews.llvm.org/D10370?id=27471, but it's not been accepted yet.

So for now, we can't rely on clang-format for function declarations :(

dawn added a comment.Oct 2 2015, 1:12 PM

This patch is good to commit now right? It's not marked "accepted" for some reason.

I Thought I had marked it as good to go.
My bad if I have not.

  • Enrico

Thanks for a good point.
MI formatting refactoring will likely come soon as a separate review

ki.stfu added a comment.EditedOct 2 2015, 1:25 PM
In D13058#258903, @dawn wrote:

This patch is good to commit now right? It's not marked "accepted" for some reason.

It's not "accepted" because @clayborg rejected this CL and hasn't changed his decision. But I think he will not mind if you go ahead.

clayborg accepted this revision.Oct 5 2015, 1:08 PM
clayborg edited edge metadata.

Back from vacation, sorry for the delay. Looks good if Enrico is happy.

This revision is now accepted and ready to land.Oct 5 2015, 1:08 PM
This revision was automatically updated to reflect the committed changes.