This is an archive of the discontinued LLVM Phabricator instance.

Data formatters: fix detection of C strings
ClosedPublic

Authored by jarin on Mar 23 2020, 4:39 PM.

Details

Summary

Detection of C strings does not work well for pointers. If the value object holding a (char*) pointer does not have an address (e.g., if it is a temp), the value is not considered a C string and its formatting is left to DumpDataExtractor rather than the special handling in ValueObject::DumpPrintableRepresentation. This leads to inconsistent outputs, e.g., in escaping non-ASCII characters. See the test for an example; the second test expectation is not met (without this patch). With this patch, the C string detection only insists that the pointer value is valid. The patch makes the code consistent with how the pointer is obtained in ValueObject::ReadPointedString.

Diff Detail

Event Timeline

jarin created this revision.Mar 23 2020, 4:39 PM
teemperor accepted this revision.Mar 24 2020, 1:54 AM

LGTM, thanks for tracking this down! I only have a minor comment about the test character.

lldb/test/API/functionalities/data-formatter/cstring-utf8-summary/TestCstringUnicode.py
19

expect_expr is the modern way to check this:

self.expect_expr("s", result_summary='"πŸ”₯"')
self.expect_expr("(const char*)s", result_summary='"πŸ”₯"')
lldb/test/API/functionalities/data-formatter/cstring-utf8-summary/main.cpp
3

Could you make this a string like u8"πŸ”₯"? "Γ©" is also a valid character in a lot of extended-ASCII character sets (e.g., Latin-1) and we probably should take some unicode-exclusive character and make sure it's utf-8.

This revision is now accepted and ready to land.Mar 24 2020, 1:54 AM
jarin updated this revision to Diff 252276.Mar 24 2020, 4:55 AM
jarin marked 2 inline comments as done.

Addressed reviewer comments.

jarin added a comment.Mar 24 2020, 4:56 AM

Thanks for the review! Could you possibly land this for me?

This revision was automatically updated to reflect the committed changes.