This is an archive of the discontinued LLVM Phabricator instance.

Differential D53989

Fix formatting of wchar, char16, and char32
Needs Review (Public)

Authored by zturner on Nov 1 2018, 11:50 AM.

Details

Reviewers

jingham

Summary

char16, char32, and wchar_t were previously broken. If you had a simple variable like wchar_t x = L'1' and wrote p x, LLDB would output (wchar_t) x = 1\0. This is because it was using eFormatChar with a size of 2, so the two bytes were printed as two separate characters. What we needed was to introduce a special format specifically for wchar_t. The only valid sizes of wchar_t on all existing compilers are 2 and 4, so there's no real point trying to handle sizes of 1 and 8.

Along the way, I accidentally stumbled across the reason that references to char16 and char32 types were also formatting incorrectly, so I fixed that as well.
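To make the failure mode in the summary concrete, here is a minimal sketch, assuming a little-endian target and a 2-byte wchar_t. The helpers FormatBytesAsChars and ReadLittleEndianUnit are hypothetical names invented for illustration; they are not LLDB functions and only approximate what a per-byte character format does.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not LLDB code): print every byte as its own
// character, which is roughly what a char format does with a 2-byte item.
std::string FormatBytesAsChars(const uint8_t *bytes, size_t size) {
  std::string out;
  for (size_t i = 0; i < size; ++i) {
    if (bytes[i] == 0)
      out += "\\0"; // show NUL as an escape sequence
    else
      out += static_cast<char>(bytes[i]);
  }
  return out;
}

// Hypothetical helper: assemble the bytes into one little-endian code unit,
// which is what a dedicated wide-character format has to do first.
// Little-endian byte order is an assumption of this sketch.
uint64_t ReadLittleEndianUnit(const uint8_t *bytes, size_t size) {
  uint64_t unit = 0;
  for (size_t i = 0; i < size; ++i)
    unit |= static_cast<uint64_t>(bytes[i]) << (8 * i);
  return unit;
}
```

For the bytes {0x31, 0x00} (L'1' with a 2-byte wchar_t), the first helper yields 1\0 while the second yields the single code unit 0x31, which can then be printed as one character.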

Diff Detail

Event Timeline

zturner created this revision. Nov 1 2018, 11:50 AM

There were a bunch of other tests testing wchar_t handling, all in:

lang/cpp/wchar_t

as well as some tests in:

functionalities/data-formatter/data-formatter-stl/libcxx/string/TestDataFormatterLibcxxString.py

Those tests weren't failing (except for the latter test, and that only for android/gmodules). Do you know why those tests were passing when the functionality was broken? Maybe they need fixing to be more accurate?

No, but thanks for the pointer. Interestingly, it worked for me on my home machine but not on my work machine, and I'm not sure what the difference between the two is.

Clearly the test in lang/cpp/wchar_t is making it into lldb_private::formatters::WCharSummaryProvider whereas locally I am not.

Update: It's because there was a problem with my PYTHONPATH, and Python was getting disabled at CMake configure time. So I was effectively running with LLDB_DISABLE_PYTHON. So basically it's only broken on the LLDB_DISABLE_PYTHON codepath.

So the deal is that we were relying on a summary formatter to print wchar_t before, and now you have a format option that handles them as well? Do we need both? Maybe the summary also handles wchar_t * strings?

As an aside, for reasons that are not entirely clear to me, most of the data formatter code is #ifndef LLDB_DISABLE_PYTHON. That shouldn't be true: C++ based formatters (which all the built-in formatters are) should be able to work in the absence of Python. Figuring out why this is true is on my list of things to investigate some spare afternoon...

Side question, should we just kill the LLDB_DISABLE_PYTHON codepath? It can be a headache to get LLDB linking with Python (particularly on Windows), but it would be nice to be able to delete all the preprocessor stuff.

It doesn't seem unreasonable to want to build lldb for smaller systems that don't have Python available, and in fact we do that internally at Apple. Actually, it DOES seem a little unreasonable to me because after all you can just run the debugserver/lldb-server and connect remotely. But I have not to date been able to convince the folks who have to work on said systems that they don't really want an on device lldb. Until I can win that argument - which I project happening only just shy of never - I'd rather not lose the ability to build lldb without Python.

Note also, the vast majority of the uses of LLDB_DISABLE_PYTHON are related to the requirement that we have Python to use any of the data formatter facilities. Those uses shouldn't be necessary. All the scripted interpreters should go through the generic ScriptInterpreter interface. There's a ScriptInterpreterNone that should stand in for the Python interpreter in every use except directly managing the Python interpreter. So there's no structural reason why we should need LLDB_DISABLE_PYTHON for anything but the initializers. We should just be able to not build the *Python.cpp files and use the define only to not initialize the Python script interpreter. Something got balled up in how the data formatters were implemented such that this isn't true, IMHO.

If somebody wants to spend time looking at this, figuring how to untangle this would serve your purposes and also get the architecture back right at the same time.
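The layering described in the comment above is essentially a null-object arrangement. A rough sketch follows; every name in it (ScriptInterpreterSketch, NoneInterpreterSketch, CreateInterpreterSketch, SummarizeAsciiWide) is hypothetical and deliberately simplified, and none of it is LLDB's real ScriptInterpreter API.

```cpp
#include <memory>
#include <string>

// Hypothetical, simplified stand-in for a generic script interpreter
// interface. This is NOT the real lldb_private::ScriptInterpreter.
class ScriptInterpreterSketch {
public:
  virtual ~ScriptInterpreterSketch() = default;
  // Returns true on success; on failure, fills in `error`.
  virtual bool ExecuteOneLine(const std::string &command,
                              std::string &error) = 0;
};

// Null object: stands in when no scripting language is built in.
class NoneInterpreterSketch : public ScriptInterpreterSketch {
public:
  bool ExecuteOneLine(const std::string &, std::string &error) override {
    error = "scripting is not available in this build";
    return false;
  }
};

// Hypothetical factory. In this sketch it always returns the null object;
// a Python-enabled build would return a Python-backed subclass here, and
// this would be the only place that needs a build-time switch.
std::unique_ptr<ScriptInterpreterSketch> CreateInterpreterSketch() {
  return std::make_unique<NoneInterpreterSketch>();
}

// A formatter written in C++ never touches the interpreter, so it keeps
// working regardless of which interpreter the factory returned.
std::string SummarizeAsciiWide(char32_t ch) {
  std::string out = "L'";
  out += (ch >= 0x20 && ch <= 0x7e) ? static_cast<char>(ch) : '.';
  out += "'";
  return out;
}
```

With this shape, callers talk only to the interface, and a request to run a script fails gracefully with an error message instead of being compiled out.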

lldb-server is used on architectures that don't have python (readily) available, and it uses the same codebase as the rest of lldb. (Granted, it only needs a small subset of that codebase, and this subset doesn't/shouldn't care about python, but we aren't able to split out that part cleanly (yet)). So I don't think requiring python is a good idea.

What we could do is make it a build-time error if we failed to detect and configure python *AND* the user hasn't explicitly disabled it. This would catch this problem early and avoid other surprises later on (e.g. the entire dotest test suite only works with python enabled; without it, you'd likely get some weird error).

Revision Contents

Path                                                     Size
lldb/include/lldb/lldb-enumerations.h                    1 line
lldb/lit/SymbolFile/NativePDB/globals-fundamental.cpp    16 lines
lldb/source/Commands/CommandObjectMemory.cpp             10 lines
lldb/source/Core/DumpDataExtractor.cpp                   17 lines
lldb/source/Core/ValueObject.cpp                         1 line
lldb/source/DataFormatters/FormatManager.cpp             1 line
lldb/source/DataFormatters/VectorType.cpp                5 lines
lldb/source/Symbol/ClangASTContext.cpp                   5 lines

Diff 172193

lldb/include/lldb/lldb-enumerations.h

	Show First 20 Lines • Show All 156 Lines • ▼ Show 20 Lines
	enum Format {			enum Format {
	eFormatDefault = 0,			eFormatDefault = 0,
	eFormatInvalid = 0,			eFormatInvalid = 0,
	eFormatBoolean,			eFormatBoolean,
	eFormatBinary,			eFormatBinary,
	eFormatBytes,			eFormatBytes,
	eFormatBytesWithASCII,			eFormatBytesWithASCII,
	eFormatChar,			eFormatChar,
				eFormatWchar,
	eFormatCharPrintable, // Only printable characters, space if not printable			eFormatCharPrintable, // Only printable characters, space if not printable
	eFormatComplex, // Floating point complex type			eFormatComplex, // Floating point complex type
	eFormatComplexFloat = eFormatComplex,			eFormatComplexFloat = eFormatComplex,
	eFormatCString, // NULL terminated C strings			eFormatCString, // NULL terminated C strings
	eFormatDecimal,			eFormatDecimal,
	eFormatEnum,			eFormatEnum,
	eFormatHex,			eFormatHex,
	eFormatHexUppercase,			eFormatHexUppercase,
	▲ Show 20 Lines • Show All 952 Lines • Show Last 20 Lines

lldb/lit/SymbolFile/NativePDB/globals-fundamental.cpp

	Show First 20 Lines • Show All 631 Lines • ▼ Show 20 Lines
	// CHECK-NEXT: (lldb) target variable CRF			// CHECK-NEXT: (lldb) target variable CRF
	// CHECK-NEXT: (const float &) CRF = {{.*}} (&::CRF = 3.1415)			// CHECK-NEXT: (const float &) CRF = {{.*}} (&::CRF = 3.1415)
	const double &CRD = D;			const double &CRD = D;
	// CHECK-NEXT: (lldb) target variable CRD			// CHECK-NEXT: (lldb) target variable CRD
	// CHECK-NEXT: (const double &) CRD = {{.*}} (&::CRD = 3.1415000000000002)			// CHECK-NEXT: (const double &) CRD = {{.*}} (&::CRD = 3.1415000000000002)

	char16_t &RC16_24 = C16_24;			char16_t &RC16_24 = C16_24;
	// CHECK: (lldb) target variable RC16_24			// CHECK: (lldb) target variable RC16_24
	// FIXME: (char16_t &) RC16_24 = {{.*}} (&::RC16_24 = U+0014)			// CHECK: (char16_t &) RC16_24 = {{.*}} (&::RC16_24 = U+0014)
	char32_t &RC32_42 = C32_42;			char32_t &RC32_42 = C32_42;
	// CHECK: (lldb) target variable RC32_42			// CHECK: (lldb) target variable RC32_42
	// FIXME: (char32_t &) RC32_42 = {{.*}} (&::RC32_42 = U+0x00000022)			// CHECK: (char32_t &) RC32_42 = {{.*}} (&::RC32_42 = U+0x00000022)
	wchar_t &RWC1 = WC1;			wchar_t &RWC1 = WC1;
	// CHECK: (lldb) target variable RWC1			// CHECK: (lldb) target variable RWC1
	// FIXME: (wchar_t &) RWC1 = {{.*}} (&::RWC1 = L'1')			// CHECK: (wchar_t &) RWC1 = {{.*}} (&::RWC1 = L'1')
	wchar_t &RWCP = WCP;			wchar_t &RWCP = WCP;
	// CHECK: (lldb) target variable RWCP			// CHECK: (lldb) target variable RWCP
	// FIXME: (wchar_t &) RWCP = {{.*}} (&::RWCP = L'P')			// CHECK: (wchar_t &) RWCP = {{.*}} (&::RWCP = L'P')
	const char16_t &CRC16_24 = C16_24;			const char16_t &CRC16_24 = C16_24;
	// CHECK: (lldb) target variable CRC16_24			// CHECK: (lldb) target variable CRC16_24
	// FIXME: (const char16_t &) CRC16_24 = {{.*}} (&::CRC16_24 = U+0014)			// CHECK: (const char16_t &) CRC16_24 = {{.*}} (&::CRC16_24 = U+0014)
	const char32_t &CRC32_42 = C32_42;			const char32_t &CRC32_42 = C32_42;
	// CHECK: (lldb) target variable CRC32_42			// CHECK: (lldb) target variable CRC32_42
	// FIXME: (const char32_t &) CRC32_42 = {{.*}} (&::CRC32_42 = U+0x00000022)			// CHECK: (const char32_t &) CRC32_42 = {{.*}} (&::CRC32_42 = U+0x00000022)
	const wchar_t &CRWC1 = WC1;			const wchar_t &CRWC1 = WC1;
	// CHECK: (lldb) target variable CRWC1			// CHECK: (lldb) target variable CRWC1
	// FIXME: (const wchar_t &) CRWC1 = {{.*}} (&::CRWC1 = L'1')			// CHECK: (const wchar_t &) CRWC1 = {{.*}} (&::CRWC1 = L'1')
	const wchar_t &CRWCP = WCP;			const wchar_t &CRWCP = WCP;
	// CHECK: (lldb) target variable CRWCP			// CHECK: (lldb) target variable CRWCP
	// FIXME: (const wchar_t &) CRWCP = {{.*}} (&::CRWCP = L'P')			// CHECK: (const wchar_t &) CRWCP = {{.*}} (&::CRWCP = L'P')


	// CHECK: (lldb) quit			// CHECK: (lldb) quit

	int main(int argc, char **argv) {			int main(int argc, char **argv) {
	return CIMax;			return CIMax;
	}			}
	No newline at end of file			No newline at end of file

lldb/source/Commands/CommandObjectMemory.cpp

Show First 20 Lines • Show All 161 Lines • ▼ Show 20 Lines	Status FinalizeSettings(Target *target, OptionGroupFormat &format_options) {
case eFormatPointer:		case eFormatPointer:
byte_size_value = target->GetArchitecture().GetAddressByteSize();		byte_size_value = target->GetArchitecture().GetAddressByteSize();
if (!num_per_line_option_set)		if (!num_per_line_option_set)
m_num_per_line = 4;		m_num_per_line = 4;
if (!count_option_set)		if (!count_option_set)
format_options.GetCountValue() = 8;		format_options.GetCountValue() = 8;
break;		break;

		case eFormatWchar:
		if (!byte_size_option_set)
		byte_size_value =
		target->GetArchitecture().GetTriple().isOSWindows() ? 2 : 4;
		if (!num_per_line_option_set)
		m_num_per_line = 1;
		if (!count_option_set)
		format_options.GetCountValue() = 8;
		break;
case eFormatBinary:		case eFormatBinary:
case eFormatFloat:		case eFormatFloat:
case eFormatOctal:		case eFormatOctal:
case eFormatDecimal:		case eFormatDecimal:
case eFormatEnum:		case eFormatEnum:
case eFormatUnicode16:		case eFormatUnicode16:
case eFormatUnicode32:		case eFormatUnicode32:
case eFormatUnsigned:		case eFormatUnsigned:
▲ Show 20 Lines • Show All 1,229 Lines • ▼ Show 20 Lines	bool DoExecute(Args &command, CommandReturnObject &result) override {
for (auto &entry : command) {		for (auto &entry : command) {
switch (m_format_options.GetFormat()) {		switch (m_format_options.GetFormat()) {
case kNumFormats:		case kNumFormats:
case eFormatFloat: // TODO: add support for floats soon		case eFormatFloat: // TODO: add support for floats soon
case eFormatCharPrintable:		case eFormatCharPrintable:
case eFormatBytesWithASCII:		case eFormatBytesWithASCII:
case eFormatComplex:		case eFormatComplex:
case eFormatEnum:		case eFormatEnum:
		case eFormatWchar:
case eFormatUnicode16:		case eFormatUnicode16:
case eFormatUnicode32:		case eFormatUnicode32:
case eFormatVectorOfChar:		case eFormatVectorOfChar:
case eFormatVectorOfSInt8:		case eFormatVectorOfSInt8:
case eFormatVectorOfUInt8:		case eFormatVectorOfUInt8:
case eFormatVectorOfSInt16:		case eFormatVectorOfSInt16:
case eFormatVectorOfUInt16:		case eFormatVectorOfUInt16:
case eFormatVectorOfSInt32:		case eFormatVectorOfSInt32:
▲ Show 20 Lines • Show All 369 Lines • Show Last 20 Lines

lldb/source/Core/DumpDataExtractor.cpp

Show First 20 Lines • Show All 631 Lines • ▼ Show 20 Lines	case eFormatFloat: {
(uint64_t)item_byte_size);		(uint64_t)item_byte_size);
return offset;		return offset;
}		}
ss.flush();		ss.flush();
s->Printf("%s", ss.str().c_str());		s->Printf("%s", ss.str().c_str());
}		}
} break;		} break;

		case eFormatWchar: {
		s->PutChar('L');
		s->PutChar(item_count == 1 ? '\'' : '\"');

		const uint64_t ch = DE.GetMaxU64(&offset, item_byte_size);
		// wchar_t semantics vary across platforms, and this is complicated even
		// more by the fact that the host may not be the same as the target, so to
		// be faithful we would have to follow the target's semantics. For
		// simplicity, just print the character if it's ascii.
		if (ch <= CHAR_MAX && llvm::isPrint(ch))
		s->PutChar(ch);
		else
		s->PutChar(NON_PRINTABLE_CHAR);

		s->PutChar(item_count == 1 ? '\'' : '\"');
		} break;

case eFormatUnicode16:		case eFormatUnicode16:
s->Printf("U+%4.4x", DE.GetU16(&offset));		s->Printf("U+%4.4x", DE.GetU16(&offset));
break;		break;

case eFormatUnicode32:		case eFormatUnicode32:
s->Printf("U+0x%8.8x", DE.GetU32(&offset));		s->Printf("U+0x%8.8x", DE.GetU32(&offset));
break;		break;

▲ Show 20 Lines • Show All 185 Lines • Show Last 20 Lines
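The new eFormatWchar case above can be approximated outside LLDB as follows. This is only a sketch: FormatWideItem is a hypothetical free function, the code unit is passed in directly rather than read from a DataExtractor, and using '.' as the placeholder for non-printable values is an assumption about what NON_PRINTABLE_CHAR expands to.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-alone approximation of the new case; not LLDB's API.
// `ch` is the already-decoded code unit; `item_count` is how many items
// are being dumped in this call.
std::string FormatWideItem(uint64_t ch, size_t item_count) {
  // A single item is quoted like a character literal, several like a string.
  const char quote = (item_count == 1) ? '\'' : '"';
  std::string out = "L";
  out += quote;
  // Only printable ASCII is emitted verbatim. The '.' placeholder is an
  // assumption of this sketch.
  const bool printable_ascii = ch >= 0x20 && ch <= 0x7e;
  out += printable_ascii ? static_cast<char>(ch) : '.';
  out += quote;
  return out;
}
```

Called with ('P', 1) this produces L'P', matching the CHECK lines in globals-fundamental.cpp; any code unit outside printable ASCII falls back to the placeholder, which is the simplification the comment in the hunk describes.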

lldb/source/Core/ValueObject.cpp

Show First 20 Lines • Show All 1,377 Lines • ▼ Show 20 Lines	if (flags.AnySet(eTypeIsArray | eTypeIsPointer) &&
(custom_format == eFormatCharPrintable) ||		(custom_format == eFormatCharPrintable) ||
(custom_format == eFormatComplexFloat) ||		(custom_format == eFormatComplexFloat) ||
(custom_format == eFormatDecimal) || (custom_format == eFormatHex) ||		(custom_format == eFormatDecimal) || (custom_format == eFormatHex) ||
(custom_format == eFormatHexUppercase) ||		(custom_format == eFormatHexUppercase) ||
(custom_format == eFormatFloat) || (custom_format == eFormatOctal) ||		(custom_format == eFormatFloat) || (custom_format == eFormatOctal) ||
(custom_format == eFormatOSType) ||		(custom_format == eFormatOSType) ||
(custom_format == eFormatUnicode16) ||		(custom_format == eFormatUnicode16) ||
(custom_format == eFormatUnicode32) ||		(custom_format == eFormatUnicode32) ||
		(custom_format == eFormatWchar) ||
(custom_format == eFormatUnsigned) ||		(custom_format == eFormatUnsigned) ||
(custom_format == eFormatPointer) ||		(custom_format == eFormatPointer) ||
(custom_format == eFormatComplexInteger) ||		(custom_format == eFormatComplexInteger) ||
(custom_format == eFormatComplex) ||		(custom_format == eFormatComplex) ||
(custom_format == eFormatDefault)) // use the [] operator		(custom_format == eFormatDefault)) // use the [] operator
return false;		return false;
}		}
}		}
▲ Show 20 Lines • Show All 2,040 Lines • Show Last 20 Lines

lldb/source/DataFormatters/FormatManager.cpp

	Show All 37 Lines
	static FormatInfo g_format_infos[] = {			static FormatInfo g_format_infos[] = {
	{eFormatDefault, '\0', "default"},			{eFormatDefault, '\0', "default"},
	{eFormatBoolean, 'B', "boolean"},			{eFormatBoolean, 'B', "boolean"},
	{eFormatBinary, 'b', "binary"},			{eFormatBinary, 'b', "binary"},
	{eFormatBytes, 'y', "bytes"},			{eFormatBytes, 'y', "bytes"},
	{eFormatBytesWithASCII, 'Y', "bytes with ASCII"},			{eFormatBytesWithASCII, 'Y', "bytes with ASCII"},
	{eFormatChar, 'c', "character"},			{eFormatChar, 'c', "character"},
	{eFormatCharPrintable, 'C', "printable character"},			{eFormatCharPrintable, 'C', "printable character"},
				{eFormatWchar, 'L', "wide character"},
	{eFormatComplexFloat, 'F', "complex float"},			{eFormatComplexFloat, 'F', "complex float"},
	{eFormatCString, 's', "c-string"},			{eFormatCString, 's', "c-string"},
	{eFormatDecimal, 'd', "decimal"},			{eFormatDecimal, 'd', "decimal"},
	{eFormatEnum, 'E', "enumeration"},			{eFormatEnum, 'E', "enumeration"},
	{eFormatHex, 'x', "hex"},			{eFormatHex, 'x', "hex"},
	{eFormatHexUppercase, 'X', "uppercase hex"},			{eFormatHexUppercase, 'X', "uppercase hex"},
	{eFormatFloat, 'f', "float"},			{eFormatFloat, 'f', "float"},
	{eFormatOctal, 'o', "octal"},			{eFormatOctal, 'o', "octal"},
	▲ Show 20 Lines • Show All 1,015 Lines • Show Last 20 Lines

lldb/source/DataFormatters/VectorType.cpp

Show First 20 Lines • Show All 58 Lines • ▼ Show 20 Lines	static CompilerType GetCompilerTypeForFormat(lldb::Format format,
case lldb::eFormatHex:		case lldb::eFormatHex:
case lldb::eFormatHexUppercase:		case lldb::eFormatHexUppercase:
case lldb::eFormatOctal:		case lldb::eFormatOctal:
return type_system->GetBasicTypeFromAST(lldb::eBasicTypeInt);		return type_system->GetBasicTypeFromAST(lldb::eBasicTypeInt);

case lldb::eFormatHexFloat:		case lldb::eFormatHexFloat:
return type_system->GetBasicTypeFromAST(lldb::eBasicTypeFloat);		return type_system->GetBasicTypeFromAST(lldb::eBasicTypeFloat);

		case lldb::eFormatWchar:
		return type_system->GetBasicTypeFromAST(lldb::eBasicTypeWChar);
case lldb::eFormatUnicode16:		case lldb::eFormatUnicode16:
		return type_system->GetBasicTypeFromAST(lldb::eBasicTypeChar16);
case lldb::eFormatUnicode32:		case lldb::eFormatUnicode32:
		return type_system->GetBasicTypeFromAST(lldb::eBasicTypeChar32);
case lldb::eFormatUnsigned:		case lldb::eFormatUnsigned:
return type_system->GetBasicTypeFromAST(lldb::eBasicTypeUnsignedInt);		return type_system->GetBasicTypeFromAST(lldb::eBasicTypeUnsignedInt);

case lldb::eFormatVectorOfChar:		case lldb::eFormatVectorOfChar:
return type_system->GetBasicTypeFromAST(lldb::eBasicTypeChar);		return type_system->GetBasicTypeFromAST(lldb::eBasicTypeChar);

case lldb::eFormatVectorOfFloat32:		case lldb::eFormatVectorOfFloat32:
return type_system->GetBuiltinTypeForEncodingAndBitSize(eEncodingIEEE754,		return type_system->GetBuiltinTypeForEncodingAndBitSize(eEncodingIEEE754,
▲ Show 20 Lines • Show All 221 Lines • Show Last 20 Lines

lldb/source/Symbol/ClangASTContext.cpp

Show First 20 Lines • Show All 5,251 Lines • ▼ Show 20 Lines	case clang::Type::Builtin:
case clang::BuiltinType::Void:		case clang::BuiltinType::Void:
case clang::BuiltinType::BoundMember:		case clang::BuiltinType::BoundMember:
break;		break;

case clang::BuiltinType::Bool:		case clang::BuiltinType::Bool:
return lldb::eFormatBoolean;		return lldb::eFormatBoolean;
case clang::BuiltinType::Char_S:		case clang::BuiltinType::Char_S:
case clang::BuiltinType::SChar:		case clang::BuiltinType::SChar:
case clang::BuiltinType::WChar_S:
case clang::BuiltinType::Char_U:		case clang::BuiltinType::Char_U:
case clang::BuiltinType::UChar:		case clang::BuiltinType::UChar:
case clang::BuiltinType::WChar_U:
return lldb::eFormatChar;		return lldb::eFormatChar;
		case clang::BuiltinType::WChar_S:
		case clang::BuiltinType::WChar_U:
		return lldb::eFormatWchar;
case clang::BuiltinType::Char16:		case clang::BuiltinType::Char16:
return lldb::eFormatUnicode16;		return lldb::eFormatUnicode16;
case clang::BuiltinType::Char32:		case clang::BuiltinType::Char32:
return lldb::eFormatUnicode32;		return lldb::eFormatUnicode32;
case clang::BuiltinType::UShort:		case clang::BuiltinType::UShort:
return lldb::eFormatUnsigned;		return lldb::eFormatUnsigned;
case clang::BuiltinType::Short:		case clang::BuiltinType::Short:
return lldb::eFormatDecimal;		return lldb::eFormatDecimal;
▲ Show 20 Lines • Show All 4,968 Lines • Show Last 20 Lines
