This is an archive of the discontinued LLVM Phabricator instance.

[lldb] Don't check environment default char signedness when creating clang type for "char"
ClosedPublic

Authored by aeubanks on Oct 14 2022, 9:10 PM.

Details

Summary

With -f(un)signed-char, the DIE corresponding to "char" may have the opposite DW_ATE_(un)signed_char encoding from the platform's default char signedness.
Ultimately we should determine whether a type is the unspecified-signedness char by checking whether its name is "char" (as opposed to "signed char"/"unsigned char"), and not care about whether DW_ATE_(un)signed_char matches the platform default.
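(Illustrative sketch, not part of the patch: a minimal, hypothetical C++ rendering of that rule. The enum and function names are placeholders rather than actual lldb API; the real change lives in lldb's DWARF-to-clang type parsing.)

  #include <string>

  // Stand-ins for DW_ATE_(un)signed_char and for the three C char flavors.
  enum class CharEncoding { SignedChar, UnsignedChar };
  enum class CharFlavor { PlainChar, SignedChar, UnsignedChar };

  CharFlavor ClassifyCharBaseType(const std::string &die_name, CharEncoding enc) {
    // A base type DIE named exactly "char" is the unspecified-signedness char,
    // regardless of which DW_ATE encoding the producer chose and regardless of
    // the platform's default char signedness.
    if (die_name == "char")
      return CharFlavor::PlainChar;
    // "signed char" / "unsigned char" keep their explicit flavor.
    return enc == CharEncoding::SignedChar ? CharFlavor::SignedChar
                                           : CharFlavor::UnsignedChar;
  }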

Fixes #23443

Diff Detail

Event Timeline

aeubanks created this revision.Oct 14 2022, 9:10 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 14 2022, 9:10 PM
aeubanks requested review of this revision.Oct 14 2022, 9:10 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 14 2022, 9:10 PM

not 100% sure this is the right fix

I think the place where this will go wrong is in terms of how lldb renders char values on non-default-char-signedness programs (it'll render them as the default-char-signedness, which might be confusing to a user - since they'll be looking at literals, etc, that are the other signedness) and how lldb will interpret char literals (though that'll already be wrong - since the literals are already being parsed with the default-char-signedness, I think).

I'm curious why there were all those expected failures re: PR23069. Were they not using the default char signedness? Or is the test using explicit signedness, and so whatever platforms happen not to have the explicit value as their default are/were failing?

aeubanks updated this revision to Diff 468118.Oct 16 2022, 9:32 PM

update test

With -f(un)signed-char, the die corresponding to "char" may be the wrong DW_ATE_(un)signed_char.

As far as the producer of the DWARF (so, clang for example) is concerned, is this correct by the existing rules?

As I understand it so far, if the compiler is using "char" (no sign chosen) it can use either DW_ATE_unsigned_char or DW_ATE_signed_char. So lldb cannot trust the choice the compiler made to tell it what plain char signedness should be?
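(Illustrative example, assuming the behavior described in this review: the same source can be described either way depending on the flag, while the type's name stays "char".)

  // Plain "char" with no signedness spelled out in the source. Building this
  // TU with -fsigned-char vs -funsigned-char can make the producer emit
  // DW_ATE_signed_char or DW_ATE_unsigned_char for the base type, while its
  // name in the debug info remains "char".
  char plain_char = 'a';

  int main() { return plain_char == 'a' ? 0 : 1; }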

labath added a subscriber: labath.Oct 17 2022, 6:45 AM

I think the place where this will go wrong is in terms of how lldb renders char values on non-default-char-signedness programs (it'll render them as the default-char-signedness, which might be confusing to a user - since they'll be looking at literals, etc, that are the other signedness) and how lldb will interpret char literals (though that'll already be wrong - since the literals are already being parsed with the default-char-signedness, I think).

Yes, I'm pretty sure that will happen. OTOH, I don't think there's any way to fix this in a completely satisfactory way. Like, if the whole program was consistently built with the non-default signedness, we could try to detect it and then configure the internal AST defaults accordingly. But that's hard to detect, and I'd be surprised if most programs are completely homogeneous like this.

So, overall, I quite like this fix.

I'm curious why there were all those expected failures re: PR23069. Were they not using the default char signedness? Or is the test using explicit signedness, and so whatever platforms happen not to have the explicit value as their default are/were failing?

Yes, I'm pretty sure that's the case.

With -f(un)signed-char, the die corresponding to "char" may be the wrong DW_ATE_(un)signed_char.

As far as the producer of the DWARF (so, clang for example) is concerned, is this correct by the existing rules?

Yes, because DWARF never expected that people would be round-tripping the debug info back into C(++). As far as it is concerned, it has correctly given you the signedness of the type.

aeubanks edited the summary of this revision. (Show Details)Oct 18 2022, 4:39 PM

I think the place where this will go wrong is in terms of how lldb renders char values on non-default-char-signedness programs (it'll render them as the default-char-signedness, which might be confusing to a user - since they'll be looking at literals, etc, that are the other signedness) and how lldb will interpret char literals (though that'll already be wrong - since the literals are already being parsed with the default-char-signedness, I think).

Yes, I'm pretty sure that will happen. OTOH, I don't think there's any way to fix this in a completely satisfactory way. Like, if the whole program was consistently built with the non-default signedness, we could try to detect it and then configure the internal AST defaults accordingly. But that's hard to detect, and I'd be surprised if most programs are completely homogeneous like this.

So, overall, I quite like this fix.

Maybe looking for a "char" entry and setting ast.getLangOpts().CharIsSigned from it would probably work in most cases, but not sure that it's worth the effort. Rendering the wrong signedness for chars doesn't seem like the worst thing, and this patch improves things (as the removed expectedFailureAll show, plus this helps with some testing of lldb I've been doing). I can add a TODO if you'd like.

I'm curious why there were all those expected failures re: PR23069. Were they not using the default char signedness? Or is the test using explicit signedness, and so whatever platforms happen not to have the explicit value as their default are/were failing?

Yes, I'm pretty sure that's the case.

With -f(un)signed-char, the die corresponding to "char" may be the wrong DW_ATE_(un)signed_char.

As far as the producer of the DWARF (so, clang for example) is concerned, is this correct by the existing rules?

Yes, because DWARF never expected that people would be round-tripping the debug info back into C(++). As far as it is concerned, it has correctly given you the signedness of the type.

"wrong" is inaccurate, I've updated the summary

I think the place where this will go wrong is in terms of how lldb renders char values on non-default-char-signedness programs (it'll render them as the default-char-signedness, which might be confusing to a user - since they'll be looking at literals, etc, that are the other signedness) and how lldb will interpret char literals (though that'll already be wrong - since the literals are already being parsed with the default-char-signedness, I think).

Yes, I'm pretty sure that will happen. OTOH, I don't think there's any way to fix this in a completely satisfactory way. Like, if the whole program was consistently built with the non-default signedness, we could try to detect it and then configure the internal AST defaults accordingly. But that's hard to detect, and I'd be surprised if most programs are completely homogeneous like this.

So, overall, I quite like this fix.

Yeah, this line of reasoning (non-homogeneity) is one I'm on board with, thanks for the framing. Basically I/we can think of the debugger's expression context as some code that's always compiled with the default char signedness. Since char signedness isn't part of the ABI, bits of the program can be built with one and bits with the other - and the debugger is just a bit that's built with the default.

Works for me. (though lldb's sufficiently out of my wheelhouse I'll leave it to others to approve)

labath accepted this revision.Oct 20 2022, 4:43 AM

Maybe looking for a "char" entry and setting ast.getLangOpts().CharIsSigned from it would probably work in most cases, but not sure that it's worth the effort. Rendering the wrong signedness for chars doesn't seem like the worst thing, and this patch improves things (as the removed expectedFailureAll show, plus this helps with some testing of lldb I've been doing). I can add a TODO if you'd like.

It's not just a question of rendering. This can result in wrong/unexpected results of expressions as well. Something as simple as p (int)unspecified_char_value can return a different result depending on whether unspecified_char_value is considered signed or unsigned.
However, I am not convinced that we would be better off trying to detect this, as that detection can fail as well, only in more unpredictable ways.
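(Worked example of that point, with a hypothetical value holding the bit pattern 0xFF.)

  #include <cstdio>

  int main() {
    signed char if_signed = (signed char)0xFF;       // the -fsigned-char view
    unsigned char if_unsigned = (unsigned char)0xFF; // the -funsigned-char view
    // "(int)unspecified_char_value" evaluates to -1 under the signed
    // interpretation and to 255 under the unsigned one.
    std::printf("%d %d\n", (int)if_signed, (int)if_unsigned); // prints: -1 255
    return 0;
  }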

Yeah, this line of reasoning (non-homogeneity) is one I'm on board with, thanks for the framing. Basically I/we can think of the debugger's expression context as some code that's always compiled with the default char signedness. Since char signedness isn't part of the ABI, bits of the program can be built with one and bits with the other - and the debugger is just a bit that's built with the default.

Are you sure about the ABI part? I guess it may depend on how you define the ABI. It definitely does not affect the calling conventions, but I'm pretty sure it would be an ODR violation if you even had a simple function like int to_int(char c) { return c; } in a header and compiled it with both -fsigned-char and -funsigned-char. Not that this will stop people from doing it, but the most likely scenario for this is that you're linking your application with the C library compiled in a different mode, where it is kinda ok, because the C library does not have many inline functions and doesn't do much char manipulation.
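(To make that concrete: the function from the comment above, with an argument value that exposes the difference; the '\xff' argument is only an illustration.)

  #include <cstdio>

  // A TU built with -fsigned-char and one built with -funsigned-char disagree
  // about what this "same" inline function returns for the same argument.
  int to_int(char c) { return c; }

  int main() {
    // Prints -1 when built with -fsigned-char, 255 with -funsigned-char
    // (or whichever the platform default is when neither flag is given).
    std::printf("%d\n", to_int('\xff'));
    return 0;
  }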

This revision is now accepted and ready to land.Oct 20 2022, 4:43 AM

Maybe looking for a "char" entry and setting ast.getLangOpts().CharIsSigned from it would probably work in most cases, but not sure that it's worth the effort. Rendering the wrong signedness for chars doesn't seem like the worst thing, and this patch improves things (as the removed expectedFailureAll show, plus this helps with some testing of lldb I've been doing). I can add a TODO if you'd like.

It's not just a question of rendering. This can result in wrong/unexpected results of expressions as well. Something as simple as p (int)unspecified_char_value can return a different result depending on whether unspecified_char_value is considered signed or unsigned.
However, I am not convinced that we would be better off trying to detect this, as that detection can fail as well, only in more unpredictable ways.

Yeah, this line of reasoning (non-homogeneity) is one I'm on board with, thanks for the framing. Basically I/we can think of the debugger's expression context as some code that's always compiled with the default char signedness. Since char signedness isn't part of the ABI, bits of the program can be built with one and bits with the other - and the debugger is just a bit that's built with the default.

Are you sure about the ABI part? I guess it may depend on how you define the ABI. It definitely does not affect the calling conventions, but I'm pretty sure it would be an ODR violation if you even had a simple function like int to_int(char c) { return c; } in a header and compiled it with both -fsigned-char and -funsigned-char. Not that this will stop people from doing it, but the most likely scenario for this is that you're linking your application with the C library compiled in a different mode, where it is kinda ok, because the C library does not have many inline functions and doesn't do much char manipulation.

"ODR violation" gets a bit fuzzy - but yeah, you'd get a different int out of that function. My point was since apps would already end up with a mix of signed and unsigned most likely, the debugger expression evaluation is one of those things mixed one way rather than the other.

But yeah, it's all a bit awkward (& probably will be more for googlers where everything's built with the non-default signedness - and now the expression evaluator will do the other thing/not that - but hopefully a rather small number of users ever notice or care about this). I guess char buffers for raw/binary data will be a bit more annoying to look at, dumped as signed integers instead of unsigned...

Anyway, yeah, it's all a bit awkward but this seems like the best thing for now. Thanks for weighing in!