This is an archive of the discontinued LLVM Phabricator instance.

[lldb] Don't check environment default char signedness when creating clang type for "char"
ClosedPublic

Authored by aeubanks on Oct 14 2022, 9:10 PM.

Details

Summary

With -f(un)signed-char, the DIE corresponding to "char" may have the opposite DW_ATE_(un)signed_char encoding from the platform's default char signedness.
Ultimately we should determine whether a type is the unspecified-signedness char by checking whether its name is "char" (as opposed to "signed char"/"unsigned char"), and not care about whether DW_ATE_(un)signed_char matches the platform default.
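(Illustrative sketch, not part of the patch: a minimal, hypothetical C++ rendering of that rule. The enum and function names are placeholders rather than actual lldb API; the real change lives in lldb's DWARF-to-clang type parsing.)

  #include <string>

  // Stand-ins for DW_ATE_(un)signed_char and for the three C char flavors.
  enum class CharEncoding { SignedChar, UnsignedChar };
  enum class CharFlavor { PlainChar, SignedChar, UnsignedChar };

  CharFlavor ClassifyCharBaseType(const std::string &die_name, CharEncoding enc) {
    // A base type DIE named exactly "char" is the unspecified-signedness char,
    // regardless of which DW_ATE encoding the producer chose and regardless of
    // the platform's default char signedness.
    if (die_name == "char")
      return CharFlavor::PlainChar;
    // "signed char" / "unsigned char" keep their explicit flavor.
    return enc == CharEncoding::SignedChar ? CharFlavor::SignedChar
                                           : CharFlavor::UnsignedChar;
  }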

Fixes #23443

Diff Detail

Event Timeline

aeubanks created this revision.Oct 14 2022, 9:10 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 14 2022, 9:10 PM
aeubanks requested review of this revision.Oct 14 2022, 9:10 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 14 2022, 9:10 PM

not 100% sure this is the right fix

I think the place where this will go wrong is in terms of how lldb renders char values on non-default-char-signedness programs (it'll render them as the default-char-signedness, which might be confusing to a user - since they'll be looking at literals, etc, that are the other signedness) and how lldb will interpret char literals (though that'll already be wrong - since the literals are already being parsed with the default-char-signedness, I think).

I'm curious why there were all those expected failures re: PR23069. Were they not using the default char signedness? Or is the test using explicit signedness, and so whatever platforms happen not to have the explicit value as their default are/were failing?

aeubanks updated this revision to Diff 468118.Oct 16 2022, 9:32 PM

update test

With -f(un)signed-char, the die corresponding to "char" may be the wrong DW_ATE_(un)signed_char.

As far as the producer of the DWARF (so, clang for example) is concerned, is this correct by the existing rules?

As I understand it so far, if the compiler is using "char" (no sign chosen) it can use either DW_ATE_unsigned_char or DW_ATE_signed_char. So lldb cannot trust the choice the compiler made to tell it what plain char signedness should be?
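(Illustrative example, assuming the behavior described in this review: the same source can be described either way depending on the flag, while the type's name stays "char".)

  // Plain "char" with no signedness spelled out in the source. Building this
  // TU with -fsigned-char vs -funsigned-char can make the producer emit
  // DW_ATE_signed_char or DW_ATE_unsigned_char for the base type, while its
  // name in the debug info remains "char".
  char plain_char = 'a';

  int main() { return plain_char == 'a' ? 0 : 1; }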

labath added a subscriber: labath.Oct 17 2022, 6:45 AM

I think the place where this will go wrong is in terms of how lldb renders char values on non-default-char-signedness programs (it'll render them as the default-char-signedness, which might be confusing to a user - since they'll be looking at literals, etc, that are the other signedness) and how lldb will interpret char literals (though that'll already be wrong - since the literals are already being parsed with the default-char-signedness, I think).

Yes, I'm pretty sure that will happen. OTOH, I don't think there's any way to fix this in a completely satisfactory way. Like, if the whole program was consistently built with the non-default signedness, we could try to detect it and then configure the internal AST defaults accordingly. But that's hard to detect, and I'd be surprised if most programs are completely homogeneous like this.

So, overall, I quite like this fix.

I'm curious why there were all those expected failures re: PR23069. Were they not using the default char signedness? Or is the test using explicit signedness, and so whatever platforms happen not to have the explicit value as their default are/were failing?

Yes, I'm pretty sure that's the case.

With -f(un)signed-char, the die corresponding to "char" may be the wrong DW_ATE_(un)signed_char.

As far as the producer of the DWARF (so, clang for example) is concerned, is this correct by the existing rules?

Yes, because DWARF never expected that people would be round-tripping the debug info back into C(++). As far as it is concerned, it has correctly given you the signedness of the type.

aeubanks edited the summary of this revision. (Show Details)Oct 18 2022, 4:39 PM

I think the place where this will go wrong is in terms of how lldb renders char values on non-default-char-signedness programs (it'll render them as the default-char-signedness, which might be confusing to a user - since they'll be looking at literals, etc, that are the other signedness) and how lldb will interpret char literals (though that'll already be wrong - since the literals are already being parsed with the default-char-signedness, I think).

Yes, I'm pretty sure that will happen. OTOH, I don't think there's any way to fix this in a completely satisfactory way. Like, if the whole program was consistently built with the non-default signedness, we could try to detect it and then configure the internal AST defaults accordingly. But that's hard to detect, and I'd be surprised if most programs are completely homogeneous like this.

So, overall, I quite like this fix.

Maybe looking for a "char" entry and setting ast.getLangOpts().CharIsSigned from it would probably work in most cases, but not sure that it's worth the effort. Rendering the wrong signedness for chars doesn't seem like the worst thing, and this patch improves things (as the removed expectedFailureAll show, plus this helps with some testing of lldb I've been doing). I can add a TODO if you'd like.

I'm curious why there were all those expected failures re: PR23069. Were they not using the default char signedness? Or is the test using explicit signedness, and so whatever platforms happen not to have the explicit value as their default are/were failing?

Yes, I'm pretty sure that's the case.

With -f(un)signed-char, the die corresponding to "char" may be the wrong DW_ATE_(un)signed_char.

As far as the producer of the DWARF (so, clang for example) is concerned, is this correct by the existing rules?

Yes, because DWARF never expected that people would be round-tripping the debug info back into C(++). As far as it is concerned, it has correctly given you the signedness of the type.

"wrong" is inaccurate, I've updated the summary

I think the place where this will go wrong is in terms of how lldb renders char values on non-default-char-signedness programs (it'll render them as the default-char-signedness, which might be confusing to a user - since they'll be looking at literals, etc, that are the other signedness) and how lldb will interpret char literals (though that'll already be wrong - since the literals are already being parsed with the default-char-signedness, I think).

Yes, I'm pretty sure that will happen. OTOH, I don't think there's any way to fix this in a completely satisfactory way. Like, if the whole program was consistently built with the non-default signedness, we could try to detect it and then configure the internal AST defaults accordingly. But that's hard to detect, and I'd be surprised if most programs are completely homogeneous like this.

So, overall, I quite like this fix.

Yeah, this line of reasoning (non-homogeneity) is one I'm on board with, thanks for the framing. Basically I/we can think of the debugger's expression context as some code that's always compiled with the default char signedness. Since char signedness isn't part of the ABI, bits of the program can be built with one and bits with the other - and the debugger is just a bit that's built with the default.

Works for me. (though lldb's sufficiently out of my wheelhouse I'll leave it to others to approve)

labath accepted this revision.Oct 20 2022, 4:43 AM

Maybe looking for a "char" entry and setting ast.getLangOpts().CharIsSigned from it would probably work in most cases, but not sure that it's worth the effort. Rendering the wrong signedness for chars doesn't seem like the worst thing, and this patch improves things (as the removed expectedFailureAll show, plus this helps with some testing of lldb I've been doing). I can add a TODO if you'd like.

It's not just a question of rendering. This can result in wrong/unexpected results of expressions as well. Something as simple as p (int)unspecified_char_value can return a different result depending on whether unspecified_char_value is considered signed or unsigned.
However, I am not convinced that we would be better off trying to detect this, as that detection can fail as well, only in more unpredictable ways.
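(Worked example of that point, with a hypothetical value holding the bit pattern 0xFF.)

  #include <cstdio>

  int main() {
    signed char if_signed = (signed char)0xFF;       // the -fsigned-char view
    unsigned char if_unsigned = (unsigned char)0xFF; // the -funsigned-char view
    // "(int)unspecified_char_value" evaluates to -1 under the signed
    // interpretation and to 255 under the unsigned one.
    std::printf("%d %d\n", (int)if_signed, (int)if_unsigned); // prints: -1 255
    return 0;
  }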

Yeah, this line of reasoning (non-homogeneity) is one I'm on board with, thanks for the framing. Basically I/we can think of the debugger's expression context as some code that's always compiled with the default char signedness. Since char signedness isn't part of the ABI, bits of the program can be built with one and bits with the other - and the debugger is just a bit that's built with the default.

Are you sure about the ABI part? I guess it may depend on how you define the ABI. It definitely does not affect the calling conventions, but I'm pretty sure it would be an ODR violation if you even had a simple function like int to_int(char c) { return c; } in a header and compiled it with both -fsigned-char and -funsigned-char. Not that this will stop people from doing it, but the most likely scenario for this is that you're linking your application with the C library compiled in a different mode, where it is kinda ok, because the C library does not have many inline functions and doesn't do much char manipulation.
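(To make that concrete: the function from the comment above, with an argument value that exposes the difference; the '\xff' argument is only an illustration.)

  #include <cstdio>

  // A TU built with -fsigned-char and one built with -funsigned-char disagree
  // about what this "same" inline function returns for the same argument.
  int to_int(char c) { return c; }

  int main() {
    // Prints -1 when built with -fsigned-char, 255 with -funsigned-char
    // (or whichever the platform default is when neither flag is given).
    std::printf("%d\n", to_int('\xff'));
    return 0;
  }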

This revision is now accepted and ready to land.Oct 20 2022, 4:43 AM

Maybe looking for a "char" entry and setting ast.getLangOpts().CharIsSigned from it would probably work in most cases, but not sure that it's worth the effort. Rendering the wrong signedness for chars doesn't seem like the worst thing, and this patch improves things (as the removed expectedFailureAll show, plus this helps with some testing of lldb I've been doing). I can add a TODO if you'd like.

It's not just a question of rendering. This can result in wrong/unexpected results of expressions as well. Something as simple as p (int)unspecified_char_value can return a different result depending on whether unspecified_char_value is considered signed or unsigned.
However, I am not convinced that we would be better off trying to detect this, as that detection can fail as well, only in more unpredictable ways.

Yeah, this line of reasoning (non-homogeneity) is one I'm on board with, thanks for the framing. Basically I/we can think of the debugger's expression context as some code that's always compiled with the default char signedness. Since char signedness isn't part of the ABI, bits of the program can be built with one and bits with the other - and the debugger is just a bit that's built with the default.

Are you sure about the ABI part? I guess it may depend on how you define the ABI. It definitely does not affect the calling conventions, but I'm pretty sure it would be an ODR violation if you even had a simple function like int to_int(char c) { return c; } in a header and compiled it with both -fsigned-char and -funsigned-char. Not that this will stop people from doing it, but the most likely scenario for this is that you're linking your application with the C library compiled in a different mode, where it is kinda ok, because the C library does not have many inline functions and doesn't do much char manipulation.

"ODR violation" gets a bit fuzzy - but yeah, you'd get a different int out of that function. My point was since apps would already end up with a mix of signed and unsigned most likely, the debugger expression evaluation is one of those things mixed one way rather than the other.

But yeah, it's all a bit awkward (& probably will be more for googlers where everything's built with the non-default signedness - and now the expression evaluator will do the other thing/not that - but hopefully a rather small number of users ever notice or care about this). I guess char buffers for raw/binary data will be a bit more annoying to look at, dumped as signed integers instead of unsigned...

Anyway, yeah, it's all a bit awkward but this seems like the best thing for now. Thanks for weighing in!