This is an archive of the discontinued LLVM Phabricator instance.

Fix type detection for 'char' variables
ClosedPublic

Authored by tberghammer on Mar 26 2015, 8:53 AM.

Details

Summary

Fix type detection for 'char' variables

A char can have signed and unsigned encoding but previously lldb always
assumed it is signed. This CL adds a logic to detect the encoding of
'char' types based on the default encoding on the target architecture.
It fixes variable printing and expression evaluation on architectures
where 'char' is signed by default.

Diff Detail

Repository
rL LLVM

Event Timeline

tberghammer retitled this revision from to Fix type detection for 'char' when it is unsigned.
tberghammer updated this object.
tberghammer edited the test plan for this revision. (Show Details)
tberghammer added a reviewer: clayborg.
tberghammer added a subscriber: Unknown Object (MLST).
clayborg requested changes to this revision.Mar 26 2015, 9:54 AM
clayborg edited edge metadata.

Shouldn't we return an unsigned char type even if the name says "char"? See inlined comment.

source/Symbol/ClangASTContext.cpp
1023 ↗(On Diff #22726)

Shouldn't this line be:

return ClangASTType (ast, ast->UnsignedCharTy.getAsOpaquePtr());
This revision now requires changes to proceed.Mar 26 2015, 9:54 AM
tberghammer requested a review of this revision.Mar 26 2015, 10:10 AM
tberghammer edited edge metadata.

CharTy will be initialized to the right type (unsigned / signed char) based on LangOpts in ASTContext.{h/cpp}

The problem is that it isn't defined in the C++ spec what is the type of char. If we returning ClangASTType (ast, ast->UnsignedCharTy.getAsOpaquePtr()) then the user will see "unsigned char" as the displayed type when the actual type of the variable is "char" (causing several test failure and "char" and "unsigned char" are two fundamentally different type (they have the same representation but nothing else is common in them).

A very similar logic is also implemented in Line 956-969 for the case when char is signed.

Just a drive by comment: You mention "causing several test failure". Does your patch fix them? If yes, then are they currently skipped or marked xfail? If yes again, you should probably enable them in this patch.

At the very least, it will be helpful if you can give an example of a failing test.

Just a drive by comment: You mention "causing several test failure". Does your patch fix them? If yes, then are they currently skipped or marked xfail? If yes again, you should probably enable them in this patch.

At the very least, it will be helpful if you can give an example of a failing test.

This is causing failures on android-aarch64 what is currently not in the state where we want to start XFAIL-ing tests. The issue effects multiple test cases (e.g.: TestCStrings, TestFunctionTypes) but all of them have some different (unrelated) issue also as android-aarch64 is still not in a good shape. This CL and D8634 will fix e.g.: TestCStrings on android aarch64 (and about 20 other failures where I haven't checked which CL makes the difference).

clayborg requested changes to this revision.Mar 26 2015, 11:13 AM
clayborg edited edge metadata.

So we really don't want this fix. If there are tests that are expecting "char" to show up, those tests should be fixed to deal unsigned chars. We really want any variable that is actually unsigned to be represented as unsigned in the clang::ASTContext objects we create for each shared library otherwise if you do expressions with those values, they might behave differently from how they would behave if you compiled them. There is no data in the DWARF that tells us that "char" should be treated as unsigned, so we can't create our ASTContext objects that way. So we need to represent variables correctly. So the tests should be fixed or conditional based on the platform and this fix should either turn into fixing the tests, or abandoned.

This revision now requires changes to proceed.Mar 26 2015, 11:13 AM
tberghammer requested a review of this revision.Mar 26 2015, 11:50 AM
tberghammer edited edge metadata.

If LangOpts.CharIsSigned is correctly set in ASTContext then CharTy will be set to BuiltinType::Char_S or to BuiltinType::Char_U based on what is used in the target and BuiltinType::Char_S behaves exactly the same way as BuiltinType::SChar (what is used for SignedCharTy) expect that they use different name when displayed to the user (if they behave differently then that one is a bug in Clang and should be fixed). The same is true for BuiltinType::Char_U and for UnsignedCharTy.

Based on it I still believe if the dwarf file contains a basic type with name "char" and it is a DW_ATE_unsigned_char then CharTy will be initialized to Char_U and everything will work fine and the user will see the type name defined in the dwarf file ("char"). If you want I can add a check (possibly an assert) to verify that we hit this case only when ASTContext.LangOpts.CharIsSigned is false (if it is not true then the dwarf file or lldb is broken).

We are running into the case where we setup a default clang::ASTContext for every shared library. We could argue that as soon as you create the clang::ClangAST that you must then search for a "char" type by name and see if it is unsigned and if so, modify the clang::ASTContext. But we currently try to parse things only when needed in LLDB so I don't want to have to pull in all of the DWARF accelerator tables and load a type just to find out if chars should be signed. Why? Because on MacOSX we use DWARF in .o files and one shared library might have 1000 .o files. Now we try to make the clang::ASTContext for this shared library and we must pull in all 1000 .o files just to check if any have a unsigned char.

So I would vote to continue with the current functionality and just call "char" that are actually unsigned "unsigned char" and fix the test that have problems with that.

I agree that we don't want to parse any unnecessary dwarf information, but my problem is that the current implementation is inconsistent with the C++ standard. Currently we use "char" only in the case when it is a signed type (I think this is the more common case), but it is just an arbitrary choice as the standard don't specify it (leave it to the implementation). The problem what you are describing is already in the code for the case when "char" is signed as it is using almost the same logic what I plan to add here (Line 956-969).

I think if the dwarf file say that the a variable have a type named "char" then we should treat it as CharTy and ensure that it will be initialized correctly. If we reach this point then we are parsing some dwarf information about a "char" so clang ASTContext should contain the right value or if not then we should update it accordingly (not sure if possible to update it in ClangASTContext).

Fixing the test cases are a possible option but I really hate that idea as the test cases are correct and the implementation is broken. Your suggestion (using "unsigned char") will work in most of the case but it can be a bit confusing for the user when we display wrong type information and can break expression evaluation in the following (edge) case:

Given this program (perfectly valid C++):

#include <cstdio>

void foo(char c) {
  printf("normal\n");
}

void foo(unsigned char c) {
  printf("unsigned\n");
}

void foo(signed char c) {
  printf("signed\n");
}

int main() {
  char c = 0;
  unsigned char uc = 0;
  signed char sc = 0;
  foo(c);
  foo(uc);
  foo(sc);
}

Stop at the end and evaluate the following expressions on a machine where char is an unsigned type (will use UnsignedCharTy based on your suggestion):

foo(c);
foo(uc);
foo(sc);

The first one will output "unsigned" while it should output normal (it will work as intended when char is signed because of the logic in Line 965-969). I think the only way to fix it is to use the real type of the variable what is char (CharTy)
(P.S.: Haven't tested it but pretty sure this is what will happen)

What is your opinion about setting up CharTy based on the target triple with the same logic what clang use in Driver/Tools.cpp::isSignedCharDefault and then when we hit a case where the type of the char in the dwarf isn't match with our setting then we fall back to the fully qualified "signed char" or "unsigned char" based on the information in the dwarf.

This solution will work perfectly when the inferior use the same char type as the default for the architecture and fall back to an other type what is at least as good as the current implementation if the inferior have non-default char type.

Setting the value based on the triple is a fine solution.

tberghammer planned changes to this revision.Mar 27 2015, 7:21 AM
tberghammer retitled this revision from Fix type detection for 'char' when it is unsigned to Fix type detection for 'char' variables.
tberghammer updated this object.
tberghammer edited the test plan for this revision. (Show Details)
tberghammer edited edge metadata.

Update Cl based on discussion in the review thread

clayborg requested changes to this revision.Mar 27 2015, 10:17 AM
clayborg edited edge metadata.

It seems like all of the code in LanguageOptions could just be a method on ArchSpec instead of adding a new file since this is really just switching off of a triple?

lldb.xcodeproj/project.pbxproj
749 ↗(On Diff #22802)

Remove this and just add a method to lldb_private::ArchSpec.

2378–2379 ↗(On Diff #22802)

Remove this and just add a method to lldb_private::ArchSpec.

3476–3478 ↗(On Diff #22802)

Remove this and just add a method to lldb_private::ArchSpec.

6319 ↗(On Diff #22802)

Remove this and just add a method to lldb_private::ArchSpec.

source/Expression/ClangExpressionParser.cpp
73–74 ↗(On Diff #22802)

Remove this and just add a method to lldb_private::ArchSpec.

233–234 ↗(On Diff #22802)
m_compiler->getLangOpts().CharIsSigned = ArchSpec(m_compiler->getTargetOpts().Triple.c_str()).CharIsSigned();
source/Symbol/ClangASTContext.cpp
75–76 ↗(On Diff #22802)

Remove this and just add a method to lldb_private::ArchSpec.

235 ↗(On Diff #22802)

This would become:

ArchSpec arch(triple);
Opts.CharIsSigned = arch.CharIsSigned();
source/Utility/CMakeLists.txt
9 ↗(On Diff #22802)

Remove this and just add a method to lldb_private::ArchSpec.

source/Utility/LanguageOptions.cpp
1–43 ↗(On Diff #22802)

Remove this and just add a method to lldb_private::ArchSpec.

source/Utility/LanguageOptions.h
1–23 ↗(On Diff #22802)

Remove this and just add a method to lldb_private::ArchSpec.

This revision now requires changes to proceed.Mar 27 2015, 10:17 AM
tberghammer edited edge metadata.

Move CharIsSignedByDefault to ArchSpec

clayborg accepted this revision.Mar 30 2015, 9:55 AM
clayborg edited edge metadata.

Looks good.

This revision is now accepted and ready to land.Mar 30 2015, 9:55 AM
This revision was automatically updated to reflect the committed changes.