This is an archive of the discontinued LLVM Phabricator instance.

Fix how ValueObject deals with getting unsigned values
ClosedPublic

Authored by shafik on Dec 16 2020, 2:57 PM.

Details

Summary

Currently ValueObject::GetValueAsUnsigned(...) calls ResolveValue(...) and then afterward marks the scalar as unsigned.

This approach fails when dealing with 1 bit BOOL bit-fields in ObjC since the call to scalar.ExtractBitfield(...) will end up sign extending the value. So in the case of a bit-field with a value of 1 we would instead end up with a value of 255 (since BOOL is a just unsigned char).

I adjusted the handling to check if the scalar is unsigned in ResolveValue(...) and handle accordingly.

Diff Detail

Event Timeline

shafik requested review of this revision.Dec 16 2020, 2:57 PM
shafik created this revision.
shafik added a subscriber: lldb-commits.

I am not sure if I need to worry about ValueObject::IsLogicalTrue(...) as well.

This seems suspicious to me, on a couple of levels. You claim that BOOL is an unsigned type, but the apple documentation says "BOOL is explicitly signed so @encode(BOOL) is c rather than C even if -funsigned-char is used." Which one is true? Could this have something to do with the fact that the documentation assumes obj-c (which does not have a native bool type) but in lldb, we use obj-c++ (which, I guess, inherits bool from c++) everywhere?

If BOOL is unsigned, then it's not clear to me why would Scalar::ExtractBitfield be sign-extending anything. OTOH, if it is _signed_, then sign-extension seems to be the right behavior.

I think we may have some deeper problems with the handling bitfields whose underlying types are 1 byte long. Lldb (I think, correctly) sign-extends all signed bitfields, except the signed char one:

(lldb) p a
(A) $0 = {
  bool_ = true
  signed_char = '\x01'
  signed_short = -1
  signed_int = -1
  unsigned_char = '\x01'
  unsigned_short = 1
  unsigned_int = 1
}
(lldb) p/x a
(A) $1 = {
  bool_ = 0x01
  signed_char = 0x01
  signed_short = 0x0001
  signed_int = 0x00000001
  unsigned_char = 0x01
  unsigned_short = 0x0001
  unsigned_int = 0x00000001
}

This patch does not have any impact on this behavior, but the fact that signed char comes out as '\x01' seems suspicious. This is what gdb does for the same input:

(gdb) p a
$1 = {bool_ = true, signed_char = -1 '\377', signed_short = -1, signed_int = -1, unsigned_char = 1 '\001', unsigned_short = 1, unsigned_int = 1}
(gdb) p/x a
$2 = {bool_ = 0x1, signed_char = 0xff, signed_short = 0xffff, signed_int = 0xffffffff, unsigned_char = 0x1, unsigned_short = 0x1, unsigned_int = 0x1}

Finally, even though I'm not sure this is the right solution, I am sure that GetValueAsSigned should get the same fix as GetValueAsUnsigned

teemperor added a comment.EditedDec 17 2020, 4:32 AM

Whether BOOL is signed char or bool is apparently depending on the architecture (???). See ClangExpressionSourceCode::GetText, Clang's __OBJC_BOOL_IS_BOOL macro and then there is also some preprocessor shenanigans for non-Clang ObjC (?) in the objc.h header from what I can see. Whether C++ is active is never actually checked anywhere (the bool type is coming from stdbool.h which is unconditionally included by objc.h).

I think this patch just gets the formatter to print 'YES' because the formatter only knows 0 and 1 as valid values, but when we (correctly) sign-extend the 1-bit BOOL (=signed char) to 255, it just does it's fall-back logic of printing the integer value (also, that fall-back logic is completely ignoring the specified int format and just always prints as decimal. That seems like a bug...)

Anyway, the real solution is that the current behaviour is actually ... *drum roll* correct:

$ cat bool.m
#import <Foundation/Foundation.h>

typedef struct {
    BOOL fieldOne : 1;
    BOOL fieldTwo : 1;
    BOOL fieldThree : 1;
    BOOL fieldFour : 1;
    BOOL fieldfive : 1;
} BoolBitFields;

int main (int argc, const char * argv[]) {

  BoolBitFields myField = {0};
  myField.fieldTwo = YES;
  if (myField.fieldTwo == YES)
    printf("YES\n");
  else
    printf("NO\n");

  return 0;
}
$ ~/test clang bool.m -o bool -framework Foundation -Wpedantic -Wall -Wextra; and ./bool
NO

That bitfield is just not 'YES' but '255' which is neither YES nor NO. Also there is no Clang warning for this, so this is apparently very cool and very legal code. I am tempted to say the real fix here is to tell people to just not use Objective-C...

This seems suspicious to me, on a couple of levels. You claim that BOOL is an unsigned type, but the apple documentation says "BOOL is explicitly signed so @encode(BOOL) is c rather than C even if -funsigned-char is used." Which one is true? Could this have something to do with the fact that the documentation assumes obj-c (which does not have a native bool type) but in lldb, we use obj-c++ (which, I guess, inherits bool from c++) everywhere?

If BOOL is unsigned, then it's not clear to me why would Scalar::ExtractBitfield be sign-extending anything. OTOH, if it is _signed_, then sign-extension seems to be the right behavior.

I think we may have some deeper problems with the handling bitfields whose underlying types are 1 byte long. Lldb (I think, correctly) sign-extends all signed bitfields, except the signed char one:

(lldb) p a
(A) $0 = {
  bool_ = true
  signed_char = '\x01'
  signed_short = -1
  signed_int = -1
  unsigned_char = '\x01'
  unsigned_short = 1
  unsigned_int = 1
}
(lldb) p/x a
(A) $1 = {
  bool_ = 0x01
  signed_char = 0x01
  signed_short = 0x0001
  signed_int = 0x00000001
  unsigned_char = 0x01
  unsigned_short = 0x0001
  unsigned_int = 0x00000001
}

This patch does not have any impact on this behavior, but the fact that signed char comes out as '\x01' seems suspicious. This is what gdb does for the same input:

(gdb) p a
$1 = {bool_ = true, signed_char = -1 '\377', signed_short = -1, signed_int = -1, unsigned_char = 1 '\001', unsigned_short = 1, unsigned_int = 1}
(gdb) p/x a
$2 = {bool_ = 0x1, signed_char = 0xff, signed_short = 0xffff, signed_int = 0xffffffff, unsigned_char = 0x1, unsigned_short = 0x1, unsigned_int = 0x1}

Finally, even though I'm not sure this is the right solution, I am sure that GetValueAsSigned should get the same fix as GetValueAsUnsigned

DumpDataExtractor doesn't know what types its printing, it just gets a lldb_private::Format (where we sometimes encode whether a type is signed). That is on my TODO list, but I just didn't get around to refactor this yet.

In short: If you get a eFormatChar (which is the default for 'char' stuff) then we always pretend its unsigned when printing the character. If you do 'hex', all ints become unsigned. If you print 'decimal', all types because signed, so there is an 'unsigned decimal' format that works around this:

(lldb) frame var --format default f
(F) f = (sc1 = '\x01', uc1 = '\x01', sc2 = '\x01', uc2 = '\x01')
(lldb) frame var --format 'decimal' f
(F) f = (sc1 = -1, uc1 = -1, sc2 = 1, uc2 = 1)
(lldb) frame var --format 'unsigned decimal' f
(F) f = (sc1 = 1, uc1 = 1, sc2 = 1, uc2 = 1)
(lldb) frame var --format 'hex' f
(F) f = (sc1 = 0x01, uc1 = 0x01, sc2 = 0x01, uc2 = 0x01)
teemperor requested changes to this revision.Dec 17 2020, 6:57 AM

I think the proper fix here is to warn about BOOL bitfields with size 1 if BOOL is a typedef for signed char. I can make a Clang warning for that. I guess on the LLDB side we should make it clear that BOOL here has an invalid value (and then print the int value as we currently do)?

This revision now requires changes to proceed.Dec 17 2020, 6:57 AM

I just realized that I got the char sign also mixed up. It's an unsigned char in LLDB, so the 255 is wrong (But I believe our char signedness handling is not correct in any C language). But the fact that we don't print YES/NO for that bitfield is right (just the int value is wrong).

After chatting with Raphael offline, we realize that BOOL will indeed always be signed. I was under the impression for some reason that it was actually unsigned. So the fix is simpler is that we just need to make the formatter use GetValueAsSigned(...)

shafik updated this revision to Diff 312636.Dec 17 2020, 4:05 PM

Updating diff to reflect comments.

teemperor requested changes to this revision.Dec 18 2020, 6:35 AM

Some comments about that this still prints 255, but otherwise this is looking good.

(For the others: We agreed offline that adding the missing type checking for BOOL is out of scope for this patch. So the 'fail_value = 0` thing and so on is fine for this revision)

lldb/source/Plugins/Language/ObjC/Cocoa.cpp
1038

That still makes the value unsigned and prints 255 instead of -1. Something like: int64_t value = real_guy_sp->GetValueAsSigned(0); should do the trick.

lldb/test/API/functionalities/data-formatter/boolreference/TestFormattersBoolRefPtr.py
79

You can do this with expect_expr which makes this less fragile (the ValueCheck stuff is backported to the last stable branch, so that shouldn't complicate backporting):

# converted YES value after passing through the BOOL : 1 bitfield.
converted_yes = "-1"
# BOOL is bool instead of signed char on ARM.
if isArm:
    converted_yes = "1"
self.expect_expr('myField', result_type="BoolBitFields",
                 result_children=[
                     ValueCheck(name="fieldOne", summary="NO"),
                     ValueCheck(name="fieldTwo", summary= converted_yes,
                     ValueCheck(name="fieldThree", summary="NO"),
                     ValueCheck(name="fieldFour", summary="NO"),
                     ValueCheck(name="fieldfive", summary= converted_yes)
                 ])

Also added the check for when BOOL = bool when this is on ARM (which seems to be how Clang decides if BOOL is bool or signed char). I guess a better check would be to check the underlying type for BOOL and see what it actually is, but let's fix this properly when we add type checking to the BOOL summary provider.

lldb/test/API/functionalities/data-formatter/boolreference/main.mm
8

Nit: fieldfive instead of fieldFive.

lldb/test/API/lang/objc/bitfield_ivars/TestBitfieldIvars.py
27

Same as above with expect_expr.

This revision now requires changes to proceed.Dec 18 2020, 6:35 AM
shafik marked 4 inline comments as done.Dec 18 2020, 4:54 PM
shafik added inline comments.
lldb/source/Plugins/Language/ObjC/Cocoa.cpp
1038

Good catch!

lldb/test/API/functionalities/data-formatter/boolreference/TestFormattersBoolRefPtr.py
79

bool will be signed so it will be the same result, I tested on device to confirm.

shafik updated this revision to Diff 312904.Dec 18 2020, 4:55 PM
shafik marked 2 inline comments as done.
  • Fixed formatters::ObjCBOOLSummaryProvider to use int8_t and fixed the Printf
  • Updated tests
teemperor accepted this revision.Jan 8 2021, 4:47 AM

LGTM, thanks!

This revision is now accepted and ready to land.Jan 8 2021, 4:47 AM
shafik closed this revision.Jan 25 2021, 11:50 AM