This is an archive of the discontinued LLVM Phabricator instance.

Fix how we handle bit-fields for Objective-C when creating an AST
ClosedPublic

Authored by shafik on Jul 8 2020, 3:03 PM.

Details

Summary

Currently expressions dealing with bit-fields in Objective-C objects is pretty broken. When generating debug-info for Objective-C bit-fields DW_AT_data_bit_offset has a different meaning than it does to C and C++.

When we parse the DWARF we validate bit offsets for C and C++ correctly but not for ObjC. For ObjC in some cases we end up incorrectly flagging an error and we don't generate further bit-fields in the AST.

Later on when we do a name lookup we don't find the ObjCIvarDecl in the ObjCInterfaceDecl in some cases since we never added it and then we don't go to the runtime to obtain the offset.

This will fix how we handle bit-fields for the Objective-C case and add tests to verify this fix but also to documents areas that still don't work and will be addressed in follow-up PRs.

Note: we can never correctly calculate offsets statically because of how Objective-C deals with the fragile base class issue. Which means the runtime may need to shift fields over.

Diff Detail

Event Timeline

shafik created this revision.Jul 8 2020, 3:03 PM
aprantl added inline comments.Jul 8 2020, 3:17 PM
lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp
2529–2530

I think we should add a comment here explaining why this doesn't apply to Objective-C. Also, what about a C struct in Objective-C, or a C++ class in Objective-C++?

shafik marked an inline comment as done.Jul 9 2020, 9:53 AM
shafik added inline comments.
lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp
2529–2530

This is a great point! I think this check will probably be better:

!TypeSystemClang::IsObjCObjectOrInterfaceType(class_clang_type)

Let me try it out.

aprantl added inline comments.Jul 9 2020, 11:00 AM
lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp
2529–2530

That sounds right to me.

I think I'm really out of the loop how this is supposed to work. From what I understand in the description it says we (=Clang when emitting DWARF?) is not generating offsets, but my DWARF dump of the test clearly has offsets (but they are clearly wrong).

0x00000033:   DW_TAG_structure_type
                DW_AT_APPLE_objc_complete_type	(true)
                DW_AT_name	("HasBitfield2")
                DW_AT_byte_size	(0x10)
                DW_AT_decl_file	("/Users/teemperor/test/bit.m")
                DW_AT_decl_line	(3)
                DW_AT_APPLE_runtime_class	(DW_LANG_ObjC)

0x0000003c:     DW_TAG_inheritance
                  DW_AT_type	(0x0000007a "NSObject")
                  DW_AT_data_member_location	(0x00)

0x00000042:     DW_TAG_member
                  DW_AT_name	("x")
                  DW_AT_type	(0x000000a6 "unsigned int")
                  DW_AT_decl_file	("/Users/teemperor/test/bit.m")
                  DW_AT_decl_line	(5)
                  DW_AT_data_member_location	(0x00)
                  DW_AT_accessibility	(DW_ACCESS_public)

0x0000004f:     DW_TAG_member
                  DW_AT_name	("field1")
                  DW_AT_type	(0x000000a6 "unsigned int")
                  DW_AT_decl_file	("/Users/teemperor/test/bit.m")
                  DW_AT_decl_line	(7)
                  DW_AT_bit_size	(0x0f)
                  DW_AT_data_bit_offset	(0x00)
                  DW_AT_accessibility	(DW_ACCESS_public)

0x0000005d:     DW_TAG_member
                  DW_AT_name	("field2")
                  DW_AT_type	(0x000000a6 "unsigned int")
                  DW_AT_decl_file	("/Users/teemperor/test/bit.m")
                  DW_AT_decl_line	(8)
                  DW_AT_bit_size	(0x04)
                  DW_AT_data_bit_offset	(0x07)
                  DW_AT_accessibility	(DW_ACCESS_public)

0x0000006b:     DW_TAG_member
                  DW_AT_name	("field3")
                  DW_AT_type	(0x000000a6 "unsigned int")
                  DW_AT_decl_file	("/Users/teemperor/test/bit.m")
                  DW_AT_decl_line	(9)
                  DW_AT_bit_size	(0x04)
                  DW_AT_data_bit_offset	(0x03)
                  DW_AT_accessibility	(DW_ACCESS_public)

And when I change the bitfields to the sizes 2/4/4 then I actually get kinda right looking offsets in DWARF (ignoring the fact that they miss the 32-bit offset because of the first member) and the whole test seems to work fine from what I can see?

0x0000004f:     DW_TAG_member
                  DW_AT_name	("field1")
                  DW_AT_type	(0x000000a6 "unsigned int")
                  DW_AT_decl_file	("/Users/teemperor/test/bit.m")
                  DW_AT_decl_line	(7)
                  DW_AT_bit_size	(0x02)
                  DW_AT_data_bit_offset	(0x00)
                  DW_AT_accessibility	(DW_ACCESS_public)

0x0000005d:     DW_TAG_member
                  DW_AT_name	("field2")
                  DW_AT_type	(0x000000a6 "unsigned int")
                  DW_AT_decl_file	("/Users/teemperor/test/bit.m")
                  DW_AT_decl_line	(8)
                  DW_AT_bit_size	(0x04)
                  DW_AT_data_bit_offset	(0x02)
                  DW_AT_accessibility	(DW_ACCESS_public)

0x0000006b:     DW_TAG_member
                  DW_AT_name	("field3")
                  DW_AT_type	(0x000000a6 "unsigned int")
                  DW_AT_decl_file	("/Users/teemperor/test/bit.m")
                  DW_AT_decl_line	(9)
                  DW_AT_bit_size	(0x04)
                  DW_AT_data_bit_offset	(0x06)
                  DW_AT_accessibility	(DW_ACCESS_public)

So, if I understand correctly with this patch we add the bitfields with the wrong values (as they get fixed in the runtime later). Should we even do any bitfield offset checking for them at all if it doesn't matter? There is also some Obj-C unnamed bitfield support code below that seems to use those values, so ignoring the fact that the values are bogus seems a bit risk...

shafik added a comment.Jul 9 2020, 1:03 PM

We can never know the offsets statically b/c of how Objective-C deals with the fragile base class problem the runtime may have to shift fields over and we can not calculate this statically.

shafik edited the summary of this revision. (Show Details)Jul 9 2020, 1:04 PM

@shafik It would be good to document what the expected behavior for bitfields ivars in Objective-C is. For the sake of this discussion, but really we also want this in a comment in the code, for future reference.

IIUC, ivars don't have meaningful static offsets that we could put in DWARF, because only the Objective-C runtime knows the real offsets at runtime. Is the correct? How does that extend for bitfields? Is the Objective-C runtime aware of bitfield ivars, or are bitfields grouped together inter larger "container" ivars that the ObjC runtime knows about, and we need to encode the relative bit-offsets inside that container in DWARF?

shafik edited the summary of this revision. (Show Details)Jul 9 2020, 1:08 PM
teemperor accepted this revision.Jul 9 2020, 1:39 PM

Yeah I got confused by the description.

So, do we need to do any work at all with the bitfields offsets? With this patch we just keep adding them with whatever bogus bit-offset we get from DWARF and we just disable the sanity check. If the values are anyway useless, we might as well skip all the work and just add the field to the record? Also there is still the question why the calculations below are based on Obj-C bitfield offsets, which means that code is also calculating some bogus values? And then there is the question why we even have any offsets for this in DWARF when they are wrong. I guess there is a bunch of stuff that needs to be fixed here but this patch makes the code at least better working than before, so it probably should go in.

I would say we just early-exit for Obj-C bitfields from that code or something like that, but that can all be done in a follow-up PR. FWIW, this patch seems mostly ready to go beside Adrian's suggestions and one minor change in the test, so LGTM module comments.

lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp
2529–2530

FWIW, I think the original check was also correct. The class_language is from what I remember either Unknown or ObjC (when IsObjCObjectOrInterfaceType is true). But !TypeSystemClang::IsObjCObjectOrInterfaceType(class_clang_type) seems more expressive.

lldb/test/API/lang/objc/bitfield_ivars/TestBitfieldIvars.py
10

(the skipUnlessDarwin slipped out of this method and the one below)

lldb/test/API/lang/objc/bitfield_ivars/main.m
39

Thanks for modernizing this!

This revision is now accepted and ready to land.Jul 9 2020, 1:39 PM

IIRC from when Shafik and I were looking at this the ObjC runtime data lists the same offset for all the members of the bitfield: the offset of the first member of the bitfield. Apparently the ObjC runtime doesn't need to know where the individual members are dynamically - the offsets are baked into the code or something? So to find where the member actually is you have to take the offset into the field - which is given in the DWARF, and add it to the dynamic offset you get from the runtime for the first member.

shafik added a comment.Jul 9 2020, 2:38 PM

@aprantl I think this Objective-C Runtime Programming Guide: Bye Encodings entry and this sample program answer the rest of your questions:

#include <Foundation/Foundation.h>
#include <objc/runtime.h>
@interface Bits: NSObject {
  int a: 1;
  int b: 2;
  int c: 3;
  double x;
  int f: 23;
  int e: 4;
}
@end
@implementation Bits @end
int main() {
  Ivar *ivars = class_copyIvarList([Bits class], NULL);
  for (Ivar *c = ivars; *c; c++) {
    printf("%s %s %ld\n", ivar_getName(*c), ivar_getTypeEncoding(*c), ivar_getOffset(*c));
  }
  free(ivars);
}

with the following output:

a b1 8
b b2 8
c b3 8
x d 16
f b23 24
e b4 26

a b1 8

a is a bit-field of size 1 with offset 8

So for the test case that this fixes I get:

x i 8
f1 b15 12 (DWARF 0x0)
f2 b4 13 (DWARF 0x7)
f3 b4 14 (DWARF 0x3)

For the 2/4/4 bitfield case I get:

x i 8
f1 b2 12 (DWARF 0x0)
f2 b4 12 (DWARF 0x2)
f3 b4 12 (DWARF 0x6)

Anyway, Adrian's and Jim's point seem right. DWARF contains the bit offset from the start of the 'container' (it seems ObjC just groups bitfields together by some algorithm). If I look at the code for printing bitfields it seems like we expect to get the *bit* offset from the layout we store and we get the *byte* offset from the runtime. And then we combine those two into the actual offset we have to read. The values we get from the example above seem to match that and get the expected result, so I think that's right.

Regarding the second test: Running the test (You need to add the test_ prefix first) I get a pretty long warning that we try to get the size of the interface without an execution context:

warning: trying to determine the size of type @interface HasBitfield2 : NSObject{
    unsigned int x;
    unsigned int field1;
    unsigned int field2;
    unsigned int field3;
}
@end
without a valid ExecutionContext. this is not reliable. please file a bug against LLDB.
backtrace:
0  liblldb.11.0.0git.dylib 0x00000001070146b5 llvm::sys::PrintStackTrace(llvm::raw_ostream&) + 37
1  liblldb.11.0.0git.dylib 0x0000000106f16a1b lldb_private::TypeSystemClang::GetBitSize(void*, lldb_private::ExecutionContextScope*) + 827
2  liblldb.11.0.0git.dylib 0x0000000106ba0421 lldb_private::CompilerType::GetByteSize(lldb_private::ExecutionContextScope*) const + 33
3  liblldb.11.0.0git.dylib 0x00000001079bcd68 IRForTarget::CreateResultVariable(llvm::Function&) + 2760
4  liblldb.11.0.0git.dylib 0x00000001079c30d0 IRForTarget::runOnModule(llvm::Module&) + 928
5  liblldb.11.0.0git.dylib 0x00000001079a15a4 lldb_private::ClangExpressionParser::PrepareForExecution(unsigned long long&, unsigned long long&, std::__1::shared_ptr<lldb_private::IRExecutionUnit>&, lldb_private::ExecutionContext&, bool&, lldb_private::ExecutionPolicy) + 1572
6  liblldb.11.0.0git.dylib 0x00000001079b723a lldb_private::ClangUserExpression::Parse(lldb_private::DiagnosticManager&, lldb_private::ExecutionContext&, lldb_private::ExecutionPolicy, bool, bool) + 1002
7  liblldb.11.0.0git.dylib 0x0000000106b31e77 lldb_private::UserExpression::Evaluate(lldb_private::ExecutionContext&, lldb_private::EvaluateExpressionOptions const&, llvm::StringRef, llvm::StringRef, std::__1::shared_ptr<lldb_private::ValueObject>&, lldb_private::Status&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >*, lldb_private::ValueObject*) + 2375
8  liblldb.11.0.0git.dylib 0x0000000106c358e5 lldb_private::Target::EvaluateExpression(llvm::StringRef, lldb_private::ExecutionContextScope*, std::__1::shared_ptr<lldb_private::ValueObject>&, lldb_private::EvaluateExpressionOptions const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >*, lldb_private::ValueObject*) + 885

In general the semantics here seem kinda off. The result of the expression is HasBitfield2 which I don't think is something we really have in Obj-C as it seems like a stack-allocated HasBitfields2 instance. Changing the expression to return HasBitfield2 * instead and then forcing that we dereference Obj-C pointers fixes the second case for me. (FWIW, doing frame var *hb2 is also working and is doing essentially the same thing as it just prints the children of the pointer). Fixing that can just be a follow up PR.

So I would say removing the sanity check here seems fine. We maybe want to have a specific Obj-C sanity check but I'm not even sure how this would look like. Maybe figuring out what's the maximum 'container size' we can have in Obj-C and checking that offset + size is below that makes sense, but otherwise I don't think we can do a lot. So I think my previous comment that this is ready to go beside some minor changes is IMHO the way to go.

lldb/test/API/lang/objc/bitfield_ivars/TestBitfieldIvars.py
24

This method isn't a test at the moment as it doesn't have a test prefix in the name. E.g. test_ExpressionOnObject is the way to make this a test that actually runs. Currently it's just a normal method that is expected to be run by other test_ methods.

(Also the commit description should be updated again).

@aprantl I think this Objective-C Runtime Programming Guide: Bye Encodings entry and this sample program answer the rest of your questions:

Thank you! So it really looks like the ObjC runtime is unaware of bitfields, and we do need the bit offsets in DWARF to know where the individual bitfields are located in the respective container. Please make sure to document this in the code.

shafik edited the summary of this revision. (Show Details)Jul 14 2020, 4:57 PM
shafik updated this revision to Diff 278026.Jul 14 2020, 5:05 PM
shafik marked 6 inline comments as done.

Added clarifying comment.

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptJul 20 2020, 9:18 PM