Page MenuHomePhabricator

[DebugInfo] Emit ObjC methods as part of interface.
ClosedPublic

Authored by JDevlieghere on Jun 15 2018, 3:11 PM.

Details

Summary

As brought up during the discussion of the DWARF5 accelerator tables,
there is currently no way to associate Objective-C methods with the
interface they belong to, other than the .apple_objc accelerator table.

After due consideration we came to the conclusion that it makes more
sense to follow Pavel's suggestion of just emitting this information in
the .debug_info section. One concern was that categories were
emitted in the .apple_names as well, but it turns out that LLDB doesn't
rely on the accelerator tables for this information.

This patch changes the codegen behavior to emit subprograms for
structure types, like we do for C++. This will result in the
DW_TAG_subprogram being nested as a child under its
DW_TAG_structure_type. This behavior is only enabled for DWARF5 and
later, so we can have a unique code path in LLDB with regards to
obtaining the class methods.

For more background please refer to the discussion on the mailing list:
http://lists.llvm.org/pipermail/llvm-dev/2018-June/123986.html

Diff Detail

Repository
rC Clang

Event Timeline

JDevlieghere created this revision.Jun 15 2018, 3:11 PM

Not sure I should get too involved in this discussion, knowing so little about Objective C - but if it seems sensible, could you provide some examples (just in comments/description in this code review) of what the DWARF used to look like and what it looks like after this change?

Does this address the discoverability that's being discussed in the llvm-dev thread? There were concerns there about having to search through all the instances of a type to find all of its functions - I imagine, since Objective C classes aren't closed (if I understand correctly) that would still be a concern here? If it is, what is the benefit/tradeoff of this change (if it doesn't address the discoverability/still requires the Objective C accelerator table)?

Thanks! Is there also a companion patch for LLVM that disables the objc accelerator table?

Note that this is not a 100% replacement of the apple_objc accelerator table, since the apple_objc table also lists all methods defined in categories of that interface. Is the idea to also add category methods into the interface's DW_TAG_struture_type?

clang/lib/CodeGen/CGDebugInfo.cpp
3358 ↗(On Diff #151568)

// Starting with DWARF V5 method declarations are emitted as children of the interface type.

4247 ↗(On Diff #151568)

EltTys.append(RealDecl->getElements().begin(), RealDecl->getElements().end())

clang/lib/CodeGen/CGDebugInfo.h
105 ↗(On Diff #151568)

This constructor is probably not necessary if you construct the struct as { MD, Decl }?

111 ↗(On Diff #151568)

Isn't the interface already the key in the DenseMap?

clang/test/CodeGenObjC/debug-info-synthesis.m
35 ↗(On Diff #151568)

We should also check that this does not happen in DWARF 4.

JDevlieghere marked 5 inline comments as done.

CR feedback

could you provide some examples (just in comments/description in this code review) of what the DWARF used to look like and what it looks like after this change?

Thus far an Objective-C interface is a DW_TAG_structure_type that only has variables and properties as members. Objective-C methods are emitted as freestanding DW_TAG_subprogram function definitions. In addition, the .apple_objc accelerator table provides a mapping from ClassName -> [list of subprogram DIEs for each method in Class or one of its categories].

The idea behind this patch is to get rid of the accelerator table and encode the same information in the DIE hierarchy instead. LLDB was never using the accelerator table to accelerate lookup but only when it was in the middle of parsing a DW_TAG_structure_type of an Objective-C interface, so this shouldn't even come add a performance penalty.

Not sure I should get too involved in this discussion, knowing so little about Objective C - but if it seems sensible, could you provide some examples (just in comments/description in this code review) of what the DWARF used to look like and what it looks like after this change?

After this change we emit a forward declaration as a child of the DW_TAG_structure_type. The definition (below) remains unchanged.

0x00000027:   DW_TAG_structure_type
                DW_AT_APPLE_objc_complete_type  (true)
                DW_AT_name      ()
                DW_AT_byte_size (0x04)
                DW_AT_decl_file ("cat.m")
                DW_AT_decl_line (1)
                DW_AT_APPLE_runtime_class       (0x10)

0x0000002d:     DW_TAG_member
                  DW_AT_name    ()
                  DW_AT_type    (0x0000007b "base ")
                  DW_AT_decl_file       ("cat.m")
                  DW_AT_decl_line       (2)
                  DW_AT_data_member_location    (0x00)
                  DW_AT_accessibility   (DW_ACCESS_protected)

0x00000037:     DW_TAG_subprogram
                  DW_AT_name    ()
                  DW_AT_decl_file       ("cat.m")
                  DW_AT_decl_line       (10)
                  DW_AT_prototyped      (true)
                  DW_AT_type    (0x0000007b "base ")
                  DW_AT_declaration     (true)

0x0000003f:       DW_TAG_formal_parameter
                    DW_AT_type  (0x0000007f "structure *")
                    DW_AT_artificial    (true)

0x00000044:       DW_TAG_formal_parameter
                    DW_AT_type  (0x00000084 "structure *")
                    DW_AT_artificial    (true)

0x00000049:       NULL
...
0x0000007a:     NULL
...
0x000000b5:   DW_TAG_subprogram
                DW_AT_low_pc    (0x0000000000000000)
                DW_AT_high_pc   (0x000000000000001c)
                DW_AT_frame_base        (DW_OP_reg6 RBP)
                DW_AT_object_pointer    (0x000000cf)
                DW_AT_name      ()
                DW_AT_decl_file ("cat.m")
                DW_AT_decl_line (10)
                DW_AT_prototyped        (true)
                DW_AT_type      (0x0000007b "base ")

0x000000cf:     DW_TAG_formal_parameter
                  DW_AT_location        (DW_OP_fbreg -8)
                  DW_AT_name    ()
                  DW_AT_type    (0x000000b0 "structure *")
                  DW_AT_artificial      (true)

0x000000d8:     DW_TAG_formal_parameter
                  DW_AT_location        (DW_OP_fbreg -16)
                  DW_AT_name    ()
                  DW_AT_type    (0x00000195 "structure *")
                  DW_AT_artificial      (true)

0x000000e1:     NULL

Does this address the discoverability that's being discussed in the llvm-dev thread? There were concerns there about having to search through all the instances of a type to find all of its functions - I imagine, since Objective C classes aren't closed (if I understand correctly) that would still be a concern here? If it is, what is the benefit/tradeoff of this change (if it doesn't address the discoverability/still requires the Objective C accelerator table)?

Following this approach we have the exact same problem (and I believe we could use the same solution). The motivation for this change is that there's no clean way to implement the .apple_objc accelerator table using the debug_names, because it's conceptually different from the other tables. It maps an interface name to the DIEs of its methods, as opposed to all the other tables that map a name to their corresponding DIE. The only way this could work is as a separate table, because otherwise you'd have to ensure, for every entry, if it's a "regular" one, which obviously violates the standard. Putting it in a separate column is also a bad idea, because that means the column is present for all the entries, including the ones that don't need it.

clang/lib/CodeGen/CGDebugInfo.h
111 ↗(On Diff #151568)

That's the ObjCInterfaceDecl while this is the DICompositeType.

  • Use type cache &
  • Simplify method cache struct
  • Add custom test that verifies category methods are emitted
aprantl added inline comments.Jun 15 2018, 5:22 PM
clang/lib/CodeGen/CGDebugInfo.cpp
4250 ↗(On Diff #151591)
auto &Elements = InterfaceDecl->getElements();
EltTys.append(Elements.begin(), Elements.end());
for (auto &M : p.second)
   EltTys.push_back(M.DIMethodDecl);
clang/lib/CodeGen/CGDebugInfo.h
106 ↗(On Diff #151591)

looks like that comment is cut off at the end?

clang/test/CodeGenObjC/debug-info-category.m
41 ↗(On Diff #151591)

Shouldn't you also check the scope: of the DISubprograms?

JDevlieghere marked 3 inline comments as done.
  • Feedback Adrian

I also don't want to get involved too much, but here are my 2c: :)

Putting it in a separate column is also a bad idea, because that means the column is present for all the entries, including the ones that don't need it.

This is not true. (Unlike the .apple_*** tables, ) .debug_names allows you to specify a different schema for every entry in the acelerator table. The schema is specifing using abbreviations in much the same way as .debug_abbrev specifies the schema for .debug_info. So you could easily have one abbreviation for regular classes, and a different one for objc interfaces. Currently, our .debug_names generator does not support these kinds of heterogeneous tables, but that's simply because I had no use for it. It could be added if necessary (though it will require some generalization/refactoring). OTOH, our consumer should already be able to handle these kinds of tables as input.

That said, I like this approach more than the putting that information in the accel table. :)

clang/test/CodeGenObjC/debug-info-category.m
35–37 ↗(On Diff #151596)

Would it make sense to remove the interface part from the method name. When these were freestanding, they were necessary as they were the only link between the method and the interface, but now they are kind of redundant.

This is just an idea, I have no idea what kinds of havoc it will cause for existing consumers, but it seems to me it would make the objc methods more consistent with regular c++ methods.

This is not true. (Unlike the .apple_*** tables, ) .debug_names allows you to specify a different schema for every entry in the acelerator table. The schema is specifing using abbreviations in much the same way as .debug_abbrev specifies the schema for .debug_info. So you could easily have one abbreviation for regular classes, and a different one for objc interfaces. Currently, our .debug_names generator does not support these kinds of heterogeneous tables, but that's simply because I had no use for it. It could be added if necessary (though it will require some generalization/refactoring). OTOH, our consumer should already be able to handle these kinds of tables as input.

Indeed, you are correct, I was thinking about the Apple structure with the atoms and forgot about the abbreviation in each entries for DWARF. Thanks for pointing this out!

I tested this patch on the LLDB test suite and all tests passed.

What I did was:

  • I removed the DWARF version check in clang so this was always generated.
  • I commented out the code that reads the .apple_objc accelerator tables in DWARFASTParserClang.cpp (which as far as I can tell is the only consumer). I expected to have to add logic to read the methods but the code already takes care of that, and just had a special if-clause for Objective-C classes. This is actually quite nice, as we don't need a code change to make both things working next to each other.
  • Update diff to version I used for testing (modulo the removed DWARF version check)
  • Re-add test case (forgot to stage it for last patch)
aprantl added inline comments.Jun 26 2018, 8:59 AM
lib/CodeGen/CGDebugInfo.cpp
4246

What does it mean that a type is not being found in the cache here?

lib/CodeGen/CGDebugInfo.h
102

The field naming scheme in this struct is not very consistent :-)

test/CodeGenObjC/debug-info-category.m
46

perhaps also add a class method and a category to the test case?

dexonsmith added inline comments.Jun 26 2018, 9:44 AM
lib/CodeGen/CGDebugInfo.cpp
3355

LLVM style rules suggest UpperCamelCase or INITIALISM rather than "it" here:
https://www.llvm.org/docs/CodingStandards.html#name-types-functions-variables-and-enumerators-properly

4239

Some comment for "p" here.

4244

And "it" here.

JDevlieghere marked 6 inline comments as done.
  • Feedback Adrian & Duncan
lib/CodeGen/CGDebugInfo.cpp
4239

Fixed; I'll update the rest of the file in a follow-up to keep things consistent.

4246

Replaced it with an assert.

aprantl added inline comments.Jun 26 2018, 10:46 AM
lib/CodeGen/CGDebugInfo.cpp
4239

Unfortunately CFE doesn't seem to consistently use the LLVM coding style: A lot of CFE developers seem to prefer lower-case variable names. I don't know if that is a historic artifact or a conscious decision.

dblaikie added inline comments.Jun 26 2018, 11:59 AM
lib/CodeGen/CGDebugInfo.cpp
4239

Probably safe to assume it's historic - I don't think there's any intentional diversion from the LLVM style guide, maybe just lazy/accidental "being consistent" with the surrounding code & no one's bothered to do bigger cleanups.

aprantl accepted this revision.Jun 27 2018, 9:48 AM
This revision is now accepted and ready to land.Jun 27 2018, 9:48 AM
This revision was automatically updated to reflect the committed changes.