This is an archive of the discontinued LLVM Phabricator instance.

Implement __attribute__((objc_direct)), __attribute__((objc_direct_members))
ClosedPublic

Authored by MadCoder on Nov 7 2019, 11:26 PM.

Details

Summary

attribute((objc_direct)) is an attribute on methods declaration, and
attribute((objc_direct_members)) on implementation, categories or
extensions.

A direct property specifier is added (@property(direct) type name)

These attributes / specifiers cause the method to have no associated
Objective-C metadata (for the property or the method itself), and the
calling convention to be a direct C function call.

The symbol for the method has enforced hidden visibility and such direct
calls are hence unreachable cross image. An explicit C function must be
made if so desired to wrap them.

The implicit self and _cmd arguments are preserved, however to
maintain compatibility with the usual objc_msgSend semantics,
3 fundamental precautions are taken:

  1. for instance methods, self is nil-checked. On arm64 backends this typically adds a single instruction (cbz x0, <closest-ret>) to the codegen, for the vast majority of the cases when the return type is a scalar.
  1. for class methods, because the class may not be realized/initialized yet, a call to [self self] is emitted. When the proper deployment target is used, this is optimized to objc_opt_self(self).

    However, long term we might want to emit something better that the optimizer can reason about. When inlining kicks in, these calls aren't optimized away as the optimizer has no idea that a single call is really necessary.
  1. the calling convention for the _cmd argument is changed: the caller leaves the second argument to the call undefined, and the selector is loaded inside the body when it's referenced only.

As far as error reporting goes, the compiler refuses:

  • making any overloads direct,
  • making an overload of a direct method,
  • implementations marked as direct when the declaration in the interface isn't (the other way around is allowed, as the direct attribute is inherited from the declaration),
  • marking methods required for protocol conformance as direct,
  • messaging an unqualified id with a direct method,
  • forming any @selector() expression with only direct selectors.

As warnings:

  • any inconsistency of direct-related calling convention when @selector() or messaging is used,
  • forming any @selector() expression with a possibly direct selector.

Lastly an objc_direct_members attribute is added that can decorate
@implementation blocks and causes methods only declared there (and in
no @interface) to be automatically direct. When decorating an
@interface then all methods and properties declared in this block are
marked direct.

Diff Detail

Event Timeline

MadCoder created this revision.Nov 7 2019, 11:26 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 7 2019, 11:26 PM
MadCoder edited the summary of this revision. (Show Details)Nov 7 2019, 11:28 PM
MadCoder marked 2 inline comments as done.Nov 7 2019, 11:36 PM
MadCoder added inline comments.
clang/lib/CodeGen/CGObjCMac.cpp
1588

doh I'll fix the tabs

4075

I suspect this function requires synthesizing some debug metadata but I have absolutely no idea how and what ;)

Why doesn't this need an update in lib/Serialization, is there generic code that handles all attributes?

Why doesn't this need an update in lib/Serialization, is there generic code that handles all attributes?

I followed the patch made for NonLazyClass and it didn't touch lib/Serialization and None of the attribute names I know hit in there either.

mattd added a subscriber: mattd.Nov 8 2019, 5:01 PM

Why doesn't this need an update in lib/Serialization, is there generic code that handles all attributes?

The serialization logic for attributes is tblgen'ed from Attr.td.

rjmccall added inline comments.Nov 12 2019, 7:50 PM
clang/include/clang/Basic/AttrDocs.td
3905

Editing error?

3909

If this is still true (rather than being done with objc_direct_members),
I feel it should be documented next to the statement at the end about
objc_direct_members in order to make the contrasting use cases clear.
But we should discuss whether this should be using objc_direct_members
for that use case.

3912

"If an Objective-C property is declared with the `direct` property attribute, then its getter and setter are declared to be direct unless they are explicitly declared."

I assume that's correct w.r.t the treatment of explicit declarations of getters and setters?

3918

A message send to a direct method calls the implementation directly, as if it were a C function, rather than using ordinary Objective-C method dispatch. This is substantially faster and potentially allows the implementation to be inlined, but it also means the method cannot be overridden in subclasses or replaced dynamically, as ordinary Objective-C methods can.

Furthermore, a direct method is not listed in the class's method lists. This substantially reduces the code-size overhead of the method but also means it cannot be called dynamically using ordinary Objective-C method dispatch at all; in particular, this means that it cannot override a superclass method or satisfy a protocol requirement.

Clang will diagnose attempts to override direct methods, override non-direct methods with direct methods, implement a protocol requirement with a direct method, or use a selector that is only known to be used for direct methods.

Symbols for direct method implementations are implicitly given hidden visibility, meaning that they can only be called within the same linkage unit.

Although a direct method is called directly as if it were a C function, it still obeys Objective-C semantics in other ways:

  • If the receiver is nil, the call does nothing and returns the zero value for the return type.
  • Calling a direct class method will cause the class to be initialized, including calling the +initialize method if present.
  • The method still receives an implicit _cmd argument containing its selector.
3938

"decorage"

I think you could be more precise about "has not been declared in any `@interface or @protocol`". It's specifically the interfaces, class extensions, categories, and protocols of this class and its superclasses.

This should refer the user to the documentation about the `objc_direct` attribute.

clang/include/clang/Basic/DiagnosticSemaKinds.td
994

"direct method implementation was previously declared not direct"

996

"methods that %select{override superclass methods|implement protocol requirements}0 cannot be direct"

998

"cannot override a method that is declared direct by a superclass"

clang/lib/CodeGen/CGObjC.cpp
689

Please add a TODO about the idea of emitting separate entry points that elide the check.

clang/lib/CodeGen/CGObjCGNU.cpp
3884

This doesn't seem to be diagnosed in Sema.

clang/lib/CodeGen/CGObjCMac.cpp
1588

The indentation still seems wrong.

2169

We generally try to write comments as proper sentences, or at least start with a capital letter. :)

2219

I'm not a fan of this method name, but it's following an existing pattern, so I'll cope.

4075

I'm sure Adrian has thoughts about that.

4076

When would the method container ever be an interface? This is running on a method implementation; it should always be within an ObjCImplDecl. (There was a case where this wasn't true for accessors, but Adrian just fixed that; regardless, you can definitely pass down an ObjCImplDecl.). From the ObjCImplDecl you can extract the class interface, which can never be null.

4093

Please just extract a little helper that does this try-generate-specialized-else-generate-normally dance out of EmitObjCMessageExpr.

4095

We should also be changing selfValue so that the null check below will do the right thing with e.g. subclasses of weak-linked classes, where IIUC the input value might be non-null but the initialized class pointer might be null.

4097

We should always be able to find *some* OID.

6719

I would generally prefer if (MD->isDirectMethod()) continue; for all of these loops.

MadCoder updated this revision to Diff 229015.Nov 12 2019, 11:16 PM
MadCoder marked 16 inline comments as done and an inline comment as not done.

Handled a bunch of comments (marked done).

will update again to implement the deeper requested change around objc_direct_members and some more unaddressed comments.

MadCoder marked an inline comment as not done.Nov 12 2019, 11:17 PM
MadCoder added inline comments.
clang/include/clang/Basic/AttrDocs.td
3909

objc_direct_members is for the @implementation only right now. objc_direct_members but it could make sense to be used on @interface instead I agree. It would make sense.

EDIT: actually I quite like this, it is much cleaner. I'll work on updating the patch.

3912

I would expect so, I shall have a test for it

clang/lib/CodeGen/CGObjCGNU.cpp
3884

how should I do it, is there an example I can follow?

clang/lib/CodeGen/CGObjCMac.cpp
1588

yeah I fixed it in my checkout but didn't update the diff here yet

4095

I'm not sure what you mean, wouldn't storing to CGF.GetAddrOfLocalVar(OMD->getSelfDecl()) actually affect selfValue itself? I'm not 100% sure I understand what you mean here.

MadCoder added inline comments.Nov 12 2019, 11:19 PM
clang/lib/CodeGen/CGObjCMac.cpp
2169

jeez will fix the indent here too

MadCoder updated this revision to Diff 229023.Nov 13 2019, 1:13 AM
MadCoder marked 8 inline comments as done.
MadCoder edited the summary of this revision. (Show Details)

Updated the patch to restrict objc_direct to methods and use objc_direct_members for containers, and several diagnostics improvements (especially in the vicinity of properties and the GNU runtime + tests).

Addressed all of @rjmccall comments except the one about selfValue that I don't understand yet and the lldb related stuff.

MadCoder updated this revision to Diff 229024.Nov 13 2019, 1:22 AM

Updated clang/test/Misc/pragma-attribute-supported-attributes-list.test that I had forgotten.

MadCoder marked an inline comment as done.Nov 13 2019, 1:24 AM
MadCoder added inline comments.
clang/lib/CodeGen/CGObjCGNU.cpp
3884

actually figured it out.

rjmccall added inline comments.Nov 13 2019, 8:06 AM
clang/lib/CodeGen/CGObjCMac.cpp
4095

selfValue is just an ordinary C++ local variable which is initialized at the top of the function to the IR value for the first parameter. It doesn't track the "dynamic" value stored in a particular IR alloca.

Oh, and that's a bug, actually; the first parameter of the IR function is not necessarily self because of indirect return slots. You should be initializing selfValue to the result of a load from the self variable. And please add tests for direct methods (both instance and class) with indirect returns.

MadCoder updated this revision to Diff 229183.Nov 13 2019, 2:03 PM
MadCoder marked 2 inline comments as done.

Beefed up the tests, addressed the selfValue related issue.

Also rolled back to the ObjCDeclInterfaceDecl cast in GenerateDirectMethod as it turns out that all callers pass this type rather than an ObjCImplDecl however it's never nil and always of that type so make this an assert instead.

Did a pass of clang-format to clean up style

MadCoder added inline comments.Nov 13 2019, 2:04 PM
clang/lib/CodeGen/CGObjCMac.cpp
4076

so it turns out the CD is always a classInterface because that's how things are called, so I will instead make it a cast<>

MadCoder marked an inline comment as done.Nov 13 2019, 2:04 PM

Is it intentional that the direct method names use the exact same symbol namespace (\01-[class message]) as "real" Objective-C methods? Could that be a problem? Should we use a slightly different naming scheme?

Could you add tests for the following cases:

  1. A direct property that names a getter that is non-direct and explicitly declared in the current interface.
  2. #1, but explicitly declared elsewhere, maybe in a super class or in the primary interface.
  3. #1 and #2, but where the property is only direct because of objc_direct_members.
  4. A direct property that is making a previously-readonly direct property readwrite.
  5. #4, but the previous property was not direct (only the setter should be direct).
  6. #4 and #5, but with objc_direct_members.
  7. A synthesized getter or setter in an objc_direct_members implementation (should follow the interface).
  8. #7 but with auto-synthesis
  9. The implicit .cxx_destruct and .cxx_construct methods in an objc_direct_members shouldn't get treated as direct. This should not be remotely possible in the current implementation, but it's worth adding a test to verify that it doesn't start to happen.
clang/include/clang/Basic/AttrDocs.td
3942

Okay, how's this as a tweak to the whole documentation:

The `objc_direct` attribute can be used to mark an Objective-C method as
being *direct*. A direct method is treated statically like an ordinary method,
but dynamically it behaves more like a C function. This lowers some of the costs
associated with the method but also sacrifices some of the ordinary capabilities
of Objective-C methods.

A message send of a direct method calls the implementation directly, as if it
were a C function, rather than using ordinary Objective-C method dispatch. This
is substantially faster and potentially allows the implementation to be inlined,
but it also means the method cannot be overridden in subclasses or replaced
dynamically, as ordinary Objective-C methods can.

Furthermore, a direct method is not listed in the class's method lists. This
substantially reduces the code-size overhead of the method but also means it
cannot be called dynamically using ordinary Objective-C method dispatch at all;
in particular, this means that it cannot override a superclass method or satisfy
a protocol requirement.

Although a message send of a direct method causes the method to be called
directly as if it were a C function, it still obeys Objective-C semantics in other
ways:

  • If the receiver is `nil`, the message send does nothing and returns the zero value for the return type.
  • A message send of a direct class method will cause the class to be initialized, including calling the `+initialize` method if present.
  • The implicit `_cmd` parameter containing the method's selector is still defined. In order to minimize code-size costs, the implementation will not emit a reference to the selector if the parameter is unused within the method.

Symbols for direct method implementations are implicitly given hidden
visibility, meaning that they can only be called within the same linkage unit.

It is an error to do any of the following:

  • declare a direct method in a protocol,
  • declare an override of a direct method with a method in a subclass,
  • declare an override of a non-direct method with a direct method in a subclass,
  • declare a method with different directness in different class interfaces, or
  • implement a non-direct method (as declared in any class interface) with a direct method.

If any of these rules would be violated if every method defined in an
`@implementation` within a single linkage unit were declared in an
appropriate class interface, the program is ill-formed with no diagnostic
required. If a violation of this rule is not diagnosed, behavior remains
well-defined; this paragraph is simply reserving the right to diagnose such
conflicts in the future, not to treat them as undefined behavior.

Additionally, Clang will warn about any `@selector` expression that
names a selector that is only known to be used for direct methods.

For the purpose of these rules, a "class interface" includes a class's primary
`@interface` block, its class extensions, its categories, its declared protocols,
and all the class interfaces of its superclasses.

An Objective-C property can be declared with the `direct` property
attribute. If a direct property declaration causes an implicit declaration of
a getter or setter method (that is, if the given method is not explicitly
declared elsewhere), the method is declared to be direct.

Some programmers may wish to make many methods direct at once. In order
to simplify this, the `objc_direct_members` attribute is provided; see its
documentation for more information.

3959

The `objc_direct_members` attribute can be placed on an Objective-C
`@interface or @implementation` to mark that methods declared
therein should be considered direct by default. See the documentation
for `objc_direct` for more information about direct methods.

When `objc_direct_members is placed on an @interface` block, every
method in the block is considered to be declared as direct. This includes any
implicit method declarations introduced by property declarations. If the method
redeclares a non-direct method, the declaration is ill-formed, exactly as if the
method was annotated with the `objc_direct attribute. objc_direct_members`
cannot be placed on the primary interface of a class, only on category or class
extension interfaces.

When `objc_direct_members is placed on an @implementation` block,
methods defined in the block are considered to be declared as direct unless
they have been previously declared as non-direct in any interface of the class.
This includes the implicit method definitions introduced by synthesized
properties, including auto-synthesized properties.

Is it intentional that the direct method names use the exact same symbol namespace (\01-[class message]) as "real" Objective-C methods? Could that be a problem? Should we use a slightly different naming scheme?

I don't have an answer here, real compiler folks should answer (ping @rjmccall ?).

Two statements:

  • I would desire for the backtraces in a crash report to still look similar so moving to _-[class message] (with the leading _) would be fine
  • the name can't quite collide because you can't define this symbol as both direct and dynamic, so it's effectively always free

I can however see why it may or may not confuse the debugger

The symbols probably won't confuse the debugger; after all, the only difference here is that they're actually external. Of course the debugger will need work to support direct methods in general, since they won't be in method tables.

MadCoder updated this revision to Diff 229238.Nov 13 2019, 11:24 PM
MadCoder marked 2 inline comments as done.

Implemented all the tests @rjmccall wanted (and then some)

MadCoder marked 3 inline comments as done.Nov 13 2019, 11:25 PM
MadCoder added inline comments.
clang/lib/CodeGen/CGObjCMac.cpp
4075

@aprantl said that the debug info looked right to him

MadCoder marked an inline comment as done.Nov 13 2019, 11:39 PM

Thanks, the tests look great.

clang/include/clang/Basic/AttrDocs.td
3919

Please add a new paragraph here:

Because a direct method cannot be overridden, it is an error to perform
a ``super`` message send of one.

And you should test that. (I noticed this because you had an assert(!IsSuper); in IRGen, which was both a good idea and also begging for a justification. :))

clang/lib/CodeGen/CGObjCMac.cpp
4081

I don't think the first half of this comment is necessary anymore.

The second half should be clearer; I'd suggest something like:

TODO: If this method is inlined, the caller might know that `self` is already initialized;
for example, it might be an ordinary Objective-C method which always receives an
initialized `self`, or it might have just forced initialization on its own.  We should
find a way to eliminate this unnecessary initialization in such cases in LLVM.
4087

The assumption here is that a direct class method can never be invoked on a nullable value, like a Class. I think that's true, but it's worth making that explicit in a comment.

4114

Please branch to the return block in both cases.

clang/test/CodeGenObjC/direct-method.m
25

The IR names of local values aren't emitted in release builds of the compiler, so you need to use FileCheck variables for these allocas the same way that you do for RETVAL and SELF.

dexonsmith added inline comments.Nov 14 2019, 10:02 AM
clang/test/CodeGenObjC/direct-method.m
25

That’s worth double-checking. I think that has been fixed, such that whether names are dropped is controlled by a cc1 flag (which the driver sets by default in release builds).

MadCoder updated this revision to Diff 229366.Nov 14 2019, 11:15 AM
MadCoder marked 7 inline comments as done.

Updated for the new round of comments, with added tests and Sema checks errors for:

  • messaging super
  • messaging a nullable Class expression
clang/include/clang/Basic/AttrDocs.td
3919

hah turns out I actually need to implement the Sema check for this :D

clang/lib/CodeGen/CGObjCMac.cpp
4087

Hmm actually I think that it's _not_ true. as in I'm not disallowing it today but I should be. I need to figure out how to do that, messaging a Class should be as disallowed as messaging id.

but right now it's not.

MadCoder marked an inline comment as done.Nov 14 2019, 11:15 AM
rjmccall added inline comments.Nov 14 2019, 12:33 PM
clang/lib/Sema/SemaExprObjC.cpp
1129

Is this a good idea to do for arbitrary diagnostics? I feel like saying "direct" in the note when that has nothing to do with the conflict could be a little misleading.

MadCoder updated this revision to Diff 229410.Nov 14 2019, 2:32 PM
MadCoder marked 2 inline comments as done.

reverted the hunk about "direct methods" note.

clang/lib/Sema/SemaExprObjC.cpp
1129

fair enough. some of the mismatches will be because of directness so it can help in that instance but I can see how it's confusing.

fixed, and the fact that tests didn't need any updating probably proves you right ;)

rjmccall accepted this revision.Nov 15 2019, 1:10 PM

Okay. This seems reasonable to me, then.

This revision is now accepted and ready to land.Nov 15 2019, 1:10 PM

when running the full test-suite before sending the patch, and it broke tests because some loads are now ordered differently :(
yay.

so I need to either make sure the CG is done in the same order again, or to fix the test expectations.

MadCoder updated this revision to Diff 229743.EditedNov 17 2019, 8:47 PM

Diff against previous is:

diff --git a/clang/lib/CodeGen/CGObjCMac.cpp b/clang/lib/CodeGen/CGObjCMac.cpp
index 814f67787dd..775e3406da7 100644
--- a/clang/lib/CodeGen/CGObjCMac.cpp
+++ b/clang/lib/CodeGen/CGObjCMac.cpp
@@ -2159,21 +2159,23 @@ CGObjCCommonMac::EmitMessageSend(CodeGen::CodeGenFunction &CGF,
                                  const ObjCInterfaceDecl *ClassReceiver,
                                  const ObjCCommonTypesHelper &ObjCTypes) {
   CodeGenTypes &Types = CGM.getTypes();
-  CallArgList ActualArgs;
-  if (!IsSuper)
-    Arg0 = CGF.Builder.CreateBitCast(Arg0, ObjCTypes.ObjectPtrTy);
-  ActualArgs.add(RValue::get(Arg0), Arg0Ty);
+  auto selTy = CGF.getContext().getObjCSelType();
+  llvm::Value *SelValue;
+
   if (Method && Method->isDirectMethod()) {
     // Direct methods will synthesize the proper `_cmd` internally,
     // so just don't bother with setting the `_cmd` argument.
     assert(!IsSuper);
-    auto selTy = CGF.getContext().getObjCSelType();
-    auto undefSel = llvm::UndefValue::get(Types.ConvertType(selTy));
-    ActualArgs.add(RValue::get(undefSel), selTy);
+    SelValue = llvm::UndefValue::get(Types.ConvertType(selTy));
   } else {
-    ActualArgs.add(RValue::get(GetSelector(CGF, Sel)),
-                   CGF.getContext().getObjCSelType());
+    SelValue = GetSelector(CGF, Sel);
   }
+
+  CallArgList ActualArgs;
+  if (!IsSuper)
+    Arg0 = CGF.Builder.CreateBitCast(Arg0, ObjCTypes.ObjectPtrTy);
+  ActualArgs.add(RValue::get(Arg0), Arg0Ty);
+  ActualArgs.add(RValue::get(SelValue), selTy);
   ActualArgs.addFrom(CallArgs);

   // If we're calling a method, use the formal signature.
@@ -7622,7 +7624,7 @@ Address CGObjCNonFragileABIMac::EmitSelectorAddr(CodeGenFunction &CGF,
       llvm::ConstantExpr::getBitCast(GetMethodVarName(Sel),
                                      ObjCTypes.SelectorPtrTy);
     std::string SectionName =
-        GetSectionName("__objc_selrefs", "literal_pointers");
+        GetSectionName("__objc_selrefs", "literal_pointers,no_dead_strip");
     Entry = new llvm::GlobalVariable(
         CGM.getModule(), ObjCTypes.SelectorPtrTy, false,
         getLinkageTypeForObjCMetadata(CGM, SectionName), Casted,

Basically, I emit the selref load (Arg1) before emiting Arg0 as it used to, to avoid having to change all the tests.
I also put the "no_dead_strip" on selector sections back, as it's a remnant from an old iteration and this change is not needed for this per se (if we mean to do it it should be its own change).

Okay. There's actually a good reason to emit the selector reference as close to the call as possible, but I agree that we shouldn't do that in this patch.

Sorry for getting to this late, I assumed it was a runtime-agnostic feature until someone filed a bug against the GNUstep runtime saying that we didn't implement it. It would be polite to tag me in reviews for features that contain runtime-specific things.

The current implementation doesn't look as if it's runtime-specific. Is there a reason that it's in CGObjCMacCommon rather than in the generic CGObjCRuntime class?

Looking at the code, it appears that it is correct only for objects created with +alloc. Consider the following example:

#import <Foundation/Foundation.h>
#include <objc/runtime.h>

@interface X : NSObject
- (void)test __attribute__((objc_direct));
@end

@implementation X
+ (void)initialize
{
	printf("+initialize called on %p\n", self);
}
- (void)test
{
	printf("Test called on %p\n", self);
}
@end

int main()
{
	@autoreleasepool
	{
		X *x = class_createInstance(objc_getClass("X"), 0);
		[x test];
	}
}

I don't believe this does the right thing (unfortunately, the latest XCode requires Catalina and it breaks too many things for me to upgrade from Mojave, so I can't test it and see if there are any tweaks to the runtime). The API guarantee for class_createInstance does not require it to call +initialize. I can see a few ways of fixing this:

Either:

  • Unconditionally do the [self self] dance on all direct methods (not ideal, since it's likely to offset the performance win) for runtimes without knowledge of direct functions and one out of:
    • Set a metadata flag on any class with direct methods to require class_createInstance (and any other allocation paths that I haven't thought of) in the runtime to call +initialize on instantiation.
    • Modify the API contract of class_createInstance to send +initialize unconditionally.
    • Provide an objc_msgSendDirect that takes the target function in place of the selector and checks that the receiver is initialise, teach the optimisers to inline through this in cases where they can prove that the receiver is already initialised.
  • Document that direct methods do not trigger +initialize, generalise that to class methods and remove the existing +self dance.

For direct class methods, avoiding the +self dance might best avoided by using a construct like the C++ static initialiser for a pointer to the class at the call site (rather than a direct reference to the class), which guarantees that it's initialised in the callee and skipping this when the receiver is self in an method (+initialize is guaranteed to have been called on self on entry to any method in this case). The pseudocode is something like (where +direct is a direct method):

The commit message for this refers to objc_opt_self. This is presumably a fast-path for a self message send, but the only reference to that anywhere in the LLVM tree is in LLDB. We don't appear to have any tests in clang for it and neither upstream nor Apple clang appear to generate calls for it, so I'm not sure what 'an appropriate deployment target' is in the commit message.