This is an archive of the discontinued LLVM Phabricator instance.

RFC: Implement objc_direct_protocol attribute to remove protocol metadata
ClosedPublic

Authored by lanza on Mar 3 2020, 2:48 PM.

Details

Summary

Motivated by the new objc_direct attribute, this change aims to limit
metadata generation from Protocols that the programmer knows isn't
going to be used at runtime. This attribute simply causes the frontend
to not generate any protocol metadata entries (e.g. OBJC_CLASS_NAME,
_OBJC_$_PROTOCOL_INSTANCE_METHDOS, _OBJC_PROTOCOL, etc) for a protocol
marked with __attribute__((objc_direct)).

There are a few APIs used to retrieve a protocol at runtime.
@protocol(SomeProtocol) will now error out of the requested protocol
is marked with attribute. objc_getProtocol will returl NULL which
is consistent with the behavior of a non-existing protocol.

Diff Detail

Event Timeline

lanza created this revision.Mar 3 2020, 2:48 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 3 2020, 2:48 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
aaron.ballman added a subscriber: aaron.ballman.

Adding some more knowledgeable reviewers for comments on your RFC. I pointed out a few minor nits, but I'll hold off on a technical review until the ObjC-specific details are worked out and there is buy-in on the feature.

clang/include/clang/AST/DeclObjC.h
2174

Comments should be properly punctuated (missing a full stop at the end of the sentence), here and elsewhere.

clang/lib/Sema/SemaDeclAttr.cpp
2622

This should be implemented via a custom LangOpt in Attr.td

I'm fine with people developing a proposal for this openly, but it needs to be said that language changes cannot just be made in open-source; they have to go through the official language review process, which for Objective-C is an internal committee within Apple.

The summary calls this objc_direct_protocol, but it's objc_static_protocol in the patch. I agree that "direct" isn't a great name for this. static is complicated, because while we use "static" vs. "dynamic" this way when we're talking *about* languages, I'm not sure any of the umpteen language uses of static ever use it in exactly this way, and it's possibly quite confusing to add one. Throwing out other names here: objc_non_runtime_protocol? objc_compiler_only_protocol?

The technical content of the patch seems fine.

lanza marked an inline comment as done.Mar 7 2020, 2:51 PM

Adding some more knowledgeable reviewers for comments on your RFC. I pointed out a few minor nits, but I'll hold off on a technical review until the ObjC-specific details are worked out and there is buy-in on the feature.

Thanks!

This should be implemented via a custom LangOpt in Attr.td

This was done to more-or-less mirror the behavior from objc_direct. Though if people think the implementations should differ I'm cool with that.

I'm fine with people developing a proposal for this openly, but it needs to be said that language changes cannot just be made in open-source; they have to go through the official language review process, which for Objective-C is an internal committee within Apple.

Yea sure, whatever the correct process is I'm more than happy to cooperate. I was going to post something on the mailing list some time this week, but this diff got attention early! What would be the best way to start that conversation?

The summary calls this objc_direct_protocol, but it's objc_static_protocol in the patch. I agree that "direct" isn't a great name for this. static is complicated, because while we use "static" vs. "dynamic" this way when we're talking *about* languages, I'm not sure any of the umpteen language uses of static ever use it in exactly this way, and it's possibly quite confusing to add one. Throwing out other names here: objc_non_runtime_protocol? objc_compiler_only_protocol?

Yea, the naming I'm not too happy about and changed it around a few times. Your proposed naming is certainly more clear.

The technical content of the patch seems fine.

👍

clang/include/clang/AST/DeclObjC.h
2174

👍

This might also be interesting to @manmanren.

This might also be interesting to @manmanren.

Thank you, John!

Manman

lanza updated this revision to Diff 252706.Mar 25 2020, 4:35 PM

Rename and address some issues

lanza updated this revision to Diff 252707.Mar 25 2020, 4:38 PM

Reword commit

Harbormaster completed remote builds in B50459: Diff 252706.

If someone writes up a short proposal for this, with motivation and impact, we'd be happy to present it internally.

If someone writes up a short proposal for this, with motivation and impact, we'd be happy to present it internally.

I have a rough draft that I'll be touching up and submitting Monday most likely! Thanks, John!

lanza added a comment.Apr 30 2020, 5:08 PM

@rjmccall Hey John, I sent the proposal to the addresses I was pointed to but haven't heard back in multiple weeks. Any update on this?

Sorry, I should've been more straight with you. I'm still happy to raise this internally, but this is a hectic time of the year at Apple, and I'm not actually going to raise it until after WWDC. It wouldn't be making it into the new Xcode release anyway.

jdoerfert resigned from this revision.Jul 23 2020, 7:53 AM
lanza added a comment.Aug 3 2020, 2:12 PM

ping @rjmccall. Any update on a timeline for this review process? Thanks!

Sorry, this slipped out of my mind. I've started the process internally.

lanza added a comment.Aug 4 2020, 11:28 AM

No problem! Thank you, John!

One thing that's come up so far: you generally need to be looking through non-runtime protocols, not ignoring them. This matters when non-runtime protocols inherit from ordinary protocols. It may be useful to provide a generic function that walks an array of protocols and calls a callback with the unique ordinary protocols it implies.

You should also implement this for non-Apple runtimes, which should be straightforward with that generic function; it's just a matter of testing it. CC'ing David Chisnall.

This feature looks generally useful. A few small suggestions:

  • This is really a way of transforming a formal protocol into an informal protocol. Objective-C has had a convention of informal protocols since the '80s, but they're implemented as categories on the root class with no @implementation. I'd suggest that __attribute__((objc_informal_protocol)) or similar might be a better spelling for this, explicitly bringing the informal notion into the language. A lot of the informal protocols in Cocoa could be better expressed using this and @optional methods than as categories on NSObject.
  • Given that this doesn't depend on any features in the runtime (from the runtime's perspective, the protocol doesn't exist), I don't think it makes sense to have an ObjCRuntime method to query whether this is supported by the runtime. We should enable it everywhere if it's going in anywhere.
  • The changes required in CGObjcCGNU.cpp are fairly small and I agree that @rjmccall's proposal for a callback-driven visitor would simplify the changes in both runtimes.
  • The semantics are slightly confusing with the deep approach though. Normally, if you iterate over the protocols that a class conforms to, you only see the ones that it directly conforms to. With this model, you'd see indirect ones. We might want to set some metadata to allow programmers to differentiate the two, or we might want to have a warning (off by default?) if an informal protocol conforms to a formal one, or simply disallow it.

This feature looks generally useful. A few small suggestions:

  • This is really a way of transforming a formal protocol into an informal protocol. Objective-C has had a convention of informal protocols since the '80s, but they're implemented as categories on the root class with no @implementation. I'd suggest that __attribute__((objc_informal_protocol)) or similar might be a better spelling for this, explicitly bringing the informal notion into the language. A lot of the informal protocols in Cocoa could be better expressed using this and @optional methods than as categories on NSObject.

It's still a formal protocol, it just doesn't have runtime representation. I think the name is appropriate. It's an interesting point that some of the informal protocols could be formalized without penalty using this, though.

  • Given that this doesn't depend on any features in the runtime (from the runtime's perspective, the protocol doesn't exist), I don't think it makes sense to have an ObjCRuntime method to query whether this is supported by the runtime. We should enable it everywhere if it's going in anywhere.

Agreed.

  • The changes required in CGObjcCGNU.cpp are fairly small and I agree that @rjmccall's proposal for a callback-driven visitor would simplify the changes in both runtimes.
  • The semantics are slightly confusing with the deep approach though. Normally, if you iterate over the protocols that a class conforms to, you only see the ones that it directly conforms to. With this model, you'd see indirect ones. We might want to set some metadata to allow programmers to differentiate the two, or we might want to have a warning (off by default?) if an informal protocol conforms to a formal one, or simply disallow it.

I don't see why this information would be useful.

lanza added a comment.EditedSep 8 2020, 4:06 PM

A concern that has come up while rewriting this for the listed concerns is forward declared protocols that are defined as non_runtime.

@protocol NonRuntimeProto;

@interface Implementer : Root <NonRuntimeProto>
@end

@implementation Implementer
@end

This compiles just fine but with a warning in clang 12. It'll emit a reference to the NonRuntimeProto without ever having had the chance to see the objc_non_runtime_protocol defined elsewhere and will thus error out at link.

There's a few ways to move forward with this:

  1. Ignore it. If the user decided that a protocol was objc_non_runtime_protocol they probably have the insight necessary to address this link error. This is no more difficult than many C++ linker errors to diagnose.
  2. Make it an error if the protocol is not defined prior to the implementation but leave forward-decls as valid for interfaces.

I don't think it'll actually error out at link time: protocol objects get emitted eagerly on use, cross-module linking is just a code-size optimization. This actually has caused longstanding problems.

Anyway, I think it's fine to require forward declarations to have the attribute.

lanza added a comment.EditedSep 8 2020, 11:29 PM

I don't think it'll actually error out at link time: protocol objects get emitted eagerly on use, cross-module linking is just a code-size optimization. This actually has caused longstanding problems.

But if it's just a forward declaration there's nothing to emit. The above code compiles just fine as is with just a warning. Here's the result of clang protocol.m -lobjc

proto.m:10:31: warning: cannot find protocol definition for 'NonRuntimeProto'
@interface Implementer : Root<NonRuntimeProto>
                              ^
proto.m:8:11: note: protocol 'NonRuntimeProto' has no definition
@protocol NonRuntimeProto;
          ^
1 warning generated.
Undefined symbols for architecture x86_64:
  "__OBJC_PROTOCOL_$_NonRuntimeProto", referenced from:
      __OBJC_CLASS_PROTOCOLS_$_Implementer in proto-bd4a43.o
ld: symbol(s) not found for architecture x86_64
clang-12: error: linker command failed with exit code 1 (use -v to see invocation)

The protocol definition isn't actually required to compile an implementation. And if that protocol is declared as objc_non_runtime_protocol it won't ever see one.

Simply requiring that it is annotated accordingly also isn't satisfactory for the same inheritance problem you mentioned above. Annotating a forward decl can tell clang not to emit it but won't let clang know if there's a base protocol that still needs to be emitted. e.g. if we have

// SomeHeader.h
@protocol Base
@end
__attribute__((objc_non_runtime_protocol))
@protocol NonRuntime <Base>
@end


// and in main.m
__attribute__((objc_non_runtime_protocol))
@protocol NonRuntime
@interface AClass : Root<NonRunime>
@end
@implementation AClass
@end

we'll get a compile warning but no link error. But it will be wrong as the protocol Base should still be in the protocollist for AClass.

I'm not sure how big of an issue it would be introducing a new error here for code that used to compile, but that's really the only way I see this working.

I don't think it'll actually error out at link time: protocol objects get emitted eagerly on use, cross-module linking is just a code-size optimization. This actually has caused longstanding problems.

But if it's just a forward declaration there's nothing to emit. The above code compiles just fine as is with just a warning. Here's the result of clang protocol.m -lobjc

proto.m:10:31: warning: cannot find protocol definition for 'NonRuntimeProto'
@interface Implementer : Root<NonRuntimeProto>
                              ^
proto.m:8:11: note: protocol 'NonRuntimeProto' has no definition
@protocol NonRuntimeProto;
          ^
1 warning generated.
Undefined symbols for architecture x86_64:
  "__OBJC_PROTOCOL_$_NonRuntimeProto", referenced from:
      __OBJC_CLASS_PROTOCOLS_$_Implementer in proto-bd4a43.o
ld: symbol(s) not found for architecture x86_64
clang-12: error: linker command failed with exit code 1 (use -v to see invocation)

The protocol definition isn't actually required to compile an implementation. And if that protocol is declared as objc_non_runtime_protocol it won't ever see one.

Simply requiring that it is annotated accordingly also isn't satisfactory for the same inheritance problem you mentioned above. Annotating a forward decl can tell clang not to emit it but won't let clang know if there's a base protocol that still needs to be emitted. e.g. if we have

// SomeHeader.h
@protocol Base
@end
__attribute__((objc_non_runtime_protocol))
@protocol NonRuntime <Base>
@end


// and in main.m
__attribute__((objc_non_runtime_protocol))
@protocol NonRuntime
@interface AClass : Root<NonRunime>
@end
@implementation AClass
@end

we'll get a compile warning but no link error. But it will be wrong as the protocol Base should still be in the protocollist for AClass.

I'm not sure how big of an issue it would be introducing a new error here for code that used to compile, but that's really the only way I see this working.

Hmm, I thought we actually just generated a bogus definition for the protocol when it was forward-declared; really, this is better behavior that I expected. Regardless, I don't think it's worthwhile to diagnose this more strongly than a warning because of the history of not doing so.

Not really important for this PR anyway. Was your question answered well enough to move forward?

lanza added a comment.Sep 16 2020, 1:23 PM

Hmm, I thought we actually just generated a bogus definition for the protocol when it was forward-declared; really, this is better behavior that I expected. Regardless, I don't think it's worthwhile to diagnose this more strongly than a warning because of the history of not doing so.

That's fair!

Not really important for this PR anyway. Was your question answered well enough to move forward?

Yea sure, I'm okay with leaving it as a warning and moving forward and fixing your suggestions. I just wanted to make sure to cover any possible concerns.

@theraven @rjmccall should be ready for review whenever you guys are ready!

theraven accepted this revision.Sep 24 2020, 3:54 AM

Looks good to me, thank you!

This revision is now accepted and ready to land.Sep 24 2020, 3:54 AM

Okay, the result of internal review is that we're comfortable with this feature.

Reviewers brought up the point that it would be interesting to add some way to ensure unique emission of a protocol, so that protocols that can't be non-runtime could avoid bloating binary sizes outside the defining module. Maybe this could be deployment-gated so that builds targeting iOS 21 can take advantage of NSMoveOnlyType being exported but builds targeting iOS 20 still have to emit their own copy. But that's a separable improvement and doesn't need to block this work.

clang/include/clang/Basic/AttrDocs.td
4505

Suggestion:

The ``objc_non_runtime_protocol`` attribute can be used to mark that an
Objective-C protocol is only used during static type-checking and doesn't
need to be represented dynamically.  This avoids several small code-size
and run-time overheads associated with handling the protocol's metadata.
A non-runtime protocol cannot be used as the operand of a ``@protocol``
expression, and dynamic attempts to find it with ``objc_getProtocol`` will fail.

If a non-runtime protocol inherits from any ordinary protocols, classes and
derived protocols that declare conformance to the non-runtime protocol
will dynamically list their conformance to those base protocols.
clang/include/clang/Basic/DiagnosticSemaKinds.td
1062

This is dead now.

clang/lib/CodeGen/CGObjC.cpp
466

Should this make an effort to avoid declaring redundant bases? e.g.

@protocol Base @end
@protocol NonRuntime<Base> @end
@protocol Runtime<Base> @end
@interface MyClass <Runtime, NonRuntime> @end
@implementation MyClass @end

Ideally MyClass only declares conformance to Runtime rather than redundantly declaring conformance to Base, which I think you'd naturally get from this algorithm.

lanza updated this revision to Diff 294207.Sep 24 2020, 5:17 PM

Fix duplicate inheritance issue

lanza marked 5 inline comments as done.Sep 24 2020, 5:19 PM

Fixed and added a test under the REDUNDANCY prefix.

lanza added inline comments.Sep 24 2020, 5:22 PM
clang/lib/CodeGen/CGObjC.cpp
466

You are right. I fixed this but excluded the case where the declaration explicitly lists in it's <> declaration that it does inherit from a protocol. e.g. In the test case:

@protocol C @endp
@protocol B <C> @end
@protocol A <B> @end
__attribute__((objc_non_runtime_protocol)) @protocol Alpha<A> @end
__attribute__((objc_non_runtime_protocol)) @protocol Beta<B> @end
@interface Implementer : Root <Alpha, Beta, C> @end
@implementation Implementer @end

The non-runtime removing algorithm generates A,B,C as the list but B and C are both redundant. It makes sense to remove B for that reason but since the declaration explicitly mentions C I vote that it makes sense to leave it included as that's what one would expect from a class_copyProtocolList.

rjmccall added inline comments.Sep 24 2020, 8:53 PM
clang/lib/CodeGen/CGObjC.cpp
490

You should use llvm::DenseSet for this. But in general there are more sets here than I think you really need, and you're building them eagerly when you don't have to. I would suggest:

  1. Walk the list of declared protocols, adding the runtime protocols to the main result list and the first runtime protocols implied by the non-runtime protocols to a different list.
  1. If there are any protocols in the first-implied list (which is unlikely), build the implied-protocols set for all the protocols in the main list (inclusive of the protocols there) and the first-implied list (exclusive). It would be totally reasonable to add a method to make this easier: void ObjCProtocolDecl::getImpliedProtocols(llvm::DenseSet<const ObjCProtocolDecl*> &) const;
  1. Add any protocols that are in the first-implied list but not the implied set back to the main list.

Also, protocols can be redeclared, so you should make sure to map to the canonical decl before adding them to a set (or UniqueVector).

lanza updated this revision to Diff 295729.Oct 1 2020, 9:12 PM

Update with John's suggestions

lanza requested review of this revision.Oct 1 2020, 9:18 PM
lanza marked an inline comment as done.
lanza added inline comments.
clang/lib/CodeGen/CGObjC.cpp
490

Got ya, that should be much better for the 99.99% case where there are no non-runtime protocols. 👍

lanza marked an inline comment as not done.Oct 1 2020, 9:19 PM
lanza marked an inline comment as done.

Thanks, that's a lot better. Just some minor suggestions.

clang/lib/CodeGen/CGObjC.cpp
477

empty(), please

487

You should add something like ", including all the runtime protocols but not the non-runtime protocols".

499

Comment on this pass.

501

PD is already canonical here (and if it weren't, looking in AllImpliedProtocols wouldn't work).

lanza added inline comments.Oct 2 2020, 11:04 AM
clang/lib/CodeGen/CGObjC.cpp
487

Do you anticipate a usage of getImpliedProtocols other than this algorithm? I include the non-runtime protos in the all implied list simply because it simplifies the collection. e.g. you insert iff it's a runtime protocol and if not you have to check contains and then potentially add a non-runtime to the work queue as many times as it's seen.

Ultimately their inclusion in the all-implied list is meaningless because we never attempt to insert a non-runtime protocol into the RuntimePDs list anyways. So it's either extra elements in the AllImplied list or extra work in the getImpliedProtocols method without any behavioral differences.

lanza updated this revision to Diff 295876.Oct 2 2020, 11:05 AM

Comments

rjmccall added inline comments.Oct 2 2020, 11:35 AM
clang/lib/CodeGen/CGObjC.cpp
487

Ah, yes, that's a fair point.

lanza updated this revision to Diff 295911.Oct 2 2020, 2:35 PM

Clean clang-tidy warnings before landing

This revision was not accepted when it landed; it landed in state Needs Review.Oct 2 2020, 2:38 PM
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
kastiglione added inline comments.
clang/include/clang/AST/DeclObjC.h
2174

This comment refers to the original spelling.

lanza added inline comments.Oct 3 2020, 10:36 AM
clang/include/clang/AST/DeclObjC.h
2174

Yes indeed, I'll push a comment fix commit.