This is an archive of the discontinued LLVM Phabricator instance.

[CodeGen] Emit MSVC RTTI for Obj-C EH types
AcceptedPublic

Authored by smeenai on May 22 2018, 3:44 PM.

Details

Summary

We're implementing funclet-compatible code generation for Obj-C
exceptions when using the MSVC ABI. The idea is that the Obj-C runtime
will wrap Obj-C exceptions inside C++ exceptions, which allows for
interoperability with C++ exceptions (for Obj-C++) and zero-cost
exceptions. This is the approach taken by e.g. WinObjC, and I believe it
to be the best approach for Obj-C exceptions in the MSVC ABI.

The first step is emitting proper RTTI for Obj-C exception types. Since
we're wrapping Obj-C exceptions in C++ exceptions, the RTTI should be
identical, barring the name of the RTTI variable (OBJC_EHTYPE_$_*).
Since the MSVC ABI does not easily allow for cross-DLL data references
from within other data, we instead emit the RTTI locally wherever
needed, which is also how C++ RTTI works on that ABI.

Follow-up diffs will add code generation support for @try itself, but
I'm splitting it up to get early feedback and make review more
manageable.

Worked on with Saleem Abdulrasool <compnerd@compnerd.org>.

Diff Detail

Event Timeline

smeenai created this revision.May 22 2018, 3:44 PM
smeenai updated this revision to Diff 148133.May 22 2018, 4:59 PM

Colocate CHECK lines with declarations

rnk added a comment.May 25 2018, 10:42 AM

I think the approach makes sense.

lib/CodeGen/CGObjCMac.cpp
7457–7464

@rjmccall how should this be organized in the long run? At this point, the naming seems totally wrong. Is the non-fragile ABI sort of the canonical way forward for Obj-C, i.e. it's what a new platform would want to use to best stay in sync with the future of obj-c?

DHowett-MSFT accepted this revision.May 29 2018, 6:16 PM

This largely matches what we've done in the WinObjC clang patchset here, with the critical exception that we've chosen to mangle the EH names as though they were for structs named after their classes.

Is there some safe generalization that will apply well to all Objective-C runtimes, and that we could put in a common place?

Overall, the change seems reasonable to me, but my signoff cannot constitute strong-enough acceptance.

This revision is now accepted and ready to land.May 29 2018, 6:16 PM

This largely matches what we've done in the WinObjC clang patchset here, with the critical exception that we've chosen to mangle the EH names as though they were for structs named after their classes.

Thanks for the pointer!

To clarify, when you're talking about mangling the EH names, do you mean the names of the typeinfo structures themselves (OBJC_EHTYPE_* in my implementation), or the typeinfo name strings inside those structures? The latter should be equivalent to structs for us too, i.e. @catch (I *) and catch (struct I *) would produce the same name in the generated type info.

smeenai updated this revision to Diff 150399.Jun 7 2018, 1:27 PM

Address objc_exception attribute case

The non-fragile Objective-C path — i.e. interoperation with C++ exceptions instead of using setjmp/longjmp in an utterly-incompatible style — is absolutely the right direction going forward.

How does "wrapping an Objective-C exception inside a C++ exception" work? Do you mean that you'll throw and catch a well-known C++ exception type and then separately test for subclassing in the catch clause? How do you intend to handle successive catch clauses in that case? Note that you cannot just wrap the Objective-C exception inside a separate C++ exception corresponding to the static type of the exception: unlike C++ pointers (even polymorphic ones), exception matching for ObjC pointers is based on the dynamic type of the exception.

Wrapping an Objective-C exception inside a C++ exception means dynamically constructing a C++ exception object and traversing the class hierarchy of the thrown Obj-C object to populate the catchable types array of the C++ exception. Microsoft's C++ runtime will perform the appropriate catch type comparisons, and this patch makes the compiler emit the right typeinfos into the exception handling tables for all of that to work. https://github.com/Microsoft/libobjc2/blob/f2e4c5ac4b3ac17f413a38bbc7ee1242f9efd0f7/msvc/eh_winobjc.cc#L116 is how WinObjC does this wrapping, for example.

Essentially, with this patch, @catch (I *) in Obj-C ends up generating the same exception handling table as catch (struct I *) in C++ would. If you're throwing an I * (or any derived class), the runtime will generate an exception object which is catchable by this handler (the catchable types array for that exception object will contain the appropriate typeinfo). Successive catch clauses don't need any special handling; they work the same way as they would in C++. The runtime is generating that exception object based on a dynamic traversal of the class hierarchy, so I think Obj-C's dynamic semantics should be respected.

Wrapping an Objective-C exception inside a C++ exception means dynamically constructing a C++ exception object and traversing the class hierarchy of the thrown Obj-C object to populate the catchable types array of the C++ exception. Microsoft's C++ runtime will perform the appropriate catch type comparisons, and this patch makes the compiler emit the right typeinfos into the exception handling tables for all of that to work. https://github.com/Microsoft/libobjc2/blob/f2e4c5ac4b3ac17f413a38bbc7ee1242f9efd0f7/msvc/eh_winobjc.cc#L116 is how WinObjC does this wrapping, for example.

I see. The idea of creating the type descriptors and mangled names ad-hoc for the catchable-types array is clever, and it's nice that the ABI is defined in a way that makes that work. (Expensive, but hey, it's the exceptions path.)

Alright, now that I understand why this only matters for the catch side, I'll take a look at the patch.

WinObjC does this wrapping, for example.

I see. The idea of creating the type descriptors and mangled names ad-hoc for the catchable-types array is clever, and it's nice that the ABI is defined in a way that makes that work. (Expensive, but hey, it's the exceptions path.)

We ran into some critical issues with this approach on x64. The pointers in the exception record are supposed to be image-relative instead of absolute, so I tried to make them absolute to libobjc2's load address, but I never quite solved it.

A slightly better-documented and cleaner version of the code you linked is here.

WinObjC does this wrapping, for example.

I see. The idea of creating the type descriptors and mangled names ad-hoc for the catchable-types array is clever, and it's nice that the ABI is defined in a way that makes that work. (Expensive, but hey, it's the exceptions path.)

We ran into some critical issues with this approach on x64. The pointers in the exception record are supposed to be image-relative instead of absolute, so I tried to make them absolute to libobjc2's load address, but I never quite solved it.

A slightly better-documented and cleaner version of the code you linked is here.

We solved the x64 issue by just calling RaiseException directly and supplying a fake ImageBase. It's a bit kludgy, but it works well. (_CxxThrowException's source is included with MSVC, so we just looked at how that was calling RaiseException and altered it accordingly.)

rjmccall added inline comments.Jun 18 2018, 11:20 AM
lib/CodeGen/CGObjCMac.cpp
7457–7464

For Darwin, yes, absolutely.

I think this method should probably just completely delegate to the CGCXXABI using a new getAddrOfObjCRTTIDescriptor method.

smeenai added inline comments.Jun 22 2018, 2:38 PM
lib/CodeGen/CGObjCMac.cpp
7457–7464

To be clear, you'd want the entirety of the EHType emission logic to be shifted to CGCXXABI? I think that would make sense, and I can look into it.

rjmccall added inline comments.Jun 22 2018, 3:29 PM
lib/CodeGen/CGObjCMac.cpp
7457–7464

Right.

smeenai added inline comments.Jul 10 2018, 10:34 PM
lib/CodeGen/CGObjCMac.cpp
7457–7464

Sorry, getting back to this now.

What did you have in mind for handling the different Obj-C runtimes? Were you envisioning the new getAddrOfObjCRTTIDescriptor method supporting just the non-fragile Mac ABI or all of them?

smeenai added inline comments.Jul 16 2018, 4:46 PM
lib/CodeGen/CGObjCMac.cpp
7457–7464

Looked into this more. ItaniumRTTIBuilder already has a BuildObjCObjectTypeInfo, which confused me for a bit until I realized it's only ever used for the fragile ABI, and even then only when using C++ try/catch (instead of Obj-C style @try/@catch), which is a bit strange. Everything else does its own thing.

From what I've been able to make out, these are the current possibilities for generating Obj-C RTTI for the Itanium ABI:

  • Fragile macOS runtime using Obj-C @try/@catch: doesn't actually seem to generate any RTTI as far as I can tell, and uses objc_exception_match instead.
  • Fragile macOS runtime using C++ try/catch: goes through ItaniumRTTIBuilder::BuildObjCObjectTypeInfo, which generates C++ compatible RTTI referencing a C++ class type info.
  • Non-fragile macOS runtime: generates its own C++ compatible RTTI referencing objc_ehtype_vtable.
  • GNUStep for Objective-C++: generates its own C++ compatible RTTI referencing _ZTVN7gnustep7libobjc22__objc_class_type_infoE.
  • All other GNU runtimes (including GNUStep for Objective-C): just uses the identifier name string of the interface as its "RTTI".

I think it makes sense to only have the new getAddrOfObjCRTTIDescriptor method handle the non-fragile macOS runtime for now, and perhaps ItaniumRTTIBuilder::BuildObjCObjectTypeInfo should be renamed (or at least have a comment added) to indicate that it's only used for the fragile macOS runtime when catching an Obj-C type with C++ catch.

rjmccall added inline comments.Jul 17 2018, 3:43 PM
lib/CodeGen/CGObjCMac.cpp
7457–7464

Yeah, that makes sense to me for now.

We ran into some critical issues with this approach on x64. The pointers in the exception record are supposed to be image-relative instead of absolute, so I tried to make them absolute to libobjc2's load address, but I never quite solved it.

A slightly better-documented and cleaner version of the code you linked is here.

(For the reference of other people, since Dustin and I discussed this offline a while back)

This is fixed upstream. We use a stack address as our base address and construct stack-variable-relative offsets. This is safe to do because the only requirement for Win64 EH is that the pointers be relative to some arbitrary base and in the SEH model the throwing stack frame remains live until the last catch.

smeenai added inline comments.Sep 27 2018, 12:31 PM
lib/CodeGen/CGObjCMac.cpp
7457–7464

@rjmccall Sorry, I've been severely distracted, but I'm trying to come back and finish this up now. I'm fully aware you've probably lost all context on this by now as well, and I do apologize for the delay.

I did try implementing this approach, where I had getAddrOfObjCRTTIDescriptor as part of CGCXXABI. I ran into issues with trying to take the existing Itanium implementation of CGObjCNonFragileABIMac::GetInterfaceEHType and porting it over to to CGCXXABI, however. Specifically, it calls GetClassName and GetClassGlobal, both of which are non-trivial internal functions, and which I would then have to make accessible to CGCXXABI to end up with equivalent RTT generation.

I can put up a diff demonstrating what it would end up looking like if you want to see it, but I'm less convinced now that moving the entire RTTI emission logic out to CGCXXABI will end up being cleaner.