Page MenuHomePhabricator

Add Windows support for the GNUstep Objective-C ABI V2.
ClosedPublic

Authored by theraven on Aug 1 2018, 9:03 AM.

Details

Summary

Introduces funclet-based unwinding for Objective-C and fixes an issue
where global blocks can't have their isa pointers initialised on
Windows.

After discussion with Dustin, this changes the name mangling of
Objective-C types to prevent a C++ catch statement of type struct X*
from catching an Objective-C object of type X*.

Diff Detail

Event Timeline

theraven created this revision.Aug 1 2018, 9:03 AM
DHowett-MSFT accepted this revision.Aug 1 2018, 2:43 PM

This looks good to me, but I don't have a strong understanding of "outlining."

lib/CodeGen/CGBlocks.cpp
1262

Is there value in emitting a list of blocks that need to be initialized, then initializing them in one go in a COMDAT-foldable function?

lib/CodeGen/CGObjCGNU.cpp
1534

nit: fore

lib/CodeGen/CGObjCRuntime.cpp
236

this region may need reformatting

277

There's a scoped save/restore helper for this that'll offer better support for nested funclets.

SaveAndRestore<llvm::Instruction *> RestoreCurrentFuncletPad(CGF.CurrentFuncletPad);
This revision is now accepted and ready to land.Aug 1 2018, 2:43 PM
DHowett-MSFT added inline comments.
include/clang/Driver/Options.td
1491

Is this so it's exposed to clang-cl?

lib/AST/MicrosoftMangle.cpp
1887

I'm interested in @smeenai's take on this, as well.

theraven updated this revision to Diff 158997.Aug 3 2018, 6:38 AM
  • Address Dustin's review comments.
theraven added inline comments.Aug 6 2018, 5:18 AM
include/clang/Driver/Options.td
1491

Yup. Unfortunately, the normal work around of passing the argument with -Xclang (so -Xclang -fobjc-runtime=gnustep-2.0) doesn't work because the driver makes various other decisions about what flags to pass to the front end based on the Objective-C runtime and makes all of those assuming -fobjc-runtime=gcc. It is not possible to compile ARC code without this.

lib/CodeGen/CGBlocks.cpp
1262

I think that the best solution is to move this into the back end, so that this code goes away in the front end, but anything that's referring to a dllimport global in a global initialiser is transformed in the back end to a list of initialisations and a comdat function that walks the list and sets them up. That said, this seems sufficiently generally useful that it would be nice for the function to be in the CRT bits.

I should be in Redmond some time in October, so maybe we can discuss it with some of the VS team then?

theraven updated this revision to Diff 159293.Aug 6 2018, 6:37 AM
  • Fix an issue in protocol generation.
theraven marked 3 inline comments as done.Aug 6 2018, 7:24 AM
theraven updated this revision to Diff 159312.Aug 6 2018, 8:41 AM
  • Fix failing test.

I'd like to commit this, unless @rjmccall has any objections.

rjmccall added inline comments.Aug 7 2018, 10:31 PM
lib/AST/MicrosoftMangle.cpp
448

I don't think we want ObjCInterfaceDecls to enter this function normally; I'd prefer to leave that assertion intact.

2576

You're not really reusing any interesting logic from mangle here; please just stream the literal directly into Out.

Also, this will add the actual identifier to the substitution set, whereas your implementation below in the case for ObjCObjectType adds the prefixed identifier to the substitution set.

lib/CodeGen/CGBlocks.cpp
1216

Should this be a PE/COFF-specific restriction? Or otherwise somehow restricted to the "native" environment? I don't know enough about all the UNIX-on-Windows approaches to know whether the same restriction would apply to all of them. I did think they generally used the Itanium C++ ABI, and the Itanium ABI relies on linking v-tables from the C++ standard library. Maybe those environments just all use static linking to get around that.

1262

Can the blocks part of this patch be split out? It's basically a totally different bugfix.

1276

Is the priority system not good enough?

lib/CodeGen/CGObjCGNU.cpp
915

Can this section-names cleanup also just be in a separate patch?

2262

I think this is the third different way that you test for Windows in this patch. :) Is there a reason why this is just testing for MSVC and the others are testing for PE/COFF or for Windows generally?

3821

You're sure here that the static information aligns with the dynamic?

lib/CodeGen/CGObjCRuntime.cpp
125

Please add some sort of comment about the meaning and source of these flags.

lib/CodeGen/EHScopeStack.h
424 ↗(On Diff #159312)

Please add a function on SGF to push this cleanup or something instead of globally declaring it.

DHowett-MSFT added inline comments.Aug 7 2018, 11:29 PM
lib/CodeGen/CGBlocks.cpp
1276

My reading of the LLVM language reference leads me to believe it’s only ordered per-module. If that’s the case, the benefit of emitting into XC* is that it provides guaranteed order over all linker input.

llvm.global_ctors excerpt:

The functions referenced by this array will be called in ascending order of priority (i.e. lowest first) when the module is loaded.

Now if the priority system _is_ guaranteed over all linker input, will that guarantee hold for mixed Clang and CL objects?

rjmccall added inline comments.Aug 8 2018, 12:33 AM
lib/CodeGen/CGBlocks.cpp
1276

Init priorities on ELF are preserved in the section name like .init_array.200, and the sections get sorted by that in the image — it's really a very similar trick to how it works with .CRT$XC. Of the major formats, I think it's just Mach-O that doesn't have any built-in prioritization mechanism across object files. But I don't know if LLVM actually tries to translate init priorities over into .CRT$XC suffices when targeting PE/COFF, and arguably that's good: init priorities as presented in LLVM right now are pretty specific to the ELF mechanism. Long-term, maybe llvm.global_ctors should be generalized so that on ELF targets it takes an integer priority, on PE/COFF targets it takes a string, and on Mach-O it doesn't take anything at all; but I won't hold up this patch for that.

On the other hand, I tend to agree that maybe the best solution is for the backend to just take care of this and automatically create a global initializer to install non-function (or maybe non-function & unnamed_addr) dllimported symbols in global data.

mstorsjo added inline comments.
lib/CodeGen/CGBlocks.cpp
1276

But I don't know if LLVM actually tries to translate init priorities over into .CRT$XC suffices when targeting PE/COFF, and arguably that's good: init priorities as presented in LLVM right now are pretty specific to the ELF mechanism. Long-term, maybe llvm.global_ctors should be generalized so that on ELF targets it takes an integer priority, on PE/COFF targets it takes a string, and on Mach-O it doesn't take anything at all; but I won't hold up this patch for that.

FWIW, for the MinGW targets, the same ELF-like priority system is used (or I think it's actually an older mechanism previously used on ELF); constructors are emitted in .ctors, and those with a priority go into sections named .ctors.01234 where the number is 65535 minus the init priority specified. By making these zero padded, they get ordered right by the normal alphabetical sorting. MinGW CRT startup routines then have extra code to run constructors in the right order from the .ctors section, in addition to the normal .CRT$XC ones.

theraven updated this revision to Diff 159651.Aug 8 2018, 1:42 AM
theraven marked 2 inline comments as done.
  • Address Dustin's review comments.
  • Fix an issue in protocol generation.
  • Fix failing test.
  • [gnu-objc] Make selector order deterministic.
  • Address rjmcall's review comments.
theraven updated this revision to Diff 159653.Aug 8 2018, 1:54 AM
  • Revert blocks part of the patch to put in a separate review.
theraven added inline comments.Aug 8 2018, 4:20 AM
lib/CodeGen/CGBlocks.cpp
1262

Split into D50436.

lib/CodeGen/CGObjCGNU.cpp
915

This is difficult to extract, because it's mostly needed for the COFF part where we need to modify the section names. For ELF, it was fine to keep them as separate const char*s

2262

For the EH logic, we are using DWARF exceptions if we are targeting a Windows environment that uses DWARF EH and SEH-based exceptions if we are targeting a Windows environment that uses MSVC-compatible SEH.

The section code is specific to PE/COFF, where we don't get to use some of the ELF tricks.

The blocks code is specific to Windows, because it relates to Windows run-time linking. Hypothetically, a PE/COFF platform could provide dynamic relocations that would eliminate the need for that check.

It's possible that some of the PE/COFF vs Windows distinctions are wrong. For example, UEFI images are PE/COFF binaries and if anyone wanted to use blocks in a UEFI firmware image then they may find that they need the Windows code path (but I do not have a good way of testing this, so restricted it to Windows initially). Similarly, some of the section initialisation code in CGObjCGNU may actually be Windows-only, but it's probably a good starting point for anyone who actually wants to use Objective-C in UEFI firmware (though the final destination for such people is likely to involve padded cells).

3821

I'm not sure I understand this question.

lib/CodeGen/CGObjCRuntime.cpp
125

These are not well documented anywhere, including in the Clang Microsoft C++ ABI code that I read to see why exceptions weren't working, but I've added a small comment that explains why they exist. I have no idea where the values come from though.

mgrang added inline comments.Aug 8 2018, 10:22 AM
lib/CodeGen/CGObjCGNU.cpp
3542
rjmccall added inline comments.Aug 8 2018, 12:51 PM
lib/CodeGen/CGObjCGNU.cpp
915

I don't mind if the extracted patch seems unmotivated on its own, but I do think it should be a separate patch.

2262

Okay, the exception logic makes sense. Please make this a common predicate instead of repeating it in different places.

My understanding is that the restriction on the use of dllimport-ed symbols is a general PE restriction: the format itself only supports binds in the import address table, and everything else has just to be a rebase. I mean, of course you can imagine a loader that extends PE to support arbitrary binds, but you can also imagine a loader that extends PE to support just embedding an ELF image. Firmware probably never uses imported symbols.

3542

Also, why is the sort necessary? Should this change be separately committed and tested?

3821

Your new code path no longer uses ExceptionAsObject, which is our static notion of the current exception value. Instead, you call a runtime function which presumably relies on a dynamically-stored current exception value. I'm just asking if, in this lexical position, you're definitely rethrowing the right exception.

lib/CodeGen/CGObjCRuntime.cpp
125

Just mentioning that they're the same catch flags used in the Microsoft C++ ABI is all I'm asking for. Although maybe there could be an abstraction for that instead of passing them around separately?

theraven updated this revision to Diff 159868.Aug 9 2018, 1:06 AM
  • Address rjmcall's review comments.
  • Revert blocks part of the patch to put in a separate review.
theraven marked an inline comment as done.Aug 9 2018, 3:42 AM

I believe that this is now ready to land.

lib/CodeGen/CGObjCGNU.cpp
915

None of the changes to do this are in a separate commit, and they touch enough of the code that it's very difficult to separate them without ending up with conflicts in this branch (which touches the code surrounding the changes in multiple places). I cannot extract the code in such a way that this patch would cleanly rebase on top, because various things (e.g. the null section code) needed restructuring to work with Windows and touch this logic.

3542

Ooops, this was a fix for PR35277, which was meant to be a separate review. I forgot to remove it from this branch after cherry-picking it to another one. Removed.

3821

Ah, I see. This code path is hit only as a @throw;, not a @throw obj (in the latter case, there is a non-null return from S.getThrowExpr()). In this case, the value of ExceptionAsObject may not be useful. In the case of a catchall, this is a load from a stack location with no corresponding store. The Windows unwind logic keeps the object on the stack during funclet execution and then continues the unwind at the end, without ever providing this frame's funclets with a pointer to it. That's probably not very obvious, so I've added an explanatory comment.

theraven updated this revision to Diff 159887.Aug 9 2018, 3:42 AM
  • Address Dustin's review comments.
  • Fix an issue in protocol generation.
  • Fix failing test.
  • Address rjmcall's review comments.
  • Add some missing comments and factor out SEH check.
rjmccall accepted this revision.Aug 9 2018, 10:33 PM

LGTM.

lib/CodeGen/CGObjCGNU.cpp
915

Alright, I won't insist.

3821

Thanks.

theraven updated this revision to Diff 160091.Aug 10 2018, 5:12 AM

Squashed into a single commit.

This revision was automatically updated to reflect the committed changes.
compnerd added inline comments.
lib/AST/MicrosoftMangle.cpp
1887

Hmm, doesn't this break ObjC/ObjC++ interoperability? I don't think that we should be changing the decoration here for the MS ABI case.

theraven added inline comments.Aug 15 2018, 1:46 AM
lib/AST/MicrosoftMangle.cpp
1887

No, this fixes ObjC++ interop. Throwing an exception from a @throw and catching it with catch now works and we avoid the problem that a C++ compilation unit declares a struct that happens to have the same name as an Objective-C class (which is very common for wrapper code) may catch things of the wrong type.

smeenai added inline comments.Aug 15 2018, 12:00 PM
lib/AST/MicrosoftMangle.cpp
1887

I've seen a lot of code which does something like this in a header

#ifdef __OBJC__
@class I;
#else
struct I;
#endif

void f(I *);

You want f to be mangled consistently across C++ and Objective-C++, otherwise you'll get link errors if f is e.g. defined in a C++ TU and referenced in an Objective-C++ TU. This was the case before your change and isn't the case after your change, which is what @compnerd was pointing out. They are mangled consistently in the Itanium ABI, and we want them to be mangled consistently in the MS ABI as well.

Not wanting the catch mix-up is completely reasonable, of course, but it can be done in a more targeted way. I'll put up a patch that undoes this part of the change while still preventing incorrect catching.

theraven added inline comments.Aug 16 2018, 1:26 AM
lib/AST/MicrosoftMangle.cpp
1887

With a non-fragile ABI, there is not guarantee that struct I and @class I have the same layout. This is why @defs has gone away. Any code that does this and treats struct I as anything other than an opaque type is broken. No other platform supports throwing an Objective-C object and catching it as a struct, and this is an intentional design decision. I would be strongly opposed to a change that intentionally supported this, because it is breaking correct code in order to make incorrect code do something that may or may not work.

smeenai added a subscriber: rnk.Aug 22 2018, 12:15 PM

Sorry for the belated response; I was on vacation.

I'm confused by the funclet-based unwinding support. Do you intend GNUStep to be used with windows-msvc triples? If so, how is the runtime throwing exceptions? We took the approach of having the Obj-C runtime wrap Obj-C exceptions in C++ exceptions and then have vcruntime handle the rest (see https://reviews.llvm.org/D47233 and https://reviews.llvm.org/D47988), but I don't see any of the associated RTTI emission changes in this patch.

I also don't see any tests added for the codegen changes around @finally. I'm planning on changing that approach anyway (by pushing https://reviews.llvm.org/D47988) through, but it still seems like something that was missing from this patch.

lib/AST/MicrosoftMangle.cpp
1887

To be clear, I'm aware of the layout differences. My plan was to revert the mangling here to what it was before (which would also be consistent with Itanium), and then adjust the RTTI emission for Obj-C exception types (which I'm handling in https://reviews.llvm.org/D47233) to differentiate between C++ and Obj-C exceptions, which would prevent unintended catches.

lib/CodeGen/CGObjCRuntime.cpp
203

This is *very* limiting. This outlining doesn't handle capturing self, capturing block variables, and lots of other cases that come up in practice. I attempted to implement some of the missing cases, before meeting with @compnerd and @rnk and deciding that it was better to use CapturedStmt instead. See https://reviews.llvm.org/D47564 for more discussion, and https://reviews.llvm.org/D47988 implements the associated codegen (unfortunately I wasn't able to get that through before going on vacation).

rnk added inline comments.Aug 22 2018, 5:01 PM
lib/CodeGen/CGBlocks.cpp
1276

I filed a bug to turn the integers into strings on COFF: https://bugs.llvm.org/show_bug.cgi?id=38552

I'm confused by the funclet-based unwinding support. Do you intend GNUStep to be used with windows-msvc triples? If so, how is the runtime throwing exceptions? We took the approach of having the Obj-C runtime wrap Obj-C exceptions in C++ exceptions and then have vcruntime handle the rest (see https://reviews.llvm.org/D47233 and https://reviews.llvm.org/D47988), but I don't see any of the associated RTTI emission changes in this patch.

I'm assuming 'we' here is Apple? Microsoft has a friendly fork of the GNUstep runtime in the WinObjC project and I am slowly working on pulling in the WinObjC downstream changes, incorporating them with the v2 ABI, and making it easy for WinObjC to use an unmodified version of the runtime. The v2 ABI still has a few things that don't work with the Windows linkage model, which I'm working out how best to address. The code for throwing an exception is here:

https://github.com/gnustep/libobjc2/blob/master/eh_win32_msvc.cc

The runtime constructs an SEH exception on the stack. The low-level NT exception type knows all of the possible types that might be allowed in a catch, so the runtime dynamically constructs a list of all of these, one per superclass in the hierarchy, on the stack. The funclet-based unwinding mechanism makes it safe to do this, because the throwing stack frame remains valid until after the exception is caught. The Windows C++ personality function then checks the catches the exceptions. Clang/CX (Clang with the Visual Studio compiler back end) has supported this for a while, but it never made it upstream to Clang/LLVM and I'm trying to rectify this.

On 64-bit Windows, we must provide 32-bit offsets to some arbitrary base. For C++, this is the base load address of the DLL. That's fine for C++, where the throw types are static (and the entire class hierarchy is available at compile time), but our stack may be more than 4GB away from any library. Instead, we use an offset from some arbitrary on-stack location (and if we end up being more than 4GB away from the base of our stack frame, then something's gone very badly wrong!).

lib/AST/MicrosoftMangle.cpp
1887

Okay, that sounds fine. The runtime needs to know about the mangling to generate the types, but we haven't yet done the 2.0 release of the runtime and Windows support is incomplete with the new ABI in the clang 7 release branch, so will be marked as an experimental feature. We've merged something based on the WinObjC changes, though with some clean-ups so that it works correctly on Win64, but can change the mangling up until we've made the 2.0 release.

lib/CodeGen/CGObjCRuntime.cpp
203

This worked for the cases that I tested, but that was far from exhaustive. I'm very happy to switch to something better, and D47988 does, indeed, look like something better.