Page MenuHomePhabricator

[WIP] [ItaniunCXXABI] Add an option for loading the type info vtable with dllimport
AbandonedPublic

Authored by mstorsjo on Feb 12 2018, 5:59 AM.

Details

Summary

The first member of the type info structs/objects is a pointer to the vtable of the type info class. If the standard C++ library that provides this vtable is linked as a DLL, this field of the struct needs to be initialized differently.

If statically initializing a variable with a pointer to a dllimported variable, that initalization can't be done as normal static initialization, since the address of the variable only will be available at runtime via the IAT.

For a struct/class with dllimported members, clang skips the normal static initalization and instead produces a constructor that will do the equivalent initialization at runtime.

For type info objects that are instantiated in ItaniumCXXABI, it's not enough to just set the dllimport attribute on the vtable pointer to invoke the existing generation of a constructor in CodeGenModule::EmitCXXGlobalVarDeclInitFunc in CGDeclCXX.cpp. Instead ItaniumCXXABI needs to manually produce the equivalent code for the runtime initialization as well, without a VarDecl for this struct.

To enable this behaviour, a new compiler flag, -fcxx-dll, is added, that can be set when building code that expects to be linking to the standard C++ library as a DLL.

This hasn't been an issue before, if linking with GNU ld, since GNU ld automatically can handle references to variables that weren't marked as dllimport during compilation, if the undefined references are found in a DLL import library. Since lld doesn't support this behaviour, we need to properly use dllimport mechanisms even for this field.

The actual implementation isn't very elegant yet (it's only a proof of concept so far) - directions on how to do it better are welcome.

Diff Detail

Event Timeline

mstorsjo created this revision.Feb 12 2018, 5:59 AM

Do I understand correctly that this workarounds a feature missing in lld? Does MinGW emit the same sorts of object files as clang in these scenarios?

rnk added a comment.Feb 12 2018, 9:35 AM

Do I understand correctly that this workarounds a feature missing in lld? Does MinGW emit the same sorts of object files as clang in these scenarios?

You can see it that way, but having the linker synthesize a dynamic initializer to initialize dllimport data seems a bit heroic to me. MSVC link doesn't do it. It requires making assumptions about how your C runtime runs dynamic initializers. Your DLL might not even link msvcrt, so how does the linker know where to put the initializer?

On the other hand, you can be assured that users will ask us for this bfd linker feature indefinitely if we don't implement it. It papers over differences between the COFF and ELF object models, and mingw usually papers things over rather than pushing the cost onto ELF-oriented codebases.

Clang already creates dynamic initializers for stuff like:

__declspec(dllimport) int imported;
int *const pi = &imported;
int foo() { return *pi; }

So, it doesn't seem like a bridge too far to make dynamic initializers for globals with vtables. It's dicey, though. It means there's a point in your program when your vptr is null. =/

Do you think we should do something like local vftables for itanium instead? Can we assume all methods in the vtable will be exported by the DLL exporting the class?

FYI, binutils auto-import actually creates a fake IAT entry rather than using a dynamic initializer. I think it's actually a pretty cute trick. http://blog.omega-prime.co.uk/2011/07/04/everything-you-never-wanted-to-know-about-dlls/#how-auto-import-works has details. That only works when you're referencing an external imported symbol directly though, and breaks when you need an offset from the imported symbol.

rnk added a comment.Feb 12 2018, 9:54 AM

FYI, binutils auto-import actually creates a fake IAT entry rather than using a dynamic initializer. I think it's actually a pretty cute trick. http://blog.omega-prime.co.uk/2011/07/04/everything-you-never-wanted-to-know-about-dlls/#how-auto-import-works has details. That only works when you're referencing an external imported symbol directly though, and breaks when you need an offset from the imported symbol.

Hm, unfortunately that symbol offset case is exactly what comes up in Itanium if you add in a little multiple inheritance.

FYI, binutils auto-import actually creates a fake IAT entry rather than using a dynamic initializer. I think it's actually a pretty cute trick. http://blog.omega-prime.co.uk/2011/07/04/everything-you-never-wanted-to-know-about-dlls/#how-auto-import-works has details. That only works when you're referencing an external imported symbol directly though, and breaks when you need an offset from the imported symbol.

Yup - except that it also can emit "pseudo-relocs" that require making the code segment writable, and these relocations are fixed by the mingw runtime on load. These are exceptional/ugly enough that they're not enabled by default but require a linker flag.

In D43184#1005258, @rnk wrote:

You can see it that way, but having the linker synthesize a dynamic initializer to initialize dllimport data seems a bit heroic to me. MSVC link doesn't do it. It requires making assumptions about how your C runtime runs dynamic initializers. Your DLL might not even link msvcrt, so how does the linker know where to put the initializer?

As @smeenai said, it's done by IAT tricks, not actually creating runtime initializers.

On the other hand, you can be assured that users will ask us for this bfd linker feature indefinitely if we don't implement it. It papers over differences between the COFF and ELF object models, and mingw usually papers things over rather than pushing the cost onto ELF-oriented codebases.

Clang already creates dynamic initializers for stuff like:

__declspec(dllimport) int imported;
int *const pi = &imported;
int foo() { return *pi; }

A related observation on the topic of this: There's a subtle difference between what both GCC and MSVC does here, compared to clang. The case with a single variable works the same in all of them, but if you have a struct with many initialized elements, GCC and MSVC will initialize as many as possible of them statically, and only fill in the dllimport ones via a dynamic initializer. clang, on the other hand, will not initialize anything statically at all if it emits a dynamic initializer.

So, it doesn't seem like a bridge too far to make dynamic initializers for globals with vtables. It's dicey, though. It means there's a point in your program when your vptr is null. =/

Yes, and that case is also already present with the normal struct members with dllimport. That case actually turned into a real bug in trying to run Qt. Qt has got constructors that will touch a struct that contains a dllimported field. The constructor doesn't touch or use the dllimported field, only the others. This means that as long as it's built with GCC and MSVC, there's no race condition/static initialization order fiasco between the struct-with-dllimport and the constructor, since GCC and MSVC fill in all the other members statically. When built with clang though, if you're unlucky, the Qt defined constructor may run before the clang generated initializer fills in all of the struct.

I managed to work around this issue by adding a constructor priority to these structs, making sure that all the clang generated initializers run before the normal constructors: https://codereview.qt-project.org/215792

In this PoC, I also emit the initializers with a high priority (actually higher priority than can be set from normal user code) - so I think the fact that these pointers are null originally shouldn't be observable, unless using special runtime internals to hook up code to run before normal constructors.

Do you think we should do something like local vftables for itanium instead? Can we assume all methods in the vtable will be exported by the DLL exporting the class?

This I don't know yet (I only know the details in cases that I've had to study when debugging some issue), but if you think it'd (and can hint about what to change in order to use that), I can try it and see if it works for my testcase.

In D43184#1005258, @rnk wrote:

Do you think we should do something like local vftables for itanium instead? Can we assume all methods in the vtable will be exported by the DLL exporting the class?

Will this actually ever be needed for other vtables than cxxabi::class_type_info and the others from ItaniumRTTIBuilder::BuildVTablePointer? In what other cases are the vtables referred to from a statically initialized global? And for these vtables - there's no declaration of them at all within most translation units, so that'd require hardcoding all the content of these vtables in ItaniumRTTIBuilder.

rnk added a comment.Feb 15 2018, 2:21 PM

A related observation on the topic of this: There's a subtle difference between what both GCC and MSVC does here, compared to clang. The case with a single variable works the same in all of them, but if you have a struct with many initialized elements, GCC and MSVC will initialize as many as possible of them statically, and only fill in the dllimport ones via a dynamic initializer. clang, on the other hand, will not initialize anything statically at all if it emits a dynamic initializer.

The Qt bug is interesting, but I think we're doing the right thing here. C++ doesn't define any semantics for partially initialized objects. It's unclear if clang or GCC's way of doing things is better or not. With clang, you get to put the global in .bss, so it's not clear what makes for bigger code size.

In D43184#1005258, @rnk wrote:

Do you think we should do something like local vftables for itanium instead? Can we assume all methods in the vtable will be exported by the DLL exporting the class?

Will this actually ever be needed for other vtables than cxxabi::class_type_info and the others from ItaniumRTTIBuilder::BuildVTablePointer? In what other cases are the vtables referred to from a statically initialized global? And for these vtables - there's no declaration of them at all within most translation units, so that'd require hardcoding all the content of these vtables in ItaniumRTTIBuilder.

Here's a case where a local vtable would help:

struct __declspec(dllimport) A { virtual void a(); };
struct __declspec(dllimport) B { virtual void b(); };
struct __declspec(dllimport) C : A, B {
  void a() override;
  void b() override;
};
constexpr C c;

I'm suggesting we can make a DSO local copy of _ZTV1C, so that we can refer to it from c. It's OK if the vtable refers to the imported thunks, since the address of virtual methods isn't user visible.

Either that, or we need to make something like this change here, right?

In D43184#1009355, @rnk wrote:

The Qt bug is interesting, but I think we're doing the right thing here. C++ doesn't define any semantics for partially initialized objects. It's unclear if clang or GCC's way of doing things is better or not. With clang, you get to put the global in .bss, so it's not clear what makes for bigger code size.

Fair enough. And I managed to work around it for now, and for a future release, they're planning on redefining this struct for windows, to avoid this issue altogether. (On the other hand, I guess the C++ standard doesn't concern itself about dllimport at all, so on the standard level, there's nothing that would say that this even is a partially initialized object. Not sure in which way it would affect an argument anyway.)

In D43184#1005258, @rnk wrote:

Do you think we should do something like local vftables for itanium instead? Can we assume all methods in the vtable will be exported by the DLL exporting the class?

Will this actually ever be needed for other vtables than cxxabi::class_type_info and the others from ItaniumRTTIBuilder::BuildVTablePointer? In what other cases are the vtables referred to from a statically initialized global? And for these vtables - there's no declaration of them at all within most translation units, so that'd require hardcoding all the content of these vtables in ItaniumRTTIBuilder.

Here's a case where a local vtable would help:

struct __declspec(dllimport) A { virtual void a(); };
struct __declspec(dllimport) B { virtual void b(); };
struct __declspec(dllimport) C : A, B {
  void a() override;
  void b() override;
};
constexpr C c;

I'm suggesting we can make a DSO local copy of _ZTV1C, so that we can refer to it from c. It's OK if the vtable refers to the imported thunks, since the address of virtual methods isn't user visible.

Either that, or we need to make something like this change here, right?

Ok, I see. For this case I guess it should be mostly straightforward to just tweak a few decisions in ItaniumCXXABI to choose emitting new vtables as linkonce_odr - that would sound like a nice solution.

But for the case of __cxxabiv1::__class_type_info, there's no declaration visible in scope at all, and the actual contents of the vtable of this class (and the other similar classes) doesn't seem to be defined in the ABI (it seems to differ significantly between libc++abi and libsupc++). Or do you have any idea of how to work around that as well? We could emit an empty local vtable and try to initialize that at runtime by copying from the dllimported real vtable - but I guess we don't even know the size of it, when its members are defined by the c++abi implementation.

Ping @rnk - let me reiterate the questions that are open:

In D43184#1009355, @rnk wrote:

Here's a case where a local vtable would help:

struct __declspec(dllimport) A { virtual void a(); };
struct __declspec(dllimport) B { virtual void b(); };
struct __declspec(dllimport) C : A, B {
  void a() override;
  void b() override;
};
constexpr C c;

I'm suggesting we can make a DSO local copy of _ZTV1C, so that we can refer to it from c. It's OK if the vtable refers to the imported thunks, since the address of virtual methods isn't user visible.

Either that, or we need to make something like this change here, right?

Ok, I see - I think this should be doable the way you suggest, but I haven't done any further work on that yet. However I unfortunately don't think this can solve the class_type_info case at all:

For __cxxabiv1::__class_type_info, there's no declaration visible in scope at all, and the actual contents of the vtable of this class (and the other similar classes) doesn't seem to be defined in the ABI (it seems to differ significantly between libc++abi and libsupc++). Or do you have any idea of how to work around that as well? We could emit an empty local vtable and try to initialize that at runtime by copying from the dllimported real vtable - but I guess we don't even know the size of it, when its members are defined by the c++abi implementation.

So in case this approach as my hacky PoC does is the only feasible way left, I obviously need to fix it up. Any directions and suggestions on how to structure it properly? And suggestions for the user visible option, etc?

FYI, binutils auto-import actually creates a fake IAT entry rather than using a dynamic initializer. I think it's actually a pretty cute trick. http://blog.omega-prime.co.uk/2011/07/04/everything-you-never-wanted-to-know-about-dlls/#how-auto-import-works has details. That only works when you're referencing an external imported symbol directly though, and breaks when you need an offset from the imported symbol.

Actually, it feels like that blogpost is rather outdated; I can't manage to make current binutils produce this form of fixups for the issue at all. What it does however, is create a runtime pseudo relocation; the linker generates extra info in the rdata section about the relocations it wasn't able to handle, which the mingw crt will fix up on startup before handing control over to the end user code. This can involve changing the protection of the code sections to writeable in order to do such fixups there. (This feature has been in binutils since 2002.)

rnk added a comment.Mar 14 2018, 4:08 PM

For __cxxabiv1::__class_type_info, there's no declaration visible in scope at all, and the actual contents of the vtable of this class (and the other similar classes) doesn't seem to be defined in the ABI (it seems to differ significantly between libc++abi and libsupc++). Or do you have any idea of how to work around that as well? We could emit an empty local vtable and try to initialize that at runtime by copying from the dllimported real vtable - but I guess we don't even know the size of it, when its members are defined by the c++abi implementation.

I see, the field layout of the RTTI is defined by the ABI, but the vtable layout is an implementation detail of the C++ ABI runtime.

So in case this approach as my hacky PoC does is the only feasible way left, I obviously need to fix it up. Any directions and suggestions on how to structure it properly? And suggestions for the user visible option, etc?

Well, I guess it's the equivalent of /MD /MT, i.e. is your CRT static or dynamic. How is this expressed to GCC?

Actually, it feels like that blogpost is rather outdated; I can't manage to make current binutils produce this form of fixups for the issue at all. What it does however, is create a runtime pseudo relocation; the linker generates extra info in the rdata section about the relocations it wasn't able to handle, which the mingw crt will fix up on startup before handing control over to the end user code. This can involve changing the protection of the code sections to writeable in order to do such fixups there. (This feature has been in binutils since 2002.)

So basically mingw has a custom relocation format layered on top of PE. I mean, that would certainly help code size. Otherwise, every TU that emits a vtable with RTTI is going to need a dynamic initializer... that would be bad for startup time, unless we have clever ways to dedupe them across TUs.

Maybe the cleaner thing to do is to actually handle this as a relocation. I tried to figure out what GCC is doing, but it doesn't seem to work at all.

lib/CodeGen/ItaniumCXXABI.cpp
3218

I was surprised to discover that this probably works. I'd forgotten that mingw rolls its own .ctors section instead of reusing some of the .CRT$XCU machinery. This just makes me feel like we're layering the levels of paper deeper, and we might be better off implementing the mingw relocation thingy you mentioned.

In D43184#1038129, @rnk wrote:

So in case this approach as my hacky PoC does is the only feasible way left, I obviously need to fix it up. Any directions and suggestions on how to structure it properly? And suggestions for the user visible option, etc?

Well, I guess it's the equivalent of /MD /MT, i.e. is your CRT static or dynamic. How is this expressed to GCC?

I don't think they really have any equivalent; on mingw, the CRT itself can only be linked dynamically, and for things like libstdc++, it follows the -static flag (i.e. normally looks for libs named libX.dll.a, and if not found or if -static is specified, looks for libX.a). But -static is a linker flag only so it doesn't help for deciding what to emit at compile time.

Actually, it feels like that blogpost is rather outdated; I can't manage to make current binutils produce this form of fixups for the issue at all. What it does however, is create a runtime pseudo relocation; the linker generates extra info in the rdata section about the relocations it wasn't able to handle, which the mingw crt will fix up on startup before handing control over to the end user code. This can involve changing the protection of the code sections to writeable in order to do such fixups there. (This feature has been in binutils since 2002.)

So basically mingw has a custom relocation format layered on top of PE. I mean, that would certainly help code size. Otherwise, every TU that emits a vtable with RTTI is going to need a dynamic initializer... that would be bad for startup time, unless we have clever ways to dedupe them across TUs.

Maybe the cleaner thing to do is to actually handle this as a relocation. I tried to figure out what GCC is doing, but it doesn't seem to work at all.

This was my conclusion at some point as well - but there's also a few issues with that approach which makes me quite hesitant:

  • doing fixups in the code segment

(Accidentally pressed submit too soon)

This was my conclusion at some point as well - but there's also a few issues with that approach which makes me quite hesitant:

  • doing fixups in the code segment requires making it writable temporarily at runtime, which is disallowed in UWP
  • the current set of runtime relocations only allows adding addresses in 1, 2, 4 and 8 byte sized addresses. On arm/arm64, absolute addresses can be present in instructions with a much more complicated opcode format, and tweaking addresses there requires defining more runtime relocation formats.

So given all this - I only see bad options. I could just postpone this as well - I can live with only linking libc++ statically for now.

The other case you mentioned, with multiple inheritance for statically allocated objects, needs to be handled in one way or another though (unless one forbids C++ interfaces over DLL boundaries); how's that handled with the MSVC C++ ABI?

rnk added a comment.Mar 15 2018, 11:55 AM

(Accidentally pressed submit too soon)

This was my conclusion at some point as well - but there's also a few issues with that approach which makes me quite hesitant:

  • doing fixups in the code segment requires making it writable temporarily at runtime, which is disallowed in UWP
  • the current set of runtime relocations only allows adding addresses in 1, 2, 4 and 8 byte sized addresses. On arm/arm64, absolute addresses can be present in instructions with a much more complicated opcode format, and tweaking addresses there requires defining more runtime relocation formats.

I wouldn't consider relocating non-dllimport references to dllimport symbols as being in scope. As long the symbol is annotated as dllimport, the compiler will generate code that does a load from the __imp_ pointer in the IAT.

For unannotated symbols, the linker already generates thunks for function references, so that just leaves data imports.

Still, the RTTI will be in the .rdata section, so it will require unprotecting readonly memory, which won't work in UWP. We could move it to .data, I guess.

So given all this - I only see bad options. I could just postpone this as well - I can live with only linking libc++ statically for now.

Might be the thing to do.

The other case you mentioned, with multiple inheritance for statically allocated objects, needs to be handled in one way or another though (unless one forbids C++ interfaces over DLL boundaries); how's that handled with the MSVC C++ ABI?

I think local vftables handle it.

In D43184#1039278, @rnk wrote:

(Accidentally pressed submit too soon)

This was my conclusion at some point as well - but there's also a few issues with that approach which makes me quite hesitant:

  • doing fixups in the code segment requires making it writable temporarily at runtime, which is disallowed in UWP
  • the current set of runtime relocations only allows adding addresses in 1, 2, 4 and 8 byte sized addresses. On arm/arm64, absolute addresses can be present in instructions with a much more complicated opcode format, and tweaking addresses there requires defining more runtime relocation formats.

I wouldn't consider relocating non-dllimport references to dllimport symbols as being in scope. As long the symbol is annotated as dllimport, the compiler will generate code that does a load from the __imp_ pointer in the IAT.

For unannotated symbols, the linker already generates thunks for function references, so that just leaves data imports.

Yes - it's only unannotated data imports that is the issue.

Still, the RTTI will be in the .rdata section, so it will require unprotecting readonly memory, which won't work in UWP. We could move it to .data, I guess.

True, that would avoid the really bad cases - and since a reference to that isn't embedded in instructions in the text segments, the existing plain pointer fixups should suffice.

So given all this - I only see bad options. I could just postpone this as well - I can live with only linking libc++ statically for now.

Might be the thing to do.

Ok - I'll put it on the backburner, and maybe see if I'd try implementing the linker part of fixing unannotated data imports at some point.

The other case you mentioned, with multiple inheritance for statically allocated objects, needs to be handled in one way or another though (unless one forbids C++ interfaces over DLL boundaries); how's that handled with the MSVC C++ ABI?

I think local vftables handle it.

Ok, thanks.

rnk added a comment.Mar 15 2018, 1:18 PM

Ok - I'll put it on the backburner, and maybe see if I'd try implementing the linker part of fixing unannotated data imports at some point.

I'm not sure that's feasible. At least for x86, global addresses can be folded in ways that mean the linker cannot relax them. You can go the other way, though. If we start by assuming all data is imported, you can often relax an __imp_sym load to a LEA sym(%rip). This increases register pressure and generates worse code, but it has conceivable use cases.

In D43184#1039385, @rnk wrote:

Ok - I'll put it on the backburner, and maybe see if I'd try implementing the linker part of fixing unannotated data imports at some point.

I'm not sure that's feasible.

If we'd just fix the RTTI vtable case, this still would need to be implemented in the linker, right? That's what I meant originally - implementing enough of that to handle RTTI, optionally giving the larger usecase a shot.

At least for x86, global addresses can be folded in ways that mean the linker cannot relax them. You can go the other way, though. If we start by assuming all data is imported, you can often relax an __imp_sym load to a LEA sym(%rip). This increases register pressure and generates worse code, but it has conceivable use cases.

Hmm, I think it seems like GCC does something like that. When accessing data symbols that aren't known to be DSO local, it seems to produce code like this: "movq .refptr.a(%rip), %rax; mov (%rax), %rax". My x86-fu is rather weak but I guess that's pretty much what you meant?

Anyway, good to know about that case.

mstorsjo abandoned this revision.Aug 27 2018, 8:20 AM

This isn't needed now any longer, when D50917 has landed.