In many cases we can't devirtualize
because definition of vtable is not present. Most of the
time it is caused by inline virtual function not beeing
emitted. Forcing emitting of vtable adds a reference of these
inline virtual functions.
Note that GCC was always doing it.
Details
Diff Detail
- Repository
- rL LLVM
- Build Status
Buildable 18378 Build 18378: arc lint + arc unit
Event Timeline
This is not sound: sometimes the forcefully emitted vtable is incorrect due to destructor aliasing. This happens e.g. in the Bullet benchmark from the llvm test suite. A simplified example follows.
For a translation unit:
struct Base { virtual int foo() { return 42; } virtual ~Base() { } }; struct Derived : Base { Derived(); }; int foo() { Derived *der = new Derived(); return der->foo(); }
it emits:
; Function Attrs: nounwind declare dso_local void @_ZN7DerivedD1Ev(%struct.Derived*) unnamed_addr #6 ; Function Attrs: nounwind declare dso_local void @_ZN7DerivedD0Ev(%struct.Derived*) unnamed_addr #6 @_ZTV7Derived = linkonce_odr dso_local unnamed_addr constant { [5 x i8*] } { [5 x i8*] [i8* null, i8* bitcast ({ i8*, i8*, i8* }* @_ZTI7Derived to i8*), i8* bitcast (i32 (%struct.Base*)* @_ZN4Base3fooEv to i8*), i8* bitcast (void (%struct.Derived*)* @_ZN7DerivedD1Ev to i8*), i8* bitcast (void (%struct.Derived*)* @_ZN7DerivedD0Ev to i8*)] }, comdat, align 8
whereas for TU:
struct Base { virtual int foo() { return 42; } virtual ~Base() { } }; struct Derived : Base { Derived(); }; Derived::Derived() { }
which emits the whole vtable (even without this patch):
@_ZTV7Derived = linkonce_odr dso_local unnamed_addr constant { [5 x i8*] } { [5 x i8*] [i8* null, i8* bitcast ({ i8*, i8*, i8* }* @_ZTI7Derived to i8*), i8* bitcast (i32 (%struct.Base*)* @_ZN4Base3fooEv to i8*), i8* bitcast (void (%struct.Base*)* @_ZN4BaseD2Ev to i8*), i8* bitcast (void (%struct.Derived*)* @_ZN7DerivedD0Ev to i8*)] }, comdat, align 8
Please note that the complete object destructor @_ZN7DerivedD1Ev has been RAUWed with the base object destructor @_ZN4BaseD2Ev.
Since this triggers the RAUW case (as opposed to the Alias case) in ItaniumCXXABI::emitCXXStructor, the complete object destructor is not emitted at all.
Because Derived lacks a key function, the vtable is linkonce_odr and any of those definitions can be picked. Hence, this code can stop linking at the whim of the linker. :(
I think that MarkVTableUsed should be called somewhere in Sema (possibly ActOnFinishCXXMemberDecls?) if ForceEmitVTables is on. This probably requires making ForceEmitVTables a LangOption in addition to it being a CodeGenOption.
This causes the above example to compile correctly. Moreover, I have checked that 7zip and Bullet (the llvm test suite benchmarks which failed previously) build and execute correctly with the above change.
I thought we already had places in Sema that marked inline virtual methods as used, instantiated templates, etc. for devirtualization purposes when optimization was enabled. Did we rip that out?
The problem we've had over and over with devirtualization is that we have to emit a perfect v-table because LLVM lacks a lot of the key vocabulary for talking about incomplete information. For example, if something weird happens and we don't have a definition for an inline virtual method, ideally we'd just say "well, you can't devirtualize this slot", then try to fix that as incremental progress; but instead we have to get everything just right or else disable the whole optimization. Note that vague-linkage v-tables mean that we'd also need to be able to say things like "there is an object with a definition that looks like this, but its symbol is not available and you can't emit it yourself".
I only recall the emitting available_externally vtables opportunistically, that is emitting it only if all the inline virtual functions are present (and they are not hidden).
(https://reviews.llvm.org/D33437)
The problem we've had over and over with devirtualization is that we have to emit a perfect v-table because LLVM lacks a lot of the key vocabulary for talking about incomplete information. For example, if something weird happens and we don't have a definition for an inline virtual method, ideally we'd just say "well, you can't devirtualize this slot", then try to fix that as incremental progress; but instead we have to get everything just right or else disable the whole optimization. Note that vague-linkage v-tables mean that we'd also need to be able to say things like "there is an object with a definition that looks like this, but its symbol is not available and you can't emit it yourself".
That is correcty, my intention was that this flag would cause all inline virtual functions to be emitted. Can you give a hint how to achieve this in the sane way?
Well, this paragraph was really a call for extending LLVM IR, which might be outside a reasonable scope for your summer project.
Building a v-table should itself cause inline functions to be emitted in IRGen as long as they've been properly marked for instantiation and use by Sema. I don't know the right place to ensure that that happens.
clang/test/CodeGenCXX/vtable-available-externally.cpp | ||
---|---|---|
445 | This still uses _ZN6Test187DerivedD1Ev instead of _ZN6Test184BaseD2Ev, because the triple you selected is x86_64-apple-darwin10 and -mconstructor-aliases is off by default on Darwin. I suggest you either add -mconstructor-aliases to the compiler invocation or change the triple to literally everything else than Darwin (or CUDA). Preferably the former, as it shows clearly what we want to test here. |
Looks good to me. Obviously, you should wait for someone more competent than me to accept it, too.
I would like to see a test that virtual function definitions are properly emitted for classes instantiated from templates (and in particular, that this triggers instantiation of such virtual member functions so they can be emitted). Other than that, this looks good to me.
LGTM with one change.
clang/include/clang/Basic/LangOptions.def | ||
---|---|---|
292 ↗ | (On Diff #150735) | I don't think this is benign or even compatible: if we compile a PCH with this turned off, and then try to emit code using that PCH, we can miscompile due to having not marked the contents of a vtable as used. Please change this from BENIGN_LANGOPT to just LANGOPT. |
This still uses _ZN6Test187DerivedD1Ev instead of _ZN6Test184BaseD2Ev, because the triple you selected is x86_64-apple-darwin10 and -mconstructor-aliases is off by default on Darwin.
I suggest you either add -mconstructor-aliases to the compiler invocation or change the triple to literally everything else than Darwin (or CUDA). Preferably the former, as it shows clearly what we want to test here.