[CodeGenCXX] Add -fforce-emit-vtables
ClosedPublic

Authored by Prazek on Sat, May 19, 11:58 AM.

Details

Summary

In many cases we can't devirtualize
because definition of vtable is not present. Most of the
time it is caused by inline virtual function not beeing
emitted. Forcing emitting of vtable adds a reference of these
inline virtual functions.
Note that GCC was always doing it.

Diff Detail

Repository
rC Clang
Prazek created this revision.Sat, May 19, 11:58 AM
Prazek updated this revision to Diff 147672.Sat, May 19, 12:03 PM

Add release note

Prazek updated this revision to Diff 147673.Sat, May 19, 12:04 PM

remove empty line

Prazek updated this revision to Diff 147675.Sat, May 19, 12:16 PM

Fixed comment

Prazek updated this revision to Diff 147679.Sat, May 19, 1:08 PM

Fixed flag passing

Prazek retitled this revision from Add -fforce-emit-vtable to Add -fforce-emit-vtables.Sat, May 19, 1:08 PM
Prazek edited the summary of this revision. (Show Details)
amharc requested changes to this revision.Mon, May 21, 5:12 AM

This is not sound: sometimes the forcefully emitted vtable is incorrect due to destructor aliasing. This happens e.g. in the Bullet benchmark from the llvm test suite. A simplified example follows.

For a translation unit:

struct Base {
  virtual int foo() { return 42; }
  virtual ~Base() { }
};

struct Derived : Base {
  Derived();
};

int foo() {
  Derived *der = new Derived();
  return der->foo();
}

it emits:

; Function Attrs: nounwind
declare dso_local void @_ZN7DerivedD1Ev(%struct.Derived*) unnamed_addr #6

; Function Attrs: nounwind
declare dso_local void @_ZN7DerivedD0Ev(%struct.Derived*) unnamed_addr #6

@_ZTV7Derived = linkonce_odr dso_local unnamed_addr constant { [5 x i8*] } { [5 x i8*] [i8* null, i8* bitcast ({ i8*, i8*, i8* }* @_ZTI7Derived to i8*), i8* bitcast (i32 (%struct.Base*)* @_ZN4Base3fooEv to i8*), i8* bitcast (void (%struct.Derived*)* @_ZN7DerivedD1Ev to i8*), i8* bitcast (void (%struct.Derived*)* @_ZN7DerivedD0Ev to i8*)] }, comdat, align 8

whereas for TU:

struct Base {
  virtual int foo() { return 42; }
  virtual ~Base() { }
};

struct Derived : Base {
  Derived();
};

Derived::Derived() { }

which emits the whole vtable (even without this patch):

@_ZTV7Derived = linkonce_odr dso_local unnamed_addr constant { [5 x i8*] } { [5 x i8*] [i8* null, i8* bitcast ({ i8*, i8*, i8* }* @_ZTI7Derived to i8*), i8* bitcast (i32 (%struct.Base*)* @_ZN4Base3fooEv to i8*), i8* bitcast (void (%struct.Base*)* @_ZN4BaseD2Ev to i8*), i8* bitcast (void (%struct.Derived*)* @_ZN7DerivedD0Ev to i8*)] }, comdat, align 8

Please note that the complete object destructor @_ZN7DerivedD1Ev has been RAUWed with the base object destructor @_ZN4BaseD2Ev.
Since this triggers the RAUW case (as opposed to the Alias case) in ItaniumCXXABI::emitCXXStructor, the complete object destructor is not emitted at all.

Because Derived lacks a key function, the vtable is linkonce_odr and any of those definitions can be picked. Hence, this code can stop linking at the whim of the linker. :(

This revision now requires changes to proceed.Mon, May 21, 5:12 AM

I think that MarkVTableUsed should be called somewhere in Sema (possibly ActOnFinishCXXMemberDecls?) if ForceEmitVTables is on. This probably requires making ForceEmitVTables a LangOption in addition to it being a CodeGenOption.

This causes the above example to compile correctly. Moreover, I have checked that 7zip and Bullet (the llvm test suite benchmarks which failed previously) build and execute correctly with the above change.

Prazek retitled this revision from Add -fforce-emit-vtables to [CodeGenCXX] Add -fforce-emit-vtables.Tue, May 22, 5:42 AM

@rjmccall do you have any thoughts on the best way to solve it?

I thought we already had places in Sema that marked inline virtual methods as used, instantiated templates, etc. for devirtualization purposes when optimization was enabled. Did we rip that out?

The problem we've had over and over with devirtualization is that we have to emit a perfect v-table because LLVM lacks a lot of the key vocabulary for talking about incomplete information. For example, if something weird happens and we don't have a definition for an inline virtual method, ideally we'd just say "well, you can't devirtualize this slot", then try to fix that as incremental progress; but instead we have to get everything just right or else disable the whole optimization. Note that vague-linkage v-tables mean that we'd also need to be able to say things like "there is an object with a definition that looks like this, but its symbol is not available and you can't emit it yourself".

I thought we already had places in Sema that marked inline virtual methods as used, instantiated templates, etc. for devirtualization purposes when optimization was enabled. Did we rip that out?

I only recall the emitting available_externally vtables opportunistically, that is emitting it only if all the inline virtual functions are present (and they are not hidden).
(https://reviews.llvm.org/D33437)

The problem we've had over and over with devirtualization is that we have to emit a perfect v-table because LLVM lacks a lot of the key vocabulary for talking about incomplete information. For example, if something weird happens and we don't have a definition for an inline virtual method, ideally we'd just say "well, you can't devirtualize this slot", then try to fix that as incremental progress; but instead we have to get everything just right or else disable the whole optimization. Note that vague-linkage v-tables mean that we'd also need to be able to say things like "there is an object with a definition that looks like this, but its symbol is not available and you can't emit it yourself".

That is correcty, my intention was that this flag would cause all inline virtual functions to be emitted. Can you give a hint how to achieve this in the sane way?

I thought we already had places in Sema that marked inline virtual methods as used, instantiated templates, etc. for devirtualization purposes when optimization was enabled. Did we rip that out?

I only recall the emitting available_externally vtables opportunistically, that is emitting it only if all the inline virtual functions are present (and they are not hidden).
(https://reviews.llvm.org/D33437)

The problem we've had over and over with devirtualization is that we have to emit a perfect v-table because LLVM lacks a lot of the key vocabulary for talking about incomplete information. For example, if something weird happens and we don't have a definition for an inline virtual method, ideally we'd just say "well, you can't devirtualize this slot", then try to fix that as incremental progress; but instead we have to get everything just right or else disable the whole optimization. Note that vague-linkage v-tables mean that we'd also need to be able to say things like "there is an object with a definition that looks like this, but its symbol is not available and you can't emit it yourself".

That is correcty, my intention was that this flag would cause all inline virtual functions to be emitted. Can you give a hint how to achieve this in the sane way?

Well, this paragraph was really a call for extending LLVM IR, which might be outside a reasonable scope for your summer project.

Building a v-table should itself cause inline functions to be emitted in IRGen as long as they've been properly marked for instantiation and use by Sema. I don't know the right place to ensure that that happens.

Prazek updated this revision to Diff 148762.Sun, May 27, 11:16 AM

Fixed missing vtable commponents

Prazek updated this revision to Diff 148764.Sun, May 27, 11:38 AM

small update

amharc requested changes to this revision.Mon, May 28, 4:04 AM
amharc added inline comments.
clang/test/CodeGenCXX/vtable-available-externally.cpp
445 ↗(On Diff #148764)

This still uses _ZN6Test187DerivedD1Ev instead of _ZN6Test184BaseD2Ev, because the triple you selected is x86_64-apple-darwin10 and -mconstructor-aliases is off by default on Darwin.

I suggest you either add -mconstructor-aliases to the compiler invocation or change the triple to literally everything else than Darwin (or CUDA). Preferably the former, as it shows clearly what we want to test here.

This revision now requires changes to proceed.Mon, May 28, 4:04 AM
Prazek updated this revision to Diff 148839.Mon, May 28, 2:46 PM
Prazek marked an inline comment as done.

fixed test

amharc accepted this revision.Tue, May 29, 12:23 AM

Looks good to me. Obviously, you should wait for someone more competent than me to accept it, too.

This revision is now accepted and ready to land.Tue, May 29, 12:23 AM
rsmith accepted this revision.Mon, Jun 4, 12:38 AM

I would like to see a test that virtual function definitions are properly emitted for classes instantiated from templates (and in particular, that this triggers instantiation of such virtual member functions so they can be emitted). Other than that, this looks good to me.

Prazek updated this revision to Diff 150735.Mon, Jun 11, 6:24 AM
  • Fixed templates
rsmith accepted this revision.Tue, Jun 12, 4:09 PM

LGTM with one change.

clang/include/clang/Basic/LangOptions.def
292 ↗(On Diff #150735)

I don't think this is benign or even compatible: if we compile a PCH with this turned off, and then try to emit code using that PCH, we can miscompile due to having not marked the contents of a vtable as used. Please change this from BENIGN_LANGOPT to just LANGOPT.

Closed by commit rC334600: Add -fforce-emit-vtables (authored by Prazek, committed by ). · Explain WhyWed, Jun 13, 7:00 AM
This revision was automatically updated to reflect the committed changes.
Prazek marked an inline comment as done.