This is an archive of the discontinued LLVM Phabricator instance.

Optimize emission of `dynamic_cast` to final classes.
ClosedPublic

Authored by rsmith on Jul 6 2023, 2:58 PM.

Details

Summary
  • When the destination is a final class type that does not derive from the source type, the cast always fails and is now emitted as a null pointer or call to __cxa_bad_cast.
  • When the destination is a final class type that does derive from the source type, emit a direct comparison against the corresponding base class vptr value(s). There may be more than one such value in the case of multiple inheritance; check them all.

For now, this is supported only for the Itanium ABI. I expect the same thing is
possible for the MS ABI too, but I don't know what guarantees are made about
vfptr uniqueness.

Diff Detail

Event Timeline

rsmith created this revision.Jul 6 2023, 2:58 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 6 2023, 2:58 PM
Herald added subscribers: nlopes, mgrang. · View Herald Transcript
rsmith requested review of this revision.Jul 6 2023, 2:58 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 6 2023, 2:58 PM
nlopes added inline comments.Jul 6 2023, 3:04 PM
clang/lib/CodeGen/ItaniumCXXABI.cpp
1560

Please use PoisonValue as a placeholder whenever possible. We are trying to get rid of undef.
Thank you!

rjmccall added a comment.EditedJul 6 2023, 5:38 PM

I don't think it's an intended guarantee of the Itanium ABI that the v-table will be unique, and v-tables are frequently not unique in the presence of shared libraries. They should be unique for classes with internal linkage, but of course that's a vastly reduced domain for the optimization. We could do an RTTI comparison in the general case, either by hardcoding the algorithm inline or by synthesizing a helper expression in Sema to compare two std::type_infos with ==. Or you could have a flag to tell you to make stronger uniqueness assumptions.

If there are multiple subobjects of the source type in the destination type, consider just casting to void* first instead of doing multiple comparisons.

rsmith added a comment.Jul 6 2023, 6:10 PM

I don't think it's an intended guarantee of the Itanium ABI that the v-table will be unique, and v-tables are frequently not unique in the presence of shared libraries.

https://itanium-cxx-abi.github.io/cxx-abi/abi.html#vtable-general explicitly guarantees this: "[...] However, the virtual table pointers within all the objects (instances) of a particular most-derived class point to the same set of virtual tables."

They should be unique for classes with internal linkage, but of course that's a vastly reduced domain for the optimization.

I think (hope?) we should be able to apply this to a much larger set of cases. Would it be correct to do this optimization unless the vtable might be emitted with vague linkage and non-default visibility (that is, unless we're in the odd case where people expect non-default visibility classes to be the same type across DSOs)? Or are there cases where we might be using a vtable that (eg) doesn't even have the right symbol?

If there are multiple subobjects of the source type in the destination type, consider just casting to void* first instead of doing multiple comparisons.

Ha, that seems obvious in retrospect :) Will do.

I don't think it's an intended guarantee of the Itanium ABI that the v-table will be unique, and v-tables are frequently not unique in the presence of shared libraries.

https://itanium-cxx-abi.github.io/cxx-abi/abi.html#vtable-general explicitly guarantees this: "[...] However, the virtual table pointers within all the objects (instances) of a particular most-derived class point to the same set of virtual tables."

Huh, I've never noticed that before. I still don't think it's satisfied in practice, at least in the presence of shared libraries. This is notably the exact same setup that eventually forced (or "forced") GCC to start using string comparison for RTTI instead of the ABI-prescribed algorithm that relies on _ZTS pointer equality.

They should be unique for classes with internal linkage, but of course that's a vastly reduced domain for the optimization.

I think (hope?) we should be able to apply this to a much larger set of cases. Would it be correct to do this optimization unless the vtable might be emitted with vague linkage and non-default visibility (that is, unless we're in the odd case where people expect non-default visibility classes to be the same type across DSOs)? Or are there cases where we might be using a vtable that (eg) doesn't even have the right symbol?

I don't know of any problems other than the total failure of vague linkage across DSO boundaries, so if we just treat that as an implicit exception to the vtable uniqueness guarantee in the ABI, I think you've got the condition exactly right: we could do this for any class where either the v-table doesn't have vague linkage or the class's formal visibility is not default. Our experience at Apple with enforcing type visibility is that it's usually good for one or two bug reports a year, but it's pretty easy to explain to users what they did wrong, and Clang's visibility attributes are pretty simple to use. (type_visibility helps a lot with managing tradeoffs.)

rsmith updated this revision to Diff 538281.Jul 7 2023, 3:15 PM
  • Address review comments.
rsmith marked an inline comment as done.Jul 7 2023, 3:28 PM

I think (hope?) we should be able to apply this to a much larger set of cases. Would it be correct to do this optimization unless the vtable might be emitted with vague linkage and non-default visibility (that is, unless we're in the odd case where people expect non-default visibility classes to be the same type across DSOs)? Or are there cases where we might be using a vtable that (eg) doesn't even have the right symbol?

I don't know of any problems other than the total failure of vague linkage across DSO boundaries, so if we just treat that as an implicit exception to the vtable uniqueness guarantee in the ABI, I think you've got the condition exactly right: we could do this for any class where either the v-table doesn't have vague linkage or the class's formal visibility is not default. Our experience at Apple with enforcing type visibility is that it's usually good for one or two bug reports a year, but it's pretty easy to explain to users what they did wrong, and Clang's visibility attributes are pretty simple to use. (type_visibility helps a lot with managing tradeoffs.)

I did some checking through the Clang implementation and found another two cases:

  • under -fapple-kext, vague-linkage vtables get emitted in each translation unit that references them
  • under -fno-rtti, identical vtables for distinct types could get merged because we emit vtables as unnamed_addr (this doesn't affect dynamic_cast, because -fno-rtti also disables dynamic_cast entirely)

The good news seems to be that if you don't use any language extensions (type visibility, -fno-rtti, -fapple-kext), then we do actually provide the guarantee that the ABI describes. :)

If there are multiple subobjects of the source type in the destination type, consider just casting to void* first instead of doing multiple comparisons.

This turned out to be a little subtle: the vptr comparison we do after the cast requires loading a vptr of an object of entirely-unknown type. This is the first time we're doing that in IR generation, and GetVTablePtr expected to be told which class it's loading the vtable for (presumably, so that it can load from the right offset within the class). However, the implementation of GetVTablePtr didn't use the class for anything; it's always at the start for all of our supported ABIs. But this patch would make it harder to support an ABI that didn't put the vptr at the start of the most-derived object.

rsmith updated this revision to Diff 538283.Jul 7 2023, 3:38 PM
  • Mark gep as inbounds.

I think (hope?) we should be able to apply this to a much larger set of cases. Would it be correct to do this optimization unless the vtable might be emitted with vague linkage and non-default visibility (that is, unless we're in the odd case where people expect non-default visibility classes to be the same type across DSOs)? Or are there cases where we might be using a vtable that (eg) doesn't even have the right symbol?

I don't know of any problems other than the total failure of vague linkage across DSO boundaries, so if we just treat that as an implicit exception to the vtable uniqueness guarantee in the ABI, I think you've got the condition exactly right: we could do this for any class where either the v-table doesn't have vague linkage or the class's formal visibility is not default. Our experience at Apple with enforcing type visibility is that it's usually good for one or two bug reports a year, but it's pretty easy to explain to users what they did wrong, and Clang's visibility attributes are pretty simple to use. (type_visibility helps a lot with managing tradeoffs.)

I did some checking through the Clang implementation and found another two cases:

  • under -fapple-kext, vague-linkage vtables get emitted in each translation unit that references them
  • under -fno-rtti, identical vtables for distinct types could get merged because we emit vtables as unnamed_addr (this doesn't affect dynamic_cast, because -fno-rtti also disables dynamic_cast entirely)

The good news seems to be that if you don't use any language extensions (type visibility, -fno-rtti, -fapple-kext), then we do actually provide the guarantee that the ABI describes. :)

If there are multiple subobjects of the source type in the destination type, consider just casting to void* first instead of doing multiple comparisons.

This turned out to be a little subtle: the vptr comparison we do after the cast requires loading a vptr of an object of entirely-unknown type. This is the first time we're doing that in IR generation, and GetVTablePtr expected to be told which class it's loading the vtable for (presumably, so that it can load from the right offset within the class). However, the implementation of GetVTablePtr didn't use the class for anything; it's always at the start for all of our supported ABIs. But this patch would make it harder to support an ABI that didn't put the vptr at the start of the most-derived object.

Oh, this does matter on platforms using pointer authentication, because each vptr field is signed with a constant discriminator derived from the name of the class that introduced it.

clang/lib/CodeGen/ItaniumCXXABI.cpp
204

Is there no reasonable way of checking this without actually triggering the emission of RTTI?

rsmith updated this revision to Diff 539776.Jul 12 2023, 3:45 PM
rsmith marked an inline comment as done.
  • Avoid emitting the type_info when detecting whether it would be null.
  • Bring back class type in GetVTablePtr and instead don't use it here.

Oh, this does matter on platforms using pointer authentication, because each vptr field is signed with a constant discriminator derived from the name of the class that introduced it.

Oh, hm. I guess you have an out-of-tree patch for that? I've switched back to passing types into GetVTablePtr, and stopped using that from emitExactDynamicCast in favor of directly loading the vtpr after the dynamic_cast<void*>. You would need to disable the exact dynamic_cast optimization when pointer authentication is enabled for vptrs, but that should be a small patch.

LGTM, except, should we have a way to turn this optimization off specifically?

rsmith updated this revision to Diff 542693.Jul 20 2023, 2:55 PM
  • Add option to disable vptr-based dynamic_cast optimization.

LGTM, except, should we have a way to turn this optimization off specifically?

Sure. I suppose even for an internal linkage vtable there could be a reason to want to turn this off (eg, if someone constructs a custom vtable at runtime). I've added -fno-assume-unique-vtables to disable the optimization.

rjmccall accepted this revision.Jul 21 2023, 3:13 PM

Thanks, LGTM

This revision is now accepted and ready to land.Jul 21 2023, 3:13 PM
This revision was automatically updated to reflect the committed changes.
MatzeB added a subscriber: MatzeB.Jul 25 2023, 3:55 PM

This change results in some of our builds (distributed Thin-LTO in case that matters) to fail with missing symbols. At a first glance this seems to emit VTables in some files where it didn't do this before and then fails to resolve some members of that vtable. I'm in the process of analyzing this further and making a small reproducer.

This change results in some of our builds (distributed Thin-LTO in case that matters) to fail with missing symbols. At a first glance this seems to emit VTables in some files where it didn't do this before and then fails to resolve some members of that vtable. I'm in the process of analyzing this further and making a small reproducer.

Please try again after rGb6847edfc235829b37dd6d734ef5bbfa0a58b6fc and see if you're still having issues.

MatzeB added a comment.EditedJul 25 2023, 4:03 PM

This change results in some of our builds (distributed Thin-LTO in case that matters) to fail with missing symbols. At a first glance this seems to emit VTables in some files where it didn't do this before and then fails to resolve some members of that vtable. I'm in the process of analyzing this further and making a small reproducer.

I just noticed b6847edfc235829b37dd6d734ef5bbfa0a58b6fc mentioned in the task above which isn't part of our builds yet. I see even more symbols emitted with that change and suspect it may resolves the undefined symbols problems then. Will know for sure in a day or two (and if you hear nothing all is working well).