This is an archive of the discontinued LLVM Phabricator instance.

dynamic_cast optimization in LTO
Needs ReviewPublic

Authored by w2yehia on Jul 30 2020, 8:38 PM.

Details

Summary

http://lists.llvm.org/pipermail/llvm-dev/2020-April/140668.html

What is the optimization (TL;DR version):
The transformation tries to convert a __dynamic_cast function call, into an address comparison and VFT-like lookup, when the following conditions are met:

  1. the destination type is a leaf type, i.e. is never derived from (similar to C++ final semantics) in the entire program.
  2. the static type of the expression being casted is a public base (potentially multi-base and never private) class of the destination type.

Example:

Given the C++ expression:
   NULL != dynamic_cast<A*>(ptr)   // where B* ptr;
which coming out of clang would look like so:
  NULL ! = __dynamic_cast(ptr,
                          &_ZTI1B, // typeinfo of B, the static type of ptr.
                          &_ZTI1A, // typeinfo of A, the destination type.
                          hint)    // a static hint about the location of the source subobject w.r.t the complete object.

If the above conditions can be proven to be true, then an equivalent expression is:

(destType == dynamicType) where: std::typeinfo *destType = &_ZTI1A;
                                 std::typeinfo *dynamicType = ((void**)ptr)[-1];

Detailed description:
A C++ dynamic_cast<A*>(ptr) expression can either

  1. be folded by the frontend into a static_cast<A*>(ptr), or
  2. converted to a runtime call to __dynamic_cast if the frontend does not have enough information (which is the common case for dynamic_cast).

The crux of the transformation is trying to prove that a type is a leaf.
We utilize the !type metadata (https://llvm.org/docs/TypeMetadata.html) that is attached to the virtual function table (VFT) globals to answer this question.
For each VFT, the !type MD lists the other VFTs that are "compatible" with it. In general, the VFT of a class B is considered to be "compatible" with the VFT of a class A, iff A derives (publicly or privately) from B.
This means that the VFT of a leaf class type is never compatible with any other VFT, and we use this fact to decide which type is a leaf.
The second fact that we need to prove is the accessibility of the base type in the derived object.
Unfortunately we couldn't find a way to compute this information from the existing IR, and had to introduce a custom attribute that the Frontend would place on the __dynamic_cast call. The presence of the attribute implies that the static type (B in our example) is a public base class and never a private base class (in case there are multiple subobjects of the static_type inside the complete object) of the destination type (A in our example). Hence, if the attribute gets deleted by some pass, our transformation will simply do nothing for that __dynamic_cast call.

Diff Detail

Event Timeline

w2yehia created this revision.Jul 30 2020, 8:38 PM
w2yehia requested review of this revision.Jul 30 2020, 8:38 PM
lkail added a subscriber: lkail.Jul 30 2020, 8:55 PM
w2yehia edited the summary of this revision. (Show Details)Jul 31 2020, 8:31 AM

Hi Teresa(@tejohnson ) and Peter (@pcc )
This is the patch for the C++ dynamic_cast optimization that I emailed llvm-dev in April and got a reply from Teresa.
ThinLTO enablement for this optimization is not implemented yet (sorry have been trying read on thinLTO and fix this but haven't had any time yet)
Will it be possible to do a round of review for the current implementation and add thinLTO support later?
Thanks.

I'm ok with regular LTO implementation going in first, but it would be great to have a ThinLTO follow on implementation. Happy to provide pointers on that when you are ready.

I have a couple of comments below after a very quick skim. One is a bigger concern (the mangling prefetch matching in llvm). Hopefully others will provide some input on that as well.

clang/lib/CodeGen/ItaniumCXXABI.cpp
1474

Needs a description as to what this function is doing. Also a much more specific name. "ExtraHint" doesn't say much.

llvm/lib/Transforms/IPO/DynamicCastOpt.cpp
131

Not a language expert, but I don't think it is safe or a good idea for LLVM to be making assumptions about the mangling (which can vary e.g. Microsoft is quite different). You probably need to have clang encode additional info in either metadata or an llvm intrinsic. Perhaps you can leverage the type checks normally inserted for vtable loads for WPD under -fwhole-program-vtables.

steven_wu added inline comments.Sep 1 2020, 10:34 AM
llvm/lib/Transforms/IPO/DynamicCastOpt.cpp
15

I think this needs to be operator== for std::typeinfo for this to work for non-unique RTTI. The current pass seems to lower this into icmp which might cause miscompile.

ormris added a subscriber: ormris.Sep 2 2020, 10:22 AM

Teresa and Steven, thanks for the comments.

clang/lib/CodeGen/ItaniumCXXABI.cpp
1474

Does SetNonPrivateBaseAttr sound better?
By the way, the Itanium ABI uses the term "hint" for the 4th parameter, so I thought this would be a natural extension, but definitely can use an accurate name for what we're doing here.

llvm/lib/Transforms/IPO/DynamicCastOpt.cpp
15

true... I wasn't sure how to achieve that.
The operator== function is defined in the <typeinfo> header, and not in the std library for us to generate a call to.
Ideally clang would tell us whether the RTTI is unique, but I don't think it has access to that info since it's determined in the typeinfo header.
Any suggestions?

131

a very valid cocern;
I'm doing some refactoring to hide the ABIism behind an interface in this pass, and have that interface support Itanium ABI for now.
Will update this patch shortly.

w2yehia updated this revision to Diff 293335.Sep 21 2020, 9:52 PM
w2yehia added inline comments.Sep 21 2020, 10:12 PM
llvm/lib/Transforms/IPO/DynamicCastOpt.cpp
131

Hi Teresa (@tejohnson). I updated the patch as follows.

  1. create a utility class to represent C++ ABI related properties and queries (see CXXABI.h).
  2. create a GuessABI function that uses the triple to guess the C++ ABI.
  3. Run the pass only when the ABI is Itanium.

Originally I thought we could write a generic transformation that might work for the MS ABI as well as the Itanium, but after abit of diggin around I'm much more inclined to keep this pass Itanium-only, and extend it in the future to the MS ABI.

Thanks for reviewing.

@pcc since some of the questions below related to type metadata and type intrinsics he added for WPD, and he's probably a good person to resolve some of the questions regarding where to do the CXX ABI handling.

Sorry for the slow response. I'd like to resolve where and how best to recognize the CXX ABI related issues. I really think that belongs in clang and not llvm. But I suspect you don't actually need a lot of that. For example, we can go by which global vars have type metadata to identify vtables. See for example the code that builds the ThinLTO summary info for whole program devirtualization that makes the assumption about type metadata being attached to vtable defs in ModuleSummaryAnalysis.cpp (see call to computeVTableFuncs).

Perhaps all you need to do is to add some sort of llvm.type.test/llvm.assume intrinsics for __dynamic_cast calls in clang. E.g. similar to what is inserted for vtable loads under -fwhole-program-vtables for WPD. That will allow you to associate the dynamic cast with the right type metadata and vtables. E.g. something like:

%p = call i1 @llvm.type.test({ i8*, i8* }* @_ZTI1B, metadata !"_ZTS1B")
call void @llvm.assume(i1 %p)

From the type info name you can get to the vtable via the type metadata.

I'm not sure whether this is legal usage of the @llvm.type.test, but hopefully @pcc can comment or give other ideas/thoughts.

I haven't had a chance to go through the guts of the IPO optimization yet, but will try to take a look through that soon. I would imagine some of the implementation will change depending on the outcome of the above questions.

llvm/include/llvm/Support/CXXABI.h
25 ↗(On Diff #293335)

Why is it only looking for PowerPC? I.e. it isn't recognizing x86_64 and a number of other things that are Itanium. See also isItaniumFamily() in clang/include/clang/Basic/TargetCXXABI.h.

I do still think that this is putting things in llvm that belong in clang, see my other comment about that.

aganea added a subscriber: aganea.Apr 23 2021, 11:17 AM
MTC added a subscriber: MTC.Aug 3 2021, 5:28 AM