This is an archive of the discontinued LLVM Phabricator instance.

Handle non-inlined clang::Type::getAs specializations in extract_symbols.py
ClosedPublic

Authored by sberg on Nov 9 2016, 9:14 AM.

Details

Summary

The existing logic was to discard any symbols representing function template instantiations, as the definitions were assumed to be inline. But there are three explicit specializations of clang::Type::getAs that are only defined in Clang's lib/AST/Type.cpp, and at least the plugin used by the LibreOffice build (https://wiki.documentfoundation.org/Development/Clang_plugins) uses those functions.

Diff Detail

Repository
rL LLVM

Event Timeline

sberg updated this revision to Diff 77359.Nov 9 2016, 9:14 AM
sberg retitled this revision from to Handle non-inlined clang::Type::getAs specializations in extract_symbols.py.
sberg updated this object.
sberg added a reviewer: john.brawn.
sberg added a subscriber: llvm-commits.
john.brawn added inline comments.Nov 10 2016, 8:57 AM
utils/extract_symbols.py
136 ↗(On Diff #77359)

Instead of listing each symbol specifically it would be better to use a regex, e.g. re.match('??$getAs.+Type@clang@@') (note: I haven't tested this regex to check that it works).

210 ↗(On Diff #77359)

Same here, though here you can just check that the symbol starts with _ZNK5clang4Type5getAs.

sberg added inline comments.Nov 10 2016, 9:14 AM
utils/extract_symbols.py
210 ↗(On Diff #77359)

We still want to filter out the majority of clang::Type::getAs intstantiations (as they're implicit instantiations of the inline function template), so I thought listing the three symbols explicitly in the two should_keep_*_symbol functions is the best way. An alternative would be to use a regex that matches just those three cases---I can do that change if you prefer it that way.

john.brawn added inline comments.Nov 11 2016, 5:52 AM
utils/extract_symbols.py
210 ↗(On Diff #77359)

My concern here is that if things change in the future (e.g. more specializations of clang::Type::getAs are added in the future) then the code here would have to change, but a regex would hopefully be more robust. Also from a quick check (nm clang | grep 4Type5getAs | wc -l) there are 37 instances of this function, which isn't many.

sberg added inline comments.Nov 11 2016, 6:55 AM
utils/extract_symbols.py
210 ↗(On Diff #77359)

"My concern here is that if things change in the future (e.g. more specializations of clang::Type::getAs are added in the future) then the code here would have to change": as it would have to when other non-inline explicit specializations are added; those are apparently always special cases that need some sort of manual maintenance here (unless all function template instantiations should be "filtered in")

"but a regex would hopefully be more robust": more robust in what sense?

"there are 37 instances of this function, which isn't many": I'm not sure I get what you want to imply with that (that it would be OK to "filter in" all 37 of them, implicit instantiations as well as explicit specializations?)

john.brawn added inline comments.Nov 11 2016, 8:06 AM
utils/extract_symbols.py
210 ↗(On Diff #77359)

more robust in what sense?

If we check for symbols that start with _ZNK5clang4Type5getAs (or the equivalent regex on the microsoft mangling side of things) then that will match all instantiations of the clang::type::getAs function template (both of the generic version in the header file and any explicit specializations). Currently there are three explicit specializations but if more were added in the future then exporting all clang::type::getAs symbols would mean these new specializations would also be exported, whereas exporting only the three listed symbols would mean any new specializations would not be exported. (I have no idea if this is even remotely likely, so this may be an entirely theoretical concern.)

I'm not sure I get what you want to imply with that (that it would be OK to "filter in" all 37 of them, implicit instantiations as well as explicit specializations?)

The reason we're doing all this stuff with deciding whether or not to export certain symbols is that we have a limit of 65535 symbols that can be exported from a dll/exe on Windows (when using MSVC), and 37 isn't many compared to 65535. Exporting symbols that we don't need to (e.g. the instances of clang::type::getAs that get instantiated from the generic definition in the header file) is perfectly fine, just so long as we don't go over that limit of 65535. This does mean that should_keep_itanium_symbol is, to some degree, pointless, as we don't have that limit when building using gcc, but it's there for the sake of consistency which is valuable in that it reduces platform-specific sources of failure (e.g. if something isn't working due to a missing symbol on Windows then you can then debug that on Linux).

sberg added inline comments.Nov 11 2016, 8:21 AM
utils/extract_symbols.py
210 ↗(On Diff #77359)

Thanks, got you now. I had somehow started to operate under the wrong assumption that we'd need to be careful to filter out any weak (and whatever's the matching term for MSVC) inline functions.

sberg updated this revision to Diff 77807.Nov 14 2016, 6:45 AM
sberg marked 2 inline comments as done.
john.brawn accepted this revision.Nov 14 2016, 8:56 AM
john.brawn edited edge metadata.

LGTM

This revision is now accepted and ready to land.Nov 14 2016, 8:56 AM
This revision was automatically updated to reflect the committed changes.