This is an archive of the discontinued LLVM Phabricator instance.

Force visibility of llvm::Any to external
ClosedPublic

Authored by serge-sans-paille on May 5 2021, 10:15 PM.

Details

Summary

llvm::Any::TypeId::Id relies on the uniqueness of the address of a
static variable defined in a template function. hidden visibility implies vague
linkage for that variable, which does not guarantee the uniqueness of the address
across a binary and a shared library. This totally breaks the implementation of
llvm::Any.

See https://gcc.gnu.org/wiki/Visibility and https://gcc.gnu.org/onlinedocs/gcc/Vague-Linkage.html
for more information on that topic.

Diff Detail

Event Timeline

serge-sans-paille requested review of this revision.May 5 2021, 10:15 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 5 2021, 10:15 PM
tstellar added a subscriber: rnk.May 6 2021, 7:02 AM

Adding @rnk, who reviewed the original patch for awareness...

It's unfortunate, but I'm fine with this change. We could workaround the specific issue with llvm::Any another way, but there is nothing in place that would prevent someone else from adding a new feature that would hit the same issue again. It seems like disabling this completely is the correct approach, and we will need to come up with a different strategy to limit the number of exported symbols.

rnk added a subscriber: zturner.May 6 2021, 8:56 AM

As an example, llvm::Any::TypeId::Id relies on the uniqueness of the address of a
static variable defined in a template function.

I just read that line, and I'm thinking to myself, obviously that won't work, don't do that, it doesn't work with DLLs, it doesn't work with hidden visibility. I *just* helped prepare a talk explaining that this code pattern isn't portable. Ideally, I would like to move in the direction of compiling *more* of LLVM with hidden visibility, not less. LLVM shared libraries already have far too many dynamic symbols. I believe Lang Hames attempted to use this code pattern in the past to refactor the pass ID into a template base class, and we reverted it for the same reasons. I wasn't able to quickly find the patch for reference.

What code actually uses llvm::Any? I only see 8 includes of Any.h across the codebase. It looks like @zturner added it in 2018 for use in unit tests. Could we remove it? If we do need it, surely the standard C++17 implementation of std::any supports compiling with fvisibility=hidden, so could we just use whatever solution the STL uses? Maybe just put default visibility on the typeid global?

@rnk I agree with

As an example, llvm::Any::TypeId::Id relies on the uniqueness of the address of a
static variable defined in a template function.

I just read that line, and I'm thinking to myself, obviously that won't work, don't do that, it doesn't work with DLLs, it doesn't work with hidden visibility. I *just* helped prepare a talk explaining that this code pattern isn't portable. Ideally, I would like to move in the direction of compiling *more* of LLVM with hidden visibility, not less. LLVM shared libraries already have far too many dynamic symbols. I believe Lang Hames attempted to use this code pattern in the past to refactor the pass ID into a template base class, and we reverted it for the same reasons. I wasn't able to quickly find the patch for reference.

What code actually uses llvm::Any? I only see 8 includes of Any.h across the codebase. It looks like @zturner added it in 2018 for use in unit tests. Could we remove it? If we do need it, surely the standard C++17 implementation of std::any supports compiling with fvisibility=hidden, so could we just use whatever solution the STL uses? Maybe just put default visibility on the typeid global?

I also would really like to see more of LLVM with hidden visibility too. My main concern with this issue is that without some kind of enforcement in place, this same code pattern is going to get reintroduced somewhere else in the codebase. and debugging this issue was very costly. Do you have any ideas for how we might be able to ensure this pattern doesn't re-emerge?

@rnk it's used in the newPassManager. forcing the visibility of llvm::Any indeed fixes the bug I met (that was my original patch), but I also understand the concern of @tstellar concerning the possibility of having similar issue across the code base. A possible way to solve the issue the hard way would be to design a clang warning / a static analysis that spots this kind of issues. Thinking of it, maybe the check could be done at link time to, or on the final shared library?

rnk added a comment.May 6 2021, 12:40 PM

I also would really like to see more of LLVM with hidden visibility too. My main concern with this issue is that without some kind of enforcement in place, this same code pattern is going to get reintroduced somewhere else in the codebase. and debugging this issue was very costly. Do you have any ideas for how we might be able to ensure this pattern doesn't re-emerge?

I don't have any brilliant ideas other than developer education. This is just one of those weird portable development things that you need to know as a C++ developer. It's a difference between C++ as it is standardized and C++ as it works in practice.

@rnk Even if we were to commit this patch, do you think we would still need to fix/remove llvm:Any?

rnk added a comment.May 6 2021, 2:58 PM

@rnk Even if we were to commit this patch, do you think we would still need to fix/remove llvm:Any?

Yes, even if we commit this patch, it seems likely that someone would still run into these kinds of issues later. Passing an Any instance from one DLL to another and then attempting to cast wouldn't work, the ids will be different, and I don't think there is an easy way to fix it.

rnk added a comment.May 7 2021, 12:37 PM

Right, _LIBCPP_TEMPLATE_VIS presumably expands to __attribute__((visibility("default"))). I think it's reasonable to go ahead and do the same thing in LLVM. Enabling RTTI and using typeid is probably the only way to make this work well in the general case.

FYI, without this patch we're seeing consistent test failures when building LLVM 12 in CentOS Stream 8:

Failed Tests (10):
  LLVM-Unit :: IR/./IRTests/CGSCCCallbacksTest.InstrumentedInvalidatingPasses
  LLVM-Unit :: IR/./IRTests/CGSCCCallbacksTest.InstrumentedPasses
  LLVM-Unit :: IR/./IRTests/CGSCCCallbacksTest.InstrumentedSkippedPasses
  LLVM-Unit :: IR/./IRTests/FunctionCallbacksTest.InstrumentedPasses
  LLVM-Unit :: IR/./IRTests/FunctionCallbacksTest.InstrumentedSkippedPasses
  LLVM-Unit :: IR/./IRTests/LoopCallbacksTest.InstrumentedInvalidatingLoopNestPasses
  LLVM-Unit :: IR/./IRTests/LoopCallbacksTest.InstrumentedInvalidatingPasses
  LLVM-Unit :: IR/./IRTests/LoopCallbacksTest.InstrumentedPasses
  LLVM-Unit :: IR/./IRTests/LoopCallbacksTest.InstrumentedSkippedPasses
  LLVM-Unit :: IR/./IRTests/ModuleCallbacksTest.InstrumentedSkippedPasses

See https://cbs.centos.org/koji/taskinfo?taskID=2483905 for logs and https://bugzilla.redhat.com/show_bug.cgi?id=1952248 for more details. Thanks @tstellar for pointing out this fix!

rnk added a comment.May 11 2021, 10:44 AM

So, should we go ahead and apply default visibility to llvm::Any::TypeId<>::Id?

Coincidentally, people have discovered the same problem in absl, Google's general C++ utility library: https://crbug.com/1096380

serge-sans-paille retitled this revision from Do not set CMAKE_CXX_VISIBILITY_PRESET to hidden to For visibility of llvm::Any to external.
serge-sans-paille edited the summary of this revision. (Show Details)

@rnk setting the whole class as external is the less intrusive patch and I've tested it successfully. Maybe it's enough to set the default visibility to id() and template <typename T> const char Any::TypeId<T>::Id = 0 ?

rnk added a comment.May 11 2021, 12:25 PM

My first thought was to put it on the TypeId class, like so:
template <typename T> struct LLVM_EXTERNAL_VISIBILITY TypeId {
I don't think the id() method visibility matters, and fewer dynamic symbols is usually better.

FWIW, I was originally against adding LLVM_EXTERNAL_VISIBILITY to the class, because I thought it was just working around a fundamental flaw in -fvisiblity=hidden, but based on the discussion here, I've fine with the proposed solution.

llvm/include/llvm/ADT/Any.h
26

Can you add a comment explaining why we need this here?

serge-sans-paille updated this revision to Diff 344788.EditedMay 12 2021, 5:47 AM

Force external visibilty on a limited set of symbols

@dcavalca can you confirm this fixes your issue?

rnk added inline comments.May 12 2021, 10:39 AM
llvm/include/llvm/ADT/Any.h
113

Sorry for the back and forth, but if it is necessary to mark any_isa with default visibility, then I would prefer your original patch of putting it on the class.

However, I think it should only be necessary to give TypeId<T>::Id default visibility. It shouldn't matter which copy of any_isa or id() gets called, they will all take the address of the same unique Id global.

@dcavalca can you confirm this fixes your issue?

cc @tstellar I went ahead and ran a validation : these three symbols seems to be enough.
@rnk are you fine with the change now?

@rnk, let me double check that.

serge-sans-paille added a comment.EditedMay 12 2021, 12:11 PM

cc @tstellar I went ahead and ran a validation : these three symbols seems to be enough.

I actually messed up with my validation: these three symbols are not enough. I'm verifying that setting the visibility on the whole class works.

serge-sans-paille retitled this revision from For visibility of llvm::Any to external to Force visibility of llvm::Any to external.

I've successfully tested that one and it works fine.

(I'll add a note that this is one of the things blocking us from upgrading -Bsymbolic-functions => -Bsymbolic (there is really little benefit for upgrade since data symbols don't matter that much) )

@dcavalca can you confirm this fixes your issue?

Yes, the latest iteration of this diff works fine for me and all tests are passing: https://cbs.centos.org/koji/taskinfo?taskID=2487471

@rnk: does that look like an acceptable solution to you now?

rnk added a comment.May 18 2021, 11:14 AM

Yes, we should land this and fix the issue. Sorry for the delay, I think I started to write something and got distracted.

I'm a bit concerned that we don't seem to understand the fundamental issue. From reading the code, it seems clear that the Id static data member should have default visibility. It's not clear why anything else needs default visibility, and that is unsettling.

llvm/include/llvm/ADT/Any.h
26

Please do add a comment here. I'd suggest:

The `Typeid<T>::Id` static data member below is a globally unique identifier for the type `T`. It is explicitly marked with default visibility so that when `-fvisibility=hidden` is used, the loader still merges duplicate definitions across DSO boundaries.

Patch updated with the advised comment.

I second @rnk opinion on this patch being non-optimal, but I also think we need to land it as is for 12.0.1 release.

This revision was not accepted when it landed; it landed in state Needs Review.May 20 2021, 1:06 AM
This revision was automatically updated to reflect the committed changes.

I just tried building LLVM 12.0.1 with clang (12.0.1) and LTO enabled, and I'm seeing these failures again. I thought I had tested this combination before and it worked, but maybe I didn't actually do this.

serge-sans-paille added a comment.EditedAug 27 2021, 2:42 AM

After weeks of painful quest, I now understand why this is not enough:

compiling the following:

class __attribute__((visibility("default"))) foo{};
class __attribute__((visibility("hidden"))) bar{};

template<class T>
__attribute__((visibility("default"))) bool check() __attribute__((noinline)) {
  return true;
}

void pain(bool& x, bool& y) {
  x = check<foo>();
  y = check<bar>();
}

with

clang++ -shared -fPIC -o liba.so a.cpp -nostdlib

and inspecting the symbol table with

nm liba.so -DC

yields

0000000000201020 D __bss_start
0000000000201020 D _edata
0000000000201020 D _end
0000000000000330 T pain(bool&, bool&)
0000000000000360 W bool check<foo>()

interestingly, check<bar> is not listed, because a default linkage function parametrized by an hidden linkage class turns into an hidden linkage function (!). So a potential fix here would be to flag every llvm::Any::TypeID parameter class as default linkage (but we can't static assert on that, because we can't inspect such attributes). So the way forward really should be in the spirit of https://reviews.llvm.org/D108690

rnk added a comment.Aug 27 2021, 10:15 AM

Thanks for the explanation, that makes sense to me.

I think, for portability reasons, we really don't want to go in the direction of D108690. We want to move in the direction of default hidden visibility, and explicitly marking publicly visible things.

Can the issue you've encountered be addressed by giving the classes which are passed across DSO boundaries default visibility? The issue shouldn't arise if a hidden class is not used across DSO boundaries, but if there are two template instantiations of the same class in two DSOs and that causes an issue, that suggests that such classes are being used in both DSOs.

Going further, I think it would be incorrect for the linker/loader to behave differently: I should be able to provide two different definitions of the hidden class bar in separate DSOs and neither they nor their template instantiations should conflict.