Page MenuHomePhabricator

[compiler-rt] Use common static library for sanitizer_common.
AbandonedPublic

Authored by mpividori on Jan 5 2017, 8:50 AM.

Details

Summary

Many sanitizers's rt libraries include code from sanitizer_common. For example: leak sanitizer, ub sanitizer, asan sanitizer, etc.
This can generate link errors when combining many sanitizers that are implemented as static libraries, because the code of sanitizer_common is included many times resulting in multiple definitions.
Now, this link error is avoided by including many sanitizers (leak, ub, etc) in asan's rt library.

In this diff I define a common static library including main code from sanitizer_common: "clang-rt.san_common.a", and remove that code from all sanitizer's static libraries, like: "clang-rt_asan.a", clang-rt_lsan.a", etc. Also, I modify clang driver to include "clang-rt.san_common.a" when necessary (in a different diff, because that changes go in clang repository).

I think this is a better approach since it will enable to combine many different sanitizers with no link errors, and without including all of them in the same library.

Diff Detail

Event Timeline

mpividori updated this revision to Diff 83245.Jan 5 2017, 8:50 AM
mpividori retitled this revision from to [compiler-rt] Use common static library for sanitizer_common..
mpividori updated this object.
mpividori added reviewers: kcc, rnk, zturner.
mpividori set the repository for this revision to rL LLVM.
mpividori added a subscriber: llvm-commits.
aizatsky added inline comments.Jan 5 2017, 12:27 PM
lib/stats/CMakeLists.txt
19

This one doesn't have RTSanitizerCommon & RTSanitizerCommonLibc. Is this intended?

mpividori added inline comments.Jan 5 2017, 12:36 PM
lib/stats/CMakeLists.txt
19

@aizatsky Yes, this is intended. I remove that from from sanitizer static libraries, and move that code to san_common static library.

aizatsky added inline comments.Jan 5 2017, 12:39 PM
lib/stats/CMakeLists.txt
19

But SHARED version needs them (above)?

mpividori added inline comments.Jan 5 2017, 12:41 PM
lib/stats/CMakeLists.txt
19

@aizatsky Yes, shared versions include all the necessary code. This changes are only for STATIC libraries.

aizatsky edited edge metadata.Jan 5 2017, 12:53 PM

Marcos,

I worry that our set of static and shared runtimes will be different. If we do this split, we should do it for dynamic libraries too. I don't know if we have static initializers, but we might end up running them twice.

I suggest you do the same separation for dynamic runtimes as will.

Hi Mike,

Marcos,

I worry that our set of static and shared runtimes will be different. If we do this split, we should do it for dynamic libraries too. I don't know if we have static initializers, but we might end up running them twice.

What do you mean by running them twice? Inside each shared library? For shared libraries, nothing changes, so it will be the same than before.
For static libraries, this is also the same. But I change how the code is included.
Instead of including the code from sanitizer common inside each static library, and then including that static library (which fails if we include more than once).
I remove the san_common code from the static libraries, and move that code into a separate static library.
The list of object files included at the end is the same.

I suggest you do the same separation for dynamic runtimes as will.

If you think it is clearer, I can remove RTSanitizerCommon RTSanitizerCommonLibc from shared libraries too, and add a dependency on clang_rt.san_common.a
(At the end is the same, the code of RTSanitizerCommon RTSanitizerCommonLibc will be included in that shared library).

Let me know if I should explain this better.
Thanks

This is all about code duplication, right? When you use static libraries - linking gives you an error. When you use dynamic libraries the code is still duplicated, right? Each .so file will still have its own copy of common functions and data. But loader will just pick the first version.

I say: let's straighten this up for .so files as well. At least we'll have a consistent picture for static/shared runtimes.

This is all about code duplication, right? When you use static libraries - linking gives you an error. When you use dynamic libraries the code is still duplicated, right? Each .so file will still have its own copy of common functions and data. But loader will just pick the first version.

I see we only use shared libraries for APPLE, I was wondering why can't we use static libraries there? (for asan, we also provide shared libs for linux).
When using shared library the code of san_common is duplicated. The global data and the static constructors are duplicated.
When resolving the undefined references to sanitizer_common's functions on the main executable, I understand the linker will pick the first shared library that defines that symbols and link with that.

I think we could have problems using shared libraries, and including san_common code twice, that maybe we didn't realize before.
I mean, I will explain this with an example, let suppose we compile main.cc with address and undefined behaviour sanitizers in APPLE.
Then, clang will pass to the linker: main.o , asan.so and ubsan.so.
So, san_common's code will be included in both: asan.so and` ubsan.so`.
The references to san_common in main.o (the instrumentation) will be linked to the implementation in asan.so, since it is the first to be considered.
But the implementation of ubsan.so will be considering its own implementation (not the one included in asan.so).
So, I think we could have problems because the code and data from san_common is duplicated and it is not the same considered by asan and ubsan.
I think we could fix this problem, ALWAYS including san_common as a static library in the main executable, and not including that code in the shared libraries nor the static library.
This way, we can be sure that the code of san_common is not duplicated in the final program.

@aizatsky So, I think the key question here is this:

"Sanitizer_common only provides stateless functionalities, or it also includes some global data that we should ensure is only included once?"

In the first case, when it only include stateless functionalities, there is no problem if we duplicate the code in shared libraries (for static libraries we can't, the linker will complain).
In the second case, it includes global data that should be unique, I think we should considered what I suggested before, removing that code from shared libraries, and only including it in the static library.

I see we only use shared libraries for APPLE, I was wondering why can't we use static libraries there?
let suppose we compile main.cc with address and undefined behaviour sanitizers in APPLE.
Then, clang will pass to the linker: main.o , asan.so and ubsan.so.

(On Darwin) When using both -fsanitize=address and -fsanitize=ubsan, the driver will only link in the ASan dylib. It already also contains the UBSan runtime. This solution currently works and I don't see many problems with that because the number of sanitizer combinations is very limited (you can't use e.g. ASan with TSan together).

We really want to keep using dynamic libraries (at least on Darwin), because 1) we're relying on features only available in dylibs (dyld interposition) and 2) you can easily link multiple instrumented modules (libraries) together without worrying which one should carry the static archive.

In D28359#637481, @kubabrecka wrote:

I see we only use shared libraries for APPLE, I was wondering why can't we use static libraries there?
let suppose we compile main.cc with address and undefined behaviour sanitizers in APPLE.
Then, clang will pass to the linker: main.o , asan.so and ubsan.so.

(On Darwin) When using both -fsanitize=address and -fsanitize=ubsan, the driver will only link in the ASan dylib. It already also contains the UBSan runtime. This solution currently works and I don't see many problems with that because the number of sanitizer combinations is very limited (you can't use e.g. ASan with TSan together).

We really want to keep using dynamic libraries (at least on Darwin), because 1) we're relying on features only available in dylibs (dyld interposition) and 2) you can easily link multiple instrumented modules (libraries) together without worrying which one should carry the static archive.

@kubabrecka

Thanks for your comments.

Ok, yes the code is included in asan shared library and clang only include the ubsan shared library when not required the asan shared library, I forgot to check clang driver.
Is there any reason to include ubsan and leak sanitizers in the asan shared library?
Don't you think it would be better to keep them separated so we can combine them only when necessary? this would also simplify clang driver code.

I see the stats shared library for apple also includes sanitizer_common 's code. So, when using "asan" and "stats" we are in the situation that I mentioned before, with duplicated code for sanitizer common.

I think the 2) reason you mention is not a problem, since clang will only include the static libraries when linking the executable, not when creating shared or static libraries.
So there is no problem in linking multiple instrumented modules, since none of them include sanitizer static libraries.

filcab added a subscriber: filcab.

This seems a bit the opposite of part of r159129 (I'm adding Chandler so it's on his radar).
The sanitizers which can be combined are combined by having a "main" one, which includes the RT library for the other one, aren't they?
As in:
ASan will include ASan-base, LSan, UBSan, SanitizerCommon
etc.

I'm not sure what this change will improve it. Like Kuba said, the clang driver is the one that knows about the sanitizers, and which runtimes are needed, and needs to only link ASan in the case where ASan and UBSan are requested. IIRC, this is the same on other platforms. It's that way on the PS4 too.

Thank you,

Filipe
kcc edited edge metadata.Jan 5 2017, 3:24 PM

Same comment as in the other part:
I am not a fan of any such changes as they complicate things further.
Also, will the GCC build system and GCC driver have to change as well?

Does it actually fix any problem that exists today?

@filcab
Yes, now it works that way. Is that what we want?
I mean, we could include all the sanitizers in only one unique run time library, or we can try to keep them separated in different independent libraries and only include them when necessary. Both approaches are valid.
Now, I think there is a bit of everything, which is confusing.
For some sanitizers combinations we include them in the same library, for others we keep them separated.
I was trying to understand, is there any reason why we keep ubsan and leak sanitizers inside asan library?

In D28359#637524, @kcc wrote:

Same comment as in the other part:
I am not a fan of any such changes as they complicate things further.
Also, will the GCC build system and GCC driver have to change as well?

Does it actually fix any problem that exists today?

@kcc

+ My main goal is to port libFuzzer to Windows.

+ To port libFuzzer to Windows, I need Santinizer Coverage to work on Windows.

+ To make Sanitizer Coverage work on Windows, I need to update the code, since currently it doesn't work because it relies on weak symbols.
This works fine for Linux, since weak symbols work in both cases, when using a static asan rt or when considering a shared asan rt.
In Windows, we have something similar to weak symbols, using "pragma"(/alternatename...". This works for static libraries but this doesn't work for dlls.
For MD, asan is implemented in a separate dll, and the linker will fails because we can not set "__sanitizer_cov_trace_*" functions as optional in "clang-rt_asan_dynamic-arch.dll".

So, to make it work on Windows, my proposal was just to decouple sanitizer coverage from asan and create an independent static library for sanitizer coverage. This static library will only include the functions: "__sanitizer_cov_trace_*" as weak symbols.
Sanitizer coverage will always be a static library, no matter if asan is static or shared.

+ To create an independent static library for sanitizer coverage, I need to move sanitizer_common into a separate static library, which is the purpose of this diff.
I need sanitizer_common's code for the sanitizer coverage static library. Also that code is included in asan static library (asan is static when using MT on Windows, and by default on linux), resulting in link error because of multiple definitions.
So, I decided to separate sanitizer common into a different static library, and ensure it is only included once.

So, this is how I ended up with these changes.

I can isolate these changes, and include them only for Windows, but I thought it would be better to take the same approach for all the platforms: "decouple sanitizer coverage from the rest of sanitizers and always include it as a static library".

Isn't ASan already using weak symbols on Windows? How does it do that?

Hi @kubabrecka ,
No, weak symbols are ignored on Windows (SANITIZER_WEAK_ATTRIBUTE is empty).
It only uses "pragma"(/alternatename..." for some optional functions with default implementation, like: "sanitizer_malloc_hook", "sanitizer_print_memory_profile". But this only works for functions that are statically linked to asan library, for example, when they are defined in other sanitizers.
We can't use that for sanitizer coverage, because functions could be defined on the user side, and should work when asan is a shared library (a dll in Windows).
Because of that I decided to decouple sanitizer coverage from asan, and make sanitizer coverage always a static library for Windows.

Ok. Anyway, multiple people are against this change. How about just using dlsym/GetProcAddress to find the callbacks at runtime? That should work on all platforms and there might be a way to hide this behind some macros.

Or just build the .a library you need for Windows only.

kubamracek requested changes to this revision.Jan 5 2017, 10:26 PM
kubamracek added a reviewer: kubamracek.
This revision now requires changes to proceed.Jan 5 2017, 10:26 PM
mpividori abandoned this revision.Jan 6 2017, 7:35 PM

Ok. Thanks for your comments.
I took a different direction, so I will abandon this diff since I don't need these changes any more.
Thanks.