Page MenuHomePhabricator

[sanitizers] Implement GetTls on Solaris
AbandonedPublic

Authored by ro on Nov 17 2020, 2:28 AM.

Details

Summary

In the initial Solaris ASan port, GetTls was left unimplemented. This patch corrects that. There are a couple of caveats, unfortunately:

While current Solaris 11.4 supports the dlpi_tls_modid field of struct dl_phdr_info, this was only added in SRU 10 and isn't present in either Solaris 11.3 or Illumos. Instead, this uses a method used in GCC's D runtime library libphobos dlpi_tls_modid workaround which works even on Solaris 10.

However, the direct call to __tls_get_address triggers a Solaris ld bug on amd64, which needs to be worked around the same way as in libphobos: ld workaround.

Together, they allow the sanitizer_common TLS tests to PASS on both sparc and x86.
I've also verified that the patch doesn't break the Illumos build; however compiler-rt test results continue to be horrible there.

Diff Detail

Event Timeline

ro created this revision.Nov 17 2020, 2:28 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 17 2020, 2:28 AM
Herald added subscribers: Restricted Project, pengfei, fedor.sergeev and 2 others. · View Herald Transcript
ro requested review of this revision.Nov 17 2020, 2:28 AM
vitalybuka added inline comments.Nov 17 2020, 4:38 PM
compiler-rt/lib/sanitizer_common/sanitizer_linux_libcdep.cpp
449–514

can this go into sanitizer_platform_limits_solaris.h ?

ro added inline comments.Nov 18 2020, 12:40 AM
compiler-rt/lib/sanitizer_common/sanitizer_linux_libcdep.cpp
449–514

I don't think it belongs there: AFAICS that header is for types used by the interceptors.

I've been following what other targets do here, like declaring internal types and functions, and adding helpers like GetSizeFromHdr. It would only be confusing if Solaris were treated differently. It certainly helped me a lot being able to see what other targets do in once place.

MaskRay added inline comments.Nov 18 2020, 5:51 PM
clang/lib/Driver/ToolChains/Solaris.cpp
157

Can the configure-time variable LINKER_SUPPORTS_Z_RELAX_TRANSTLS be replaced by a version check on the triple suffix?

You can see some tests where x86_64-unknown-freebsd11 or x86_64-unknown-freebsd12 is used. The idea is that the driver continuously bumps the default version.

157

.. and the code base should not be littered with -D macros from configure-time variables here and there.

clang/tools/driver/CMakeLists.txt
123 ↗(On Diff #305706)

GNU ld reports a warning instead of an error when an unknown -z is seen. The warning remains a warning even with --fatal-warnings.

ro added inline comments.Nov 19 2020, 2:26 AM
clang/lib/Driver/ToolChains/Solaris.cpp
157

Can the configure-time variable LINKER_SUPPORTS_Z_RELAX_TRANSTLS be replaced by a version check on the triple suffix?

You can see some tests where x86_64-unknown-freebsd11 or x86_64-unknown-freebsd12 is used. The idea is that the driver continuously bumps the default version.

This cannot work: both Solaris 11 and Illumos share the same configure triple, x86_64-pc-solaris2.11.

Please keep in mind that Solaris 11 was released almost exactly 9 years ago, is in its 5th micro version since, each of which getting monthly updates (at least for a while). Even if it were possible to extract the micro version and update number from the triple, hardcoding which introduced which feature is effectively impossible. Some ld and libc features are also backported to an update of the previous micro version; hardcoding all this knowledge is simply a nightmare.

TBH, I feel put back to the bad old times 20+ years ago when software tried to hardcode all knowledge about the environment, and usually failed even at the time of writing. This only got better once feature-based approaches got traction.

157

.. and the code base should not be littered with -D macros from configure-time variables here and there.

What's so bad about that? Why is LINKER_SUPPORTS_Z_RELAX_TRANSTLS detected at build time better than a runtime check for Solaris/x86 11.3 or higher, but not 11.3 update < x and not Illumos?

With the configure-based approach, I know at least which features are tested by the code, rather than finding months or even years later that some check is hardcoded in some obscure part of the code that I missed? And for new targets, many things just fall into place automatically. Consider compiler-rt's maze in sanitizer_platform_interceptors.h. With features introduced in micro versions and updates, one cannot even express that some feature got introduced in one of those.

clang/tools/driver/CMakeLists.txt
123 ↗(On Diff #305706)

GNU ld reports a warning instead of an error when an unknown -z is seen. The warning remains a warning even with --fatal-warnings.

Thanks for reminding me about that misfeature of GNU ld. I guess check_linker_flags needs to be updated to handle that.
In the case at hand, it won't matter either way: the flag is only passed to ld, which on Solaris is guaranteed to be the native linker. Once (if at all) I get around to completing D85309, I can deal with that. For now, other targets won't see linker warnings about this flag, other than when the flag is used at build time.

Other than file issue, compiler-rt part is LGTM
leaving the rest to @MaskRay

compiler-rt/lib/sanitizer_common/sanitizer_linux_libcdep.cpp
449–514

Chech xdl_iterate_phdr and sanitizer_freebsd.h
I think it's better to move this into some sanitizer_solaris.h/cpp as well

MaskRay added inline comments.Nov 19 2020, 5:20 PM
clang/tools/driver/CMakeLists.txt
123 ↗(On Diff #305706)

OK. Then I guess you can condition this when the OS is Solaris?

compiler-rt/lib/sanitizer_common/sanitizer_linux_libcdep.cpp
455

In C++ you can just write struct .... { ... }, not need for typedef

ro added inline comments.Nov 20 2020, 1:52 AM
clang/tools/driver/CMakeLists.txt
123 ↗(On Diff #305706)

OK. Then I guess you can condition this when the OS is Solaris?

I fear not: LINKER_SUPPORTS_Z_RELAX_TRANSTLS is tested inside an if in Solaris.cpp: this code is also compiled on non-Solaris hosts. Why are you worried about the definition being always present?

compiler-rt/lib/sanitizer_common/sanitizer_linux_libcdep.cpp
449–514

Chech xdl_iterate_phdr and sanitizer_freebsd.h
I think it's better to move this into some sanitizer_solaris.h/cpp as well

Nice: this removes the clutter in common code.

What about GetSizeFromHdr, though? It still lives in sanitizer_linux_libcdep.cpp on FreeBSD. I feel like it's better to keep the two ports similar in this respect.

455

In C++ you can just write struct .... { ... }, not need for typedef

Thanks for the reminder: I keep forgetting about this, being primarily a C guy.

MaskRay added inline comments.Nov 20 2020, 12:33 PM
clang/tools/driver/CMakeLists.txt
123 ↗(On Diff #305706)

It is not suitable if LINKER_SUPPORTS_Z_RELAX_TRANSTLS returns a wrong result for GNU ld, even if it is not used for non-Solaris. We should make the value correct in other configurations.

ro added inline comments.Nov 23 2020, 6:39 AM
clang/tools/driver/CMakeLists.txt
123 ↗(On Diff #305706)

It is not suitable if LINKER_SUPPORTS_Z_RELAX_TRANSTLS returns a wrong result for GNU ld, even if it is not used for non-Solaris. We should make the value correct in other configurations.

Tell the binutils maintainers that ;-) While I'm still unconcerned about this particular case (it's only used on a Solaris host where clang hardcodes the use of /usr/bin/ld), I continue to be annoyed by GNU ld's nonsensical (or even outright dangerous) behaviour of accepting every -z option.

It took me some wrestling with cmake , but now check_linker_flag correctly rejects -z flags where GNU ld produces the warning.

Some caveats about the implementation:

  • check_cxx_compiler_flag doesn't support the FAIL_REGEX arg, so I had to switch to check_cxx_source_compiles instead.
  • While it would be more appropriate to add the linker flag under test to CMAKE_REQUIRED_LINK_OPTIONS, that is only present since cmake 3.14 while LLVM still only requires 3.13.
ro updated this revision to Diff 307067.Nov 23 2020, 6:42 AM
  • Move declarations to new sanitizer_solaris.h.
  • Augment check_linker_flag do reject unknown -z options that GNU ld noisily accepts.

Tested on amd64-pc-solaris2.11, sparcv9-sun-solaris2.11, and x86_64-pc-linux-gnu. CMakeCache.txt was unchanged on Solaris, while LINKER_SUPPORTS_Z_RELAX_TRANSTLS is now false on Linux, both for stage 1 and 2.

Herald added a project: Restricted Project. · View Herald TranscriptNov 23 2020, 6:42 AM
MaskRay added inline comments.Nov 23 2020, 7:30 PM
clang/tools/driver/CMakeLists.txt
123 ↗(On Diff #305706)

warning: -z.* ignored

Doesn't this stop working if binutils starts to use error: -z.* ignored? Isn't there a way to call check_linker_flag only when the target is Solaris? Does LLVM_LINKER_IS_SOLARISLD work?

ro added inline comments.Nov 24 2020, 6:00 AM
clang/tools/driver/CMakeLists.txt
123 ↗(On Diff #305706)

warning: -z.* ignored

Doesn't this stop working if binutils starts to use error: -z.* ignored?

No: FAIL_REGEX only adds to error detection, every other condition just remains as is.

Isn't there a way to call check_linker_flag only when the target is Solaris? Does LLVM_LINKER_IS_SOLARISLD work?

That would be wrong: this is about working around a GNU ld bug. Imagine adding a new -z option to lld which either GNU ld didn't adopt at all or only in the latest release. You'd want to reject the use of that option on earlier GNU ld just the same, no Solaris in sight.

As I said: LINKER_SUPPORTS_Z_RELAX_TRANSTLS must always be defined because Solaris.cpp is always compiled no matter what the target.

I really don't understand what you're trying to guard against by putting up roadblock after roadblock for this patch.

MaskRay added inline comments.Nov 24 2020, 3:35 PM
clang/tools/driver/CMakeLists.txt
123 ↗(On Diff #305706)

I really don't understand what you're trying to guard against by putting up roadblock after roadblock for this patch.

Because I am concerned the additional Solaris specific complexity (to make systems released 9 years ago work) in generic code (CheckLinkerFlag.cmake) may not pull its weight. I am sorry but I hope it is not unfair to say that Solaris is not a first-tier OS. I am fairly worried about more configure-time variables which can fragment testing (testing one specific configuration does not guarantee it working in another; this patch makes the situation worse).

Can't you make the Z_RELAX_TRANSTLS check only running on Solaris?

MaskRay added inline comments.Nov 24 2020, 5:30 PM
clang/tools/driver/CMakeLists.txt
123 ↗(On Diff #305706)

ld.bfd --fatal-warnings -z relax=transtls a.o is an error.

ro added inline comments.Nov 25 2020, 2:48 AM
clang/tools/driver/CMakeLists.txt
123 ↗(On Diff #305706)

I really don't understand what you're trying to guard against by putting up roadblock after roadblock for this patch.

Because I am concerned the additional Solaris specific complexity (to make systems released 9 years ago work) in generic code (CheckLinkerFlag.cmake) may not pull its weight. I am sorry but I hope it is not unfair to say that Solaris is not a first-tier OS. I am fairly worried about more configure-time variables which can fragment testing (testing one specific configuration does not guarantee it working in another; this patch makes the situation worse).

Then why on earth didn't you say so up front instead of asking the same questions repeatedly, obviously only half reading the answers? Besides, the check is purely for the benefit of Illumos, an open-source OpenSolaris derivative. Solaris can use the option unconditionally.

Can't you make the Z_RELAX_TRANSTLS check only running on Solaris?

I've already answered this twice, obviously you're not listening.

This is the most hostile and toxic review I've ever experienced: instead of rejecting the patch up-front either for policy reasons ("we don't want Solaris support in LLVM") or for technical ones, you're just casting doubt with little technical content, all with a vague undercurrent of dislike.

This is just a total waste of my time: I'm leaving LLVM development for good. Mission accomplished.

vitalybuka resigned from this revision.Feb 17 2021, 1:45 PM
MaskRay resigned from this revision.Feb 17 2021, 4:58 PM
ro updated this revision to Diff 408389.Feb 14 2022, 6:26 AM
ro added a subscriber: nikic.

Since the build-time check for the ld -z relax=transtls option was met with massive resistance and was only necessary for Illumos anyway, this revision just uses it unconditionally.

Tested on amd64-pc-solaris2.11 and sparcv9-sun-solaris2.11.

The Illumos community will have to provide a way to somehow distinguish the two OSes. The configure triple is still the same, and the uname -o output (Solaris vs. illumos seems to be hardcoded in the uname command and not be available programmatically.

@vitalybuka , @MaskRay could I persuade you to review the revision version which should be much less controversial and completely Solaris-specific? Thanks.

MaskRay added a comment.EditedFeb 14 2022, 6:15 PM

GetTls is about the static TLS block size. It consists of the main executable's TLS, and initially loaded shared objects' TLS, struct pthread, and padding.
Solaris should be able to just use the code path like FreeBSD, Linux musl, and various unpopular architectures of Linux glibc:

#elif SANITIZER_FREEBSD || SANITIZER_LINUX
  uptr align;
  GetStaticTlsBoundary(addr, size, &align);

I think NetBSD should use this code path as well, but I don't have NetBSD for testing.

The Clang driver change should be in a separate patch with clang/test/Driver tests.

ro added a comment.Feb 15 2022, 2:38 AM

GetTls is about the static TLS block size. It consists of the main executable's TLS, and initially loaded shared objects' TLS, struct pthread, and padding.
Solaris should be able to just use the code path like FreeBSD, Linux musl, and various unpopular architectures of Linux glibc:

#elif SANITIZER_FREEBSD || SANITIZER_LINUX
  uptr align;
  GetStaticTlsBoundary(addr, size, &align);

Not unconditionally, unfortunally: as the comment above GetSizeFromHdr explains, dlpi_tls_modid was only introduced in an update to Solaris 11.4 FCS (which is sort of a problem), but isn't present in 11.3 (don't reallly care) and Illumos (this would break compilation for them). OTOH my solution is successfully being used in GCC's libphobos on Solaris 11.3, 11.4, and most likely Illumos, too. I'd rather not burn the Illumos bridge if it can be avoided.

The Clang driver change should be in a separate patch with clang/test/Driver tests.

Done now: D119829.

ro added a comment.Feb 17 2022, 5:49 AM
In D91605#3322312, @ro wrote:

GetTls is about the static TLS block size. It consists of the main executable's TLS, and initially loaded shared objects' TLS, struct pthread, and padding.
Solaris should be able to just use the code path like FreeBSD, Linux musl, and various unpopular architectures of Linux glibc:

#elif SANITIZER_FREEBSD || SANITIZER_LINUX
  uptr align;
  GetStaticTlsBoundary(addr, size, &align);

Not unconditionally, unfortunally: as the comment above GetSizeFromHdr explains, dlpi_tls_modid was only introduced in an update to Solaris 11.4 FCS (which is sort of a problem), but isn't present in 11.3 (don't reallly care) and Illumos (this would break compilation for them). OTOH my solution is successfully being used in GCC's libphobos on Solaris 11.3, 11.4, and most likely Illumos, too. I'd rather not burn the Illumos bridge if it can be avoided.

As an experiment, I've tried to use GetStaticTlsBoundary instead. It does indeed work on recent Solaris 11.4 and the resulting patch is at D120048. I'm pretty certain that support for Solaris 11.3/Illumos which lack dlpi_tls_modid using dlinfo(RTLD_SELF, RTLD_DI_LINKMAP) can be added on top of that one, unbreaking the Illumos build. This would avoid considerable duplication of the dl_iterate_phdr code, which is certainly a nice benefit. I'll experiment with that route later.

I know now why I did the present patch the way it is: GetStaticTlsBoundary was only introduced months after I submitted this one.

ro added a comment.Feb 17 2022, 7:25 AM
In D91605#3329375, @ro wrote:

As an experiment, I've tried to use GetStaticTlsBoundary instead. It does indeed work on recent Solaris 11.4 and the resulting patch is at D120048. I'm pretty certain that support for Solaris 11.3/Illumos which lack dlpi_tls_modid using dlinfo(RTLD_SELF, RTLD_DI_LINKMAP) can be added on top of that one, unbreaking the Illumos build. This would avoid considerable duplication of the dl_iterate_phdr code, which is certainly a nice benefit. I'll experiment with that route later.

Now posted as D120059. It this approach is acceptable, the series of D119829, D120048, and D120059 will supercede this one.

ro abandoned this revision.Feb 22 2022, 11:21 AM

Superceded by D119829 and D120048. If absolutely necessary, D120059 could be revived if Illumos cannot implement dlpi_tls_modid.