This is an archive of the discontinued LLVM Phabricator instance.

[ubsan][test] Add weak attributes to symbols for dyld weak-def coalescing of Mach-O files
ClosedPublic

Authored by rsundahl on Jun 15 2022, 7:15 PM.

Details

Summary

Apple's dynamic linker won't weak-def-coalesce from a Mach-O file unless it contains
at least one weak symbol. These tests contain symbols intended to override weak
symbols from the sanitizer dylib but are never considered by dyld64 because the test programs
contain no weak symbols. Marking (at least one of) the intended override symbols "weak"
fulfills the dyld64 requirement for consideration in weak-def-coalescing and so it sees the
intended override symbol in the program first and chooses it over the matching symbol in the
sanitizer dylib.

rdar://95244261

Diff Detail

Event Timeline

rsundahl created this revision.Jun 15 2022, 7:15 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 15 2022, 7:15 PM
Herald added a subscriber: Enna1. · View Herald Transcript
rsundahl requested review of this revision.Jun 15 2022, 7:15 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 15 2022, 7:15 PM
Herald added a subscriber: Restricted Project. · View Herald Transcript

I included all of you because while this is a small change for an Apple issue, it stems from the fundamental reliance on the behavior of the linker and how it resolves multiple symbols in the context of dynamic linking. This behavior is not spelled out by any language specification and I wanted to stimulate some thinking on whether or not we should shed the dependence on weak definitions completely and move to more robust and explicit say callback registration. In this simple file, you can see that Apple, wine and openbsd all have their issues with weak symbols and are choosing to opt out.

MaskRay added a subscriber: MaskRay.EditedJun 15 2022, 7:59 PM

Apple's dynamic linker won't weak-def_coalesce from a file

This is unclear. What does weak-def_coalesce mean? Can you use a more formal language to describe the issue in the summary and in the comment?

See https://maskray.me/blog/2021-04-25-weak-symbol#ld.so for ELF behavior.

The behavior of weak defined symbols in the context of dynamic linking is undefined

This scares me. In ELF, according to the generic ABI, STB_WEAK/STB_GLOBAL does not affect symbol search.
From this sense, we can say that "weak defined symbol" does not do anything different.
This is very different from "it is undefined".

What are Obviously there are other platforms (above) that are having issues with the "strong-wins-over-weak" expectation but while it works that way when static linked,?

FWIW TestCases/Misc/monitor.cpp passes on one Apple M1 I can access.

Apple's platforms are not ELF and don't have the same rules as ELF systems. For instance Apple's platforms use two-level namespace where which imported symbol includes which shared library it is from. At runtime, dyld then only looks in that one shared library.

Apple's static linker ld64 when given a weak-def (N_WEAK_DEF) and non-weak-def for the same symbol will silently choose the non-weak-def symbol (like ELF linkers). But at runtime dyld uses a different mechanism. Unlike static linking where the linker can remove an overridden/unused function. At runtime all code is fixed. All the runtime linker can do is try to redirect uses. That means the static linker had to leave an indirection for dyld to patch and dyld has to be told to try to patch it. A process on macOS/iOS may have 1000 dylibs in it. It would be prohibitively expensive for dyld to search all 1000 dylibs for a non-weak-def every time it sees a weak-def.

In general, dyld only supports weak-def symbols overriding other weak-def symbols (there is a special case for operator new and friends). This is consistent with ODR. Every translation unit should have seen the same definition, so they would all be weak-def.

The bottom line is that the current trick of trying to get the dynamic linker to pick a non-weak-def symbol to override weak-def symbol, may work on ELF systems, but will not work reliable on Apple platforms. My suggestion is to not use weak-def symbol tricks at all, but add a function that can be called to install an alternate function.

I have some understanding of two-level namespace from an article "Mach-O linking and loading tricks" and I know its similarity with Solaris' direct binding (https://maskray.me/blog/2021-05-16-elf-interposition-and-bsymbolic#alternative-symbol-search-models).

This concept doesn't explain the int __attribute((weak)) foo; trick used in this test, though. The weak __ubsan_on_report was introduced in D48446 with the TestCases/Misc/monitor.cpp test.
And the test apparently works well on macOS since then. What has changed that leads to a possible need to revisit the mechanism?

Note that non-weak overriding is supported with weak bind table. See D86575 for the lld-macho implementation.

This concept doesn't explain the int __attribute((weak)) foo; trick used in this test, though.

It is a hack. One of the issues is that the static linker puts a bit in the header of each dylib (MH_WEAK_DEFINES) if the dylib has an external weak definitions. Dyld uses that bit (if unset) to skip over scanning that dylib when looking for weak-defs. The ubsan otherwise has no external weak-def symbols, so the bit was not set, so at runtime dyld never even looked at it, so the non-weak symbol in it was not overriding. That is why this gets the code working. But again this is fragile. Dyld is not meant to support non-weak-defs overriding weak-defs.

(@MaskRay, the test as is will fail when -mmacosx-version-min=12.0 and up)

MaskRay added a subscriber: int3.Jun 16 2022, 12:25 PM

This concept doesn't explain the int __attribute((weak)) foo; trick used in this test, though.

It is a hack. One of the issues is that the static linker puts a bit in the header of each dylib (MH_WEAK_DEFINES) if the dylib has an external weak definitions. Dyld uses that bit (if unset) to skip over scanning that dylib when looking for weak-defs. The ubsan otherwise has no external weak-def symbols, so the bit was not set, so at runtime dyld never even looked at it, so the non-weak symbol in it was not overriding. That is why this gets the code working. But again this is fragile. Dyld is not meant to support non-weak-defs overriding weak-defs.

@int3 ^

Add some information how the weak overriding is robust on ELF:

If ubsan defines a weak symbol, the overriding works well, with both libclang_rt.ubsan.so and libclang_rt.ubsan.a:

  • for executable ... libclang_rt.ubsan.so: rtld prefers the definition in the executable [1]
  • for executable (where libclang_rt.ubsan.a is linked into the executable): the static linker overrides the ubsan __ubsan_on_report with the one defined in monitor.cpp

[1] The symbol search list looks like executable, preload0, preload1, needed0, needed1, needed2, needed0_of_preload0, ..., needed0_of_needed0, needed1_of_needed0, ... where the executable is always the first.

...and dyld works differently as described above. Consequently, the behavior is different between different dynamic loaders and what I originally referred to as "undefined". (A better word would have been un-specified.) In the absence of a specification for how weak symbols should be treated by dynamic loaders, I think the irony of UBSan itself depending on undefined behavior for it's own integrity checks, shouldn't be lost on anybody.

...and dyld works differently as described above. Consequently, the behavior is different between different dynamic loaders and what I originally referred to as "undefined". (A better word would have been un-specified.) In the absence of a specification for how weak symbols should be treated by dynamic loaders, I think the irony of UBSan itself depending on undefined behavior for it's own integrity checks, shouldn't be lost on anybody.

I have re-read @kledzik's comments but am still puzzled.
The executable has an __ubsan_on_report override, so it gets the MH_WEAK_DEFINES flag in both ld64 and lld-macho (see lld/MachO/SyntheticSections.cpp:120).
Is MH_WEAK_DEFINES sufficient for dyld to search symbols in the executable?

Why will it fail with -mmacosx-version-min=12.0? Does ld64 start to drop MH_WEAK_DEFINES for non-weak-overriding-weak or does dyld start to do something different?

Why will it fail with -mmacosx-version-min=12.0? Does ld64 start to drop MH_WEAK_DEFINES for non-weak-overriding-weak or does dyld start to do something different?

ld64 starts doing this:

Dyld uses that bit (if unset) to skip over scanning that dylib when looking for weak-defs

and so doesn't open the file and doesn't find the non-weak function so doesn't consider it. This commit adds an arbitrary weak symbol so the file will be included in the "weak-def-coalescing" process.

MaskRay added a comment.EditedJun 16 2022, 2:56 PM

Why will it fail with -mmacosx-version-min=12.0? Does ld64 start to drop MH_WEAK_DEFINES for non-weak-overriding-weak or does dyld start to do something different?

ld64 starts doing this:

OK. With my understanding of object file formats, I think this is a large functional regression.

Any chance ld64 can be fixed to check weak binding table as well? @int3

// lld/MachO/SyntheticSections.cpp
  if (in.exports->hasWeakSymbol || in.weakBinding->hasEntry())
    hdr->flags |= MachO::MH_BINDS_TO_WEAK;

PS: it may or may be relevant if this change is for security reasons. A separate policy checker tool may be more appropriate than a tool designed for creating such object files.
In Michael Matz's words: "Not in the least because policies can sometimes change quite quickly (and arbitrarily) and hence need quickly adjustable tooling anyway and (even more so) that policies are different for different audiences and so encoding one specific policy into a tool looks wrong."

Dyld uses that bit (if unset) to skip over scanning that dylib when looking for weak-defs

and so doesn't open the file and doesn't find the non-weak function so doesn't consider it. This commit adds an arbitrary weak symbol so the file will be included in the "weak-def-coalescing" process.

Thanks for confirmation. I think I understand the issue now.

Thanks for confirmation. I think I understand the issue now.

Thank you for your consideration @MaskRay. I appreciate it. It's my understanding that the change is for efficiency and not security but could be both Even though this is an important discussion, at the end of the day, my concern is that even if all dynamic loaders behaved exactly the same, that behavior would not be defined by any C/C++ specifications that I know of and so would be undefined in those contexts. Therefore the original question still stands, "Should the sanitizers be dependent on the undefined behavior of weak symbols in a dynamic load environment? There are substantive differences in handling of dynamic weak symbols between openbsd, Windows, Linux and Apple.

__ubsan_on_report is not defined as weak. Redefining it here isn't supported
on Windows.

UNSUPPORTED: windows-msvc
Linkage issue
XFAIL: openbsd

which is basically empirical evidence of the situation. I am still suggesting that we replace the "strong-replaces-weak" model in use by the sanitizers with an explicit registration system that is robust and well defined. I would love to get input from parties that have more insight into motivation for weak symbols so that we might move gradually away from them.

"unspecified behavior" may be more appropriate. Weak definitions are well defined on ELF. On C/C++ land, "undefined behavior" refers to something more horrific.

From a recent ISO C draft

undefined behavior

behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this document imposes no requirements
Possible undefined behavior ranges from *ignoring the situation completely with unpredictable results*, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).

unspecified behavior

behavior, that results from the use of an unspecified value, or other behavior upon which this document provides two or more possibilities and imposes no further requirements on which is chosen in any instance

OK, if this ld64 change is not due to security consideration, then it streghthens my feeling that this should be fixed on the ld64 end.
From the linker side, it should be of a very low cost. The header flags (among of which MH_WEAK_DEFINES is concerned) are computed just once for a link.

From dyld side, taking a shortcut for a program deliberately using non-weak-override-weak behaviors seems unnecessary.
Such programs are likely uncommon. Improving symbol binding performance for them does not give a good return.
On the other hand, the ld64 change breaks a well-functioning feature which has no satisfactory replacement.
A registry+callback mechanism can be used, but it will add more overhead.
For applications who do need non-weak-override-weak behaviors, what will they do now? They have to add an arbitrary weak definition. This is counter-intuitive and error-prone.

Note: OpenBSD port has been removed from compiler-rt. Perhaps they want to implement it downstream or they just don't need it. So OpenBSD occurrences in tests can be ignored.

MaskRay requested changes to this revision.Jun 30 2022, 11:37 AM
This revision now requires changes to proceed.Jun 30 2022, 11:37 AM

ld64 starts doing this:

but should have read:

dyld starts doing this:

(Sorry for any confusion that this may have caused @MaskRay.) It is my understanding that there is no change in the static linker, only to the dynamic linker. I'm quite certain that we will not be reverting the change to the dyld as it is both required and has been shipping for the past year. Short of using XFAIL I have little choice than to set the MH_WEAK_DEFINES bit by including a weak symbol in the dylib. The weak symbol could well be the so-called "strong" symbol itself, but the optics of making a symbol weak in order to replace a different weak symbol may be somewhat counterintuitive when holding onto the "strong-replaces-weak" notions. That said, and as we discussed above, in the context of dynamic loading, it's not "strong-replaces-weak", it's "ignore-weak-attribute-and-use-first-encounter. (This is true with Darwin as it is with Linux and other dynamic loaders). With the clarification that this requirement is only for dyld on Darwin, would the conditional injection of a weak symbol on Apple platforms be acceptable? (Or alternatively, marking the intended override symbol as "weak"?) Thank you.

MaskRay added a comment.EditedJul 1 2022, 1:22 AM

ld64 starts doing this:

but should have read:

dyld starts doing this:

(Sorry for any confusion that this may have caused @MaskRay.) It is my understanding that there is no change in the static linker, only to the dynamic linker. I'm quite certain that we will not be reverting the change to the dyld as it is both required and has been shipping for the past year.

If dyld will not revert the change, I think there is no choice but to allow the workaround. Note that the monitor mechanism was introduced by Apple in D48446.

I am more concerned that whether this dyld change will constrain future sanitizer design space, though.

Short of using XFAIL I have little choice than to set the MH_WEAK_DEFINES bit by including a weak symbol in the dylib. The weak symbol could well be the so-called "strong" symbol itself, but the optics of making a symbol weak in order to replace a different weak symbol may be somewhat counterintuitive when holding onto the "strong-replaces-weak" notions. That said, and as we discussed above, in the context of dynamic loading, it's not "strong-replaces-weak", it's "ignore-weak-attribute-and-use-first-encounter. (This is true with Darwin as it is with Linux and other dynamic loaders). With the clarification that this requirement is only for dyld on Darwin, would the conditional injection of a weak symbol on Apple platforms be acceptable? (Or alternatively, marking the intended override symbol as "weak"?) Thank you.

It's nice if Mach-O uses the "ignore-weak-attribute-and-use-first-encounter" model like ELF. But now I am confused. If the executable is the first encountered, then the overriding seems to work.

rsundahl updated this revision to Diff 443547.Jul 10 2022, 8:15 PM

Apply weak attribute to target override. Apply change to all tests requiring it.

It's nice if Mach-O uses the "ignore-weak-attribute-and-use-first-encounter" model like ELF. But now I am confused. If the executable is the first encountered, then the overriding seems to work.

I would defer to @kledzik for an exact reason @MaskRay, but my presumption is that the program isn't considered during the weak-def-coalescing phase because the compilation unit, program or not, doesn't contain a weak symbol and so, isn't scanned for candidates.

Rather than to emit an arbitrary weak symbol foo, I have instead marked the symbol itself as weak (which accomplishes the same thing; to set the MH_WEAK_DEFINES bit in the header of the program) and I have made the change to all of the tests across the sanitizers which can be made to pass with this simple pattern. I prefer that this change be made upstream in the interest of minimizing platform divergence within sanitizers and ask for your trust that you can. accept this revision.

MaskRay added a comment.EditedJul 11 2022, 3:38 PM

It's nice if Mach-O uses the "ignore-weak-attribute-and-use-first-encounter" model like ELF. But now I am confused. If the executable is the first encountered, then the overriding seems to work.

I would defer to @kledzik for an exact reason @MaskRay, but my presumption is that the program isn't considered during the weak-def-coalescing phase because the compilation unit, program or not, doesn't contain a weak symbol and so, isn't scanned for candidates.

I'd still hope that dyld implementors may have a chance to read https://reviews.llvm.org/D127929#3624444 and previous comments.
The particular strong-definition-in-executable-override-weak-definition-in-dylib worked really well until new dyld decided to drop the functionality for unclear benefits.
If this is just this specific case, I am certainly ok.
I am more concerned of the newly imposed restriction on non-Windows systems.

Rather than to emit an arbitrary weak symbol foo, I have instead marked the symbol itself as weak (which accomplishes the same thing; to set the MH_WEAK_DEFINES bit in the header of the program) and I have made the change to all of the tests across the sanitizers which can be made to pass with this simple pattern. I prefer that this change be made upstream in the interest of minimizing platform divergence within sanitizers and ask for your trust that you can. accept this revision.

The latest revision uses a non-intuitive behavior that a weak definition can override a weak dylib definition while a strong definition can't.
This change is fine, if there are released Mach-O versions having such a dyld, but it'd be nice to fix newer dyld.

Consider dropping the unrelated part from debug_double_free.cpp (no need to appease clang-format lint for a not-formatted test file).

rsundahl updated this revision to Diff 444090.Jul 12 2022, 3:26 PM

Removed extraneous clang-format artifacts.

The latest revision uses a non-intuitive behavior that a weak definition can override a weak dylib definition while a strong definition can't.

...and the alternative of adding an arbitrary weak symbol so unrelated strong symbols are considered by the algorithm is equally awkward (but more honest). Neither solution is pleasant.

This change is fine, if there are released Mach-O versions having such a dyld, but it'd be nice to fix newer dyld.

I agree and will lobby the dyld team for accommodation.

Consider dropping the unrelated part from debug_double_free.cpp (no need to appease clang-format lint for a not-formatted test file).

Done.

MaskRay accepted this revision.Jul 12 2022, 4:05 PM

Last two nits

compiler-rt/test/asan/TestCases/default_options.cpp
14–15

Add a comment which version this fixes an issue. It's ok that which dyld version fixes the regression is unknown. Adding a version helps future readers check whether the regression has fall outside of support windows and can therefore be removed.

17

TBD: is not a conventional comment. Use TODO

This revision is now accepted and ready to land.Jul 12 2022, 4:05 PM

The latest revision uses a non-intuitive behavior that a weak definition can override a weak dylib definition while a strong definition can't.

...and the alternative of adding an arbitrary weak symbol so unrelated strong symbols are considered by the algorithm is equally awkward (but more honest). Neither solution is pleasant.

This change is fine, if there are released Mach-O versions having such a dyld, but it'd be nice to fix newer dyld.

I agree and will lobby the dyld team for accommodation.

Thanks:)

Consider dropping the unrelated part from debug_double_free.cpp (no need to appease clang-format lint for a not-formatted test file).

Done.

MaskRay retitled this revision from [ubsan] Add weak symbol for weak-def coalescing consideration. to [ubsan][test] Add weak symbol for weak-def coalescing consideration.Jul 12 2022, 4:06 PM

This works around the issue by adding an unused weak symbol.

The summary is now stale and should be updated.

vitalybuka added inline comments.Jul 12 2022, 4:30 PM
compiler-rt/test/asan/TestCases/default_options.cpp
17

should we instead add weak in the sanitizer/asan_interface.h
and include it here?

rsundahl retitled this revision from [ubsan][test] Add weak symbol for weak-def coalescing consideration to [ubsan][test] Add weak attributes to symbols for dyld weak-def coalescing of Mach-O files.Jul 12 2022, 4:31 PM
rsundahl edited the summary of this revision. (Show Details)
rsundahl added inline comments.Jul 12 2022, 6:13 PM
compiler-rt/test/asan/TestCases/default_options.cpp
17

Unfortunately, making the declaration weak is not enough. The definition itself must be weak for the linker to set MH_WEAK_DEFINES in the header.

rsundahl updated this revision to Diff 444356.Jul 13 2022, 11:43 AM

Final nits and suggestions.

rsundahl accepted this revision.Jul 13 2022, 11:44 AM
rsundahl marked 3 inline comments as done.
This revision was landed with ongoing or failed builds.Jul 13 2022, 11:52 AM
This revision was automatically updated to reflect the committed changes.