This is an archive of the discontinued LLVM Phabricator instance.

[compiler-rt] Avoid memintrinsic calls inserted by the compiler
ClosedPublic

Authored by melver on May 22 2023, 1:51 PM.

Details

Summary

D135716 introduced -ftrivial-auto-var-init=pattern where supported.
Unfortunately this introduces unwanted memset() for large stack arrays,
as shown by the new tests added for asan and msan (tsan already had this
test).

In general, the problem of compiler-inserted memintrinsic calls
(memset/memcpy/memmove) is not new to compiler-rt, and has been a
problem before.

To avoid introducing unwanted memintrinsic calls, we redefine
memintrinsics as __sanitizer_internal_mem* at the assembly level for
most source files automatically (where sanitizer_common_internal_defs.h
is included).

In few cases, redefining a symbol in this way causes issues for
interceptors, namely the memintrinsic interceptor themselves. For such
source files we have to selectively disable the redefinition.

Other alternatives have been considered, but simply do not work well in
the context of compiler-rt:

  1. Linker --wrap: this does not work because --wrap only
	   applies to the final link, and would not apply when building
	   sanitizer static libraries.
  1. Changing references to memset() via objcopy: this may work,
	   but due to the complexities of the build system, introducing
	   such a post-processing step for the right object files (in
	   particular object files defining memset cannot be touched)
	   seems infeasible.

The chosen solution works well (as shown by the tests). Other libraries
have chosen the same solution where nothing else works (see e.g. glibc's
"symbol-hacks.h").

v4:

  • Add interface attribute to __sanitizer_internal_mem* declarations as well, as otherwise some compilers (MSVC) will complain.
  • Add SANITIZER_COMMON_NO_REDEFINE_BUILTINS to source files using C++STL, since this could lead to ODR violations (see added comment).

v3:

  • Don't use ALIAS() to alias internal_mem*() functions to __sanitizer_internal_mem*() functions, but just define them as ALWAYS_INLINE functions instead. This will work on darwin and windows.

v2:

  • Fix ubsan_minimal build where compiler decides to insert memset/memcpy: ubsan_minimal has work without RTSanitizerCommonLibc, therefore do not redefine the builtins.
  • Fix definition of internal_mem* functions with compilers that want the aliased function to already be defined before.
  • Fix definition of __sanitizer_internal_mem* functions with compilers more pedantic about attribute placement around extern "C".

Diff Detail

Event Timeline

melver created this revision.May 22 2023, 1:51 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 22 2023, 1:51 PM
Herald added subscribers: Enna1, dberris. · View Herald Transcript
melver requested review of this revision.May 22 2023, 1:51 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 22 2023, 1:51 PM
Herald added a subscriber: Restricted Project. · View Herald Transcript
melver updated this revision to Diff 524714.May 23 2023, 7:49 AM
melver edited the summary of this revision. (Show Details)
  • Add tests for asan and msan.
  • Fix new issues for asan and msan.
dvyukov accepted this revision.May 23 2023, 8:25 AM
dvyukov added a subscriber: dvyukov.

Marco mentioned offline that there is nolibc build, which is broken for asan/msan now. So we need to fix these as well.

Some more complex alternatives:

  • add compiler flag to emit only inline initialization
  • add compiler flag to emit calls to special __sanitizer_memset functions
  • add objcopy pass to sanitizer runtime builds to rename memcpy/memset calls

But these are more complex and elaborate, and this solution looks reasonable as well.

compiler-rt/test/asan/TestCases/Linux/check_memcpy.c
13

Since we use regexp, I would 1 line (callq|jmpq). Or have you copied this form from somewhere else?
This is also x86-specific, so can add REQUIRES directive.

This revision is now accepted and ready to land.May 23 2023, 8:25 AM

Vitaly, do you want to take a look as well?

melver updated this revision to Diff 525103.May 24 2023, 4:06 AM
melver marked an inline comment as done.
  • Simplify CHECK-NOT of tests.
  • Add REQUIRES for x86.

Hmm if possible I'd prefer not to add these holes into the runtime since they kinda defeat the purpose of adding the auto-var-init flag, and I think adding UNINITIALIZED until a test passes seems kinda fragile.

I think what we'd want is just a way to ensure compiler-emitted memset/cpy calls go through the internal/unsanitized versions of those functions. I think the cheapest way to do this might be through --wrap=memset as a link flag and compiler-rt defines something like:

int memset(...) {
  return internal_memset(...);
}
melver updated this revision to Diff 525767.May 25 2023, 1:09 PM
melver retitled this revision from [compiler-rt] Avoid memset() by marking large stack variable uninitialized to [compiler-rt] Avoid memset() calls inserted by the compiler.
melver edited the summary of this revision. (Show Details)

Rework solution and find compromise:

  • avoid attribute((uninitialized)) where possible by redefining
	  memset at the asm level.
  • use the attribute only where necessary (where redefining memset isn't
	  possible due to interceptors).

Hmm if possible I'd prefer not to add these holes into the runtime since they kinda defeat the purpose of adding the auto-var-init flag, and I think adding UNINITIALIZED until a test passes seems kinda fragile.

I think what we'd want is just a way to ensure compiler-emitted memset/cpy calls go through the internal/unsanitized versions of those functions. I think the cheapest way to do this might be through --wrap=memset as a link flag and compiler-rt defines something like:

int memset(...) {
  return internal_memset(...);
}

--wrap doesn't work because it only applies to the final link (doesn't work for static lib sanitizers), which means that it would also wrap the to-be-sanitized binary's memset calls, which is wrong. Furthermore, claiming the symbol __wrap_memset globally is precluding others from using --wrap (we already have interceptors, we don't need more interception pain).

I reworked the solution to a compromise. Unfortunately if a memset interceptor exists it's a little difficult to redefine memset though, so in those TUs we still need the attribute.

vitalybuka added inline comments.May 25 2023, 1:19 PM
compiler-rt/lib/sanitizer_common/sanitizer_allocator_primary32.h
356 ↗(On Diff #525767)

if possible, can we rely on sanitizer_redefine_builtins.h and avoid UNINITIALIZED?

compiler-rt/lib/sanitizer_common/sanitizer_redefine_builtins.h
22

no, it's a problem is even without auto-var-init
if this trick works please make it always

also memcpy

vitalybuka requested changes to this revision.May 25 2023, 1:20 PM
This revision now requires changes to proceed.May 25 2023, 1:20 PM

also, isn't D135716 was NOOP as it has a typo?

compiler-rt/lib/sanitizer_common/sanitizer_redefine_builtins.h
22

no, it's a problem is even without auto-var-init
if this trick works please make it always

also memcpy

maybe even move this into _defs.h to make sure it's everywhere

vitalybuka added inline comments.May 25 2023, 1:44 PM
compiler-rt/CMakeLists.txt
520 ↗(On Diff #525767)

as I wrote, don't do this
if the trick works, please use it always

time to time w have to debug crashes caused by accidental interceptor from " = {}".

compiler-rt/lib/asan/asan_stack.h
36 ↗(On Diff #525767)

This is use of UNINITIALIZED as intended.
We don't want in initialize performance critical buffers.

compiler-rt/lib/ubsan/ubsan_diag.cpp
22 ↗(On Diff #525767)

please reorder in a separate patch

compiler-rt/test/asan/TestCases/Linux/check_memcpy.c
11

Please use stronger --implicit-check-not instead of -NOT

Some more complex alternatives:

  • add compiler flag to emit only inline initialization

compiler can emit, but there are passes that replace some common patterns with memset. Probably suppressible with some attributes.
For sanitizer instrumentation we insert tsan_memset, asan_memset... etc

  • add compiler flag to emit calls to special __sanitizer_memset functions

late pass to replace memset/memcpy with __internal_ like would be nice. But we never had a good motivation to fix this.

  • add objcopy pass to sanitizer runtime builds to rename memcpy/memset calls

But these are more complex and elaborate, and this solution looks reasonable as well.

It would be grate if D151152 works.

vitalybuka added inline comments.May 25 2023, 1:51 PM
compiler-rt/lib/sanitizer_common/sanitizer_redefine_builtins.h
22

no, it's a problem is even without auto-var-init
if this trick works please make it always

also memcpy

maybe even move this into _defs.h to make sure it's everywhere

Or I guess it's going to conflict with interceptors?

melver added inline comments.May 25 2023, 1:58 PM
compiler-rt/lib/sanitizer_common/sanitizer_redefine_builtins.h
22

Interceptors make difficult to work everywhere. In general anything that includes sanitizer_common_interceptors.inc and wants the memset/cpy interceptor.

But let me try a little harder. glibc uses this solution, but they say clearly it's a bad hack: https://codebrowser.dev/glibc/glibc/sysdeps/generic/symbol-hacks.h.html

Maybe all it takes is moving interceptors for memset/cpy to the end of the source file (given that asm("f1 = f2") just turns into ".set f1, f2" in asm).

vitalybuka added inline comments.May 25 2023, 2:20 PM
compiler-rt/lib/sanitizer_common/sanitizer_redefine_builtins.h
22

I guess we can move them into a separate cpp?
Like we do with sanitizer_signal_interceptors.inc

melver updated this revision to Diff 526033.May 26 2023, 5:40 AM
melver marked 9 inline comments as done.
melver retitled this revision from [compiler-rt] Avoid memset() calls inserted by the compiler to [compiler-rt] Avoid memintrinsic calls inserted by the compiler.
melver edited the summary of this revision. (Show Details)
  • Rework solution to redefine builtins almost everywhere.
melver updated this revision to Diff 526035.May 26 2023, 5:52 AM
  • Also check memmove in tests.
melver added inline comments.May 26 2023, 5:59 AM
compiler-rt/lib/asan/asan_stack.h
36 ↗(On Diff #525767)

Moved to separate patch.

compiler-rt/lib/ubsan/ubsan_diag.cpp
22 ↗(On Diff #525767)

Removed entirely.

vitalybuka accepted this revision.May 26 2023, 11:03 AM

Nice!

Not a strong request, but I assume D151552 has high chance of being reverted. To avoid conflicts, would be nice to land them with a couple of days of delay.

compiler-rt/lib/asan/asan_interceptors_memintrinsics.cpp
14

For consistently lets prefix sanitizer_common macros with SANITIZER_COMMON_
we don't follow that strictly, but it could be nice:
__NO_SANITIZER_REDEFINE_BUILTINS -> SANITIZER_COMMON_NO_REDEFINE_BUILTINS

This revision is now accepted and ready to land.May 26 2023, 11:03 AM
melver updated this revision to Diff 526152.May 26 2023, 12:03 PM
melver marked an inline comment as done.
  • Use SANITIZER_COMMON_NO_REDEFINE_BUILTINS for opt-out macro name.
  • Do not use asm() hack with old MSVC compilers (not supported there).
This revision was landed with ongoing or failed builds.May 31 2023, 2:50 AM
This revision was automatically updated to reflect the committed changes.
thakis added a subscriber: thakis.May 31 2023, 8:42 AM

This doesn't build on mac: http://45.33.8.238/macm1/61839/step_4.txt

FAILED: stage2_unix/obj/compiler-rt/lib/sanitizer_common/sources.sanitizer_libc.o 
./bin/clang++ -MMD -MF stage2_unix/obj/compiler-rt/lib/sanitizer_common/sources.sanitizer_libc.o.d -o stage2_unix/obj/compiler-rt/lib/sanitizer_common/sources.sanitizer_libc.o -c ../../compiler-rt/lib/sanitizer_common/sanitizer_libc.cpp  -I../../compiler-rt/lib -mmacos-version-min=10.10 -O3 -fdiagnostics-color -Wall -Wextra -Wno-unused-parameter -Wdelete-non-virtual-dtor -Wstring-conversion -no-canonical-prefixes -Werror=date-time -isysroot ../../sysroot/MacOSX.sdk -Wpoison-system-directories -Wcovered-switch-default -fno-builtin -gline-tables-only -fPIC -funwind-tables -fvisibility=hidden -Werror=thread-safety -Werror=thread-safety-reference -Werror=thread-safety-beta -std=c++17 -fvisibility-inlines-hidden -fno-exceptions -fno-rtti
../../compiler-rt/lib/sanitizer_common/sanitizer_libc.cpp:109:5: error: aliases are not supported on darwin
  109 |     ALIAS(__sanitizer_internal_memcpy);
      |     ^
../../compiler-rt/lib/sanitizer_common/sanitizer_internal_defs.h:222:34: note: expanded from macro 'ALIAS'
  222 | # define ALIAS(x) __attribute__((alias(SANITIZER_STRINGIFY(x))))
      |                                  ^
../../compiler-rt/lib/sanitizer_common/sanitizer_libc.cpp:111:5: error: aliases are not supported on darwin
  111 |     ALIAS(__sanitizer_internal_memmove);
      |     ^
../../compiler-rt/lib/sanitizer_common/sanitizer_internal_defs.h:222:34: note: expanded from macro 'ALIAS'
  222 | # define ALIAS(x) __attribute__((alias(SANITIZER_STRINGIFY(x))))
      |                                  ^
../../compiler-rt/lib/sanitizer_common/sanitizer_libc.cpp:113:5: error: aliases are not supported on darwin
  113 |     ALIAS(__sanitizer_internal_memset);
      |     ^
../../compiler-rt/lib/sanitizer_common/sanitizer_internal_defs.h:222:34: note: expanded from macro 'ALIAS'
  222 | # define ALIAS(x) __attribute__((alias(SANITIZER_STRINGIFY(x))))
      |                                  ^
3 errors generated.

Please take a look, and revert for now if it takes a while to fix.

vitalybuka reopened this revision.May 31 2023, 9:09 AM
This revision is now accepted and ready to land.May 31 2023, 9:09 AM

Thanks for the revert!

If it's not too much trouble, if you could fold https://github.com/llvm/llvm-project/commit/3825910c7316cf62549bd31c503c48e7526adcc2 into this when relanding, that'd be great.

I'm seeing a different error (using lld-link) on win: http://45.33.8.238/win/79321/step_4.txt

lld-link: error: stage2_win_x64/obj/compiler-rt/lib/asan/asan_shared_library.asan_activation.obj: memcpy should not refer to special section 0
[2124/2134] CXX obj/clang/unittests/Tooling/ToolingTests.SourceCodeTest.obj
melver updated this revision to Diff 527341.Jun 1 2023, 2:52 AM
  • Fix darwin and windows by defining internal_X() aliases as ALWAYS_INLINE functions instead.
Herald added a project: Restricted Project. · View Herald TranscriptJun 1 2023, 2:52 AM
This revision was landed with ongoing or failed builds.Jun 2 2023, 6:39 AM
This revision was automatically updated to reflect the committed changes.
melver reopened this revision.Jun 2 2023, 7:40 AM
This revision is now accepted and ready to land.Jun 2 2023, 7:40 AM
melver updated this revision to Diff 528453.Jun 5 2023, 8:24 AM
melver edited the summary of this revision. (Show Details)
  • Try to fix Windows builds.
  • Fix ODR violations.
This revision was landed with ongoing or failed builds.Jun 6 2023, 7:12 AM
This revision was automatically updated to reflect the committed changes.

Apparently this is expected because a similar exception exists for the tsan check_memcpy.c test: https://github.com/llvm/llvm-zorg/pull/38/files

This broke linking the sanitizers for mingw targets. Building now fails with errors like these:

[330/350] Linking CXX shared library lib/windows/libclang_rt.asan_dynamic-x86_64.dll
FAILED: lib/windows/libclang_rt.asan_dynamic-x86_64.dll lib/windows/libclang_rt.asan_dynamic-x86_64.dll.a
[...]
ld.lld: error: lib/interception/CMakeFiles/RTInterception.x86_64.dir/interception_linux.cpp.obj: memcpy should not refer to special section 0
clang: error: linker command failed with exit code 1 (use -v to see invocation)

I see that this also was happening with clang-cl configurations, which was then fixed in caa2c1bacbd76c017ebbb4fd13861f0f66770299. I'll go ahead and push a change that changes _MSC_VER into _WIN32 there, to keep the same behaviour in mingw configs as in clang-cl configs.

melver added a comment.Jun 7 2023, 4:47 AM

This broke linking the sanitizers for mingw targets. Building now fails with errors like these:

[330/350] Linking CXX shared library lib/windows/libclang_rt.asan_dynamic-x86_64.dll
FAILED: lib/windows/libclang_rt.asan_dynamic-x86_64.dll lib/windows/libclang_rt.asan_dynamic-x86_64.dll.a
[...]
ld.lld: error: lib/interception/CMakeFiles/RTInterception.x86_64.dir/interception_linux.cpp.obj: memcpy should not refer to special section 0
clang: error: linker command failed with exit code 1 (use -v to see invocation)

I see that this also was happening with clang-cl configurations, which was then fixed in caa2c1bacbd76c017ebbb4fd13861f0f66770299. I'll go ahead and push a change that changes _MSC_VER into _WIN32 there, to keep the same behaviour in mingw configs as in clang-cl configs.

That sounds reasonable, thanks.

Getting it to work on Windows (when built with Clang) would be nice, because ftrivial-auto-var-init is also available on Windows (AFAIK). But I've deferred for now, because it seems rather tricky (likely needs some changes to the inline asm?).

Getting it to work on Windows (when built with Clang) would be nice, because ftrivial-auto-var-init is also available on Windows (AFAIK). But I've deferred for now, because it seems rather tricky (likely needs some changes to the inline asm?).

Yes, it seems like those kinds of symbol aliases don't quite work as expected/desired on COFF.

jakubjelinek added a subscriber: jakubjelinek.EditedMon, Nov 20, 11:40 AM

Note, this doesn't really work on SPARC Solaris when using the Solaris assembler, see https://gcc.gnu.org/PR112563 for details.

c
extern "C" void *memcpy (void *, const void *, decltype (sizeof 0)) __asm ("__sanitizer_internal_memcpy");
extern "C" void *memmove (void *, const void *, decltype (sizeof 0)) __asm ("__sanitizer_internal_memmove");
extern "C" void *memset (void *, int, decltype (sizeof 0)) __asm ("__sanitizer_internal_memset");

would work if -fno-builtin flag wouldn't be used, but for some reason it is.

c
extern "C" {
  static void *memcpy (void *, const void *, decltype (sizeof 0)) __attribute__ ((weakref ("__sanitizer_internal_memcpy")));
  static void *memmove (void *, const void *, decltype (sizeof 0)) __attribute__ ((weakref ("__sanitizer_internal_memmove")));
  static void *memset (void *, int, decltype (sizeof 0)) __attribute__ ((weakref ("__sanitizer_internal_memset")));
}

seems to work somehow, but explicit calls to the mem* or __builtin_mem* functions seem to be at least on SPARC less optimized (indirect calls rather than direct). But I bet compiler-rt doesn't really call the functions directly but calls __sanitizer_internal_mem* instead.