This is an archive of the discontinued LLVM Phabricator instance.

[TySan] A Type Sanitizer (Runtime Library)
Needs ReviewPublic

Authored by fhahn on Apr 18 2017, 3:58 PM.

Details

Summary

This patch introduces the runtime components of a type sanitizer: a sanitizer for type-based aliasing violations.

C/C++ have type-based aliasing rules, and LLVM's optimizer can exploit these given TBAA metadata added by Clang. Roughly, a pointer of given type cannot be used to access an object of a different type (with, of course, certain exceptions). Unfortunately, there's a lot of code in the wild that violates these rules (e.g. for type punning), and such code often must be built with -fno-strict-aliasing. Performance is often sacrificed as a result. Part of the problem is the difficulty of finding TBAA violations. Hopefully, this sanitizer will help.

For each TBAA type-access descriptor, encoded in LLVM's IR using metadata, the corresponding instrumentation pass generates descriptor tables. Thus, for each type (and access descriptor), we have a unique pointer representation. Excepting anonymous-namespace types, these tables are comdat, so the pointer values should be unique across the program. The descriptors refer to other descriptors to form a type aliasing tree (just like LLVM's TBAA metadata does). The instrumentation handles the "fast path" (where the types match exactly and no partial-overlaps are detected), and defers to the runtime to handle all of the more-complicated cases. The runtime, of course, is also responsible for reporting errors when those are detected.

The runtime uses essentially the same shadow memory region as tsan, and we use 8 bytes of shadow memory, the size of the pointer to the type descriptor, for every byte of accessed data in the program. The value 0 is used to represent an unknown type. The value -1 is used to represent an interior byte (a byte that is part of a type, but not the first byte). The instrumentation first checks for an exact match between the type of the current access and the type for that address recorded in the shadow memory. If it matches, it then checks the shadow for the remainder of the bytes in the type to make sure that they're all -1. If not, we call the runtime. If the exact match fails, we next check if the value is 0 (i.e. unknown). If it is, then we check the shadow for the remainder of the byes in the type (to make sure they're all 0). If they're not, we call the runtime. We then set the shadow for the access address and set the shadow for the remaining bytes in the type to -1 (i.e. marking them as interior bytes). If the type indicated by the shadow memory for the access address is neither an exact match nor 0, we call the runtime.

The instrumentation pass inserts calls to the memset intrinsic to set the memory updated by memset, memcpy, and memmove, as well as allocas/byval (and for lifetime.start/end) to reset the shadow memory to reflect that the type is now unknown. The runtime intercepts memset, memcpy, etc. to perform the same function for the library calls.

The runtime essentially repeats these checks, but uses the full TBAA algorithm, just as the compiler does, to determine when two types are permitted to alias. In a situation where access overlap has occurred and aliasing is not permitted, an error is generated.

Clang's TBAA representation currently has a problem representing unions, as demonstrated by the one XFAIL'd test. We'll update the TBAA representation to fix this, and at the same time, update the sanitizer.

As a note, this implementation does not use the compressed shadow-memory scheme discussed previously (http://lists.llvm.org/pipermail/llvm-dev/2017-April/111766.html). That scheme would not handle the struct-path (i.e. structure offset) information that our TBAA represents. I expect we'll want to further work on compressing the shadow-memory representation, but I think it makes sense to do that as follow-up work.

Diff Detail

Event Timeline

hfinkel created this revision.Apr 18 2017, 3:58 PM
hfinkel updated this revision to Diff 95651.Apr 18 2017, 4:00 PM

Upload the version of the patch that works on ppc64le.

pcc added a subscriber: pcc.Apr 19 2017, 10:05 AM
filcab added a subscriber: filcab.Apr 19 2017, 10:53 AM

I just did a very quick first pass. Looks good in general, but I have some nits.

Also: Can you add a test for ATOMIC UPDATE access type?

cmake/config-ix.cmake
500

clang enables it on FreeBSD too. I guess maybe enabling it on FreeBSD in the clang patch was a bug and you'd rather finish work on Linux first?
Or do you want to use the bots/keep testing on FreeBSD as you commit the parts, and expect FreeBSD to "just work" too?

lib/tbaasan/tbaasan.h
23

Can you move this include above?
When reading, one can't tell if the using directives above are required before including the header or not. We can probably just stick them there, if they're really needed (I'd rather they weren't (in general), but given how fundamental the uptr and u16 types are, I don't see having those using directives in a header as a problem)

30

Can these go inside namespace __tbaasan?

lib/tbaasan/tbaasan_interceptors.cc
198

We're using 8x the memory for shadow memory. Do these make a noticeable difference?

I just did a very quick first pass. Looks good in general, but I have some nits.

Thanks!

Also: Can you add a test for ATOMIC UPDATE access type?

I had forgotten about that. I can't add a test yet because Clang doesn't add TBAA on atomic instructions. I need to fix that too.

cmake/config-ix.cmake
500

I suspect it will just work. OTOH, I'll turn it off in Clang for now (pending confirmation that it functions properly).

lib/tbaasan/tbaasan.h
23

Currently, no. The included header assumes those types have been brought in. FWIW, I also generally don't like having using directives in headers, but I agree that it seems reasonable in this case (and essentially every sanitizer has using __sanitizer::uptr in a header).

30

Yes. Will do.

lib/tbaasan/tbaasan_interceptors.cc
198

No, but I figured that it can't hurt (I copied these from TSan). Hopefully, one day (in the not-so-distant future), we'll use less memory (if we have the runtime manage type registration, for example, I think we could easily bring this down to 2-3 bytes per byte). I can obviously take this out if you'd prefer.

hfinkel updated this revision to Diff 95839.Apr 19 2017, 3:43 PM

Updated per review comments.

hfinkel updated this revision to Diff 96137.Apr 21 2017, 7:19 AM
hfinkel retitled this revision from [TBAASan] A TBAA Sanitizer (Runtime Library) to [TySan] A Type Sanitizer (Runtime Library).
hfinkel edited the summary of this revision. (Show Details)

Rename TBAASanitizer -> TypeSanitizer

hfinkel updated this revision to Diff 96266.Apr 21 2017, 5:20 PM

Mark the types of globals in a generated ctor-called function (added a runtime test for this). memcpy/memmove should do the corresponding thing for the type shadow data.

kosarev added inline comments.
test/tysan/union-wr-wr.c
21 ↗(On Diff #96266)

Note the missing slash before n.

phi added a subscriber: phi.May 27 2017, 4:01 AM
hfinkel added inline comments.Aug 14 2017, 6:59 PM
test/tysan/union-wr-wr.c
21 ↗(On Diff #96266)

Indeed. Fixed.

hfinkel updated this revision to Diff 117621.Oct 3 2017, 8:45 PM

Rebased (and fixed address mappings for (big-Endian) ppc64).

wfh added a subscriber: wfh.Oct 25 2017, 12:00 PM

FWIW - I tried this with chromium base/ and it fails on base/allocator/tcmalloc/internal_logging.cc

FAILED: obj/base/allocator/tcmalloc/internal_logging.o
../../../../llvm/build/bin/clang++ -MMD -MF obj/base/allocator/tcmalloc/internal_logging.o.d -DNO_HEAP_CHECK -DV8_DEPRECATION_WARNINGS -DUSE_UDEV -DUSE_AURA=1 -DUSE_GLIB=1 -DUSE_NSS_CERTS=1 -DUSE_X11=1 -DFULL_SAFE_BROWSING -DSAFE_BROWSING_CSD -DSAFE_BROWSING_DB_LOCAL -DCHROMIUM_BUILD -DFIELDTRIAL_TESTING_ENABLED -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -DCR_CLANG_REVISION=\"315613-1\" -DCOMPONENT_BUILD -DNDEBUG -DNVALGRIND -DDYNAMIC_ANNOTATIONS_ENABLED=0 -DTCMALLOC_DONT_REPLACE_SYSTEM_ALLOC -I../../base/allocator -I../../third_party/tcmalloc/chromium/src/base -I../../third_party/tcmalloc/chromium/src -I../.. -Igen -fno-strict-aliasing --param=ssp-buffer-size=4 -fstack-protector -Wno-builtin-macro-redefined -DDATE= -DTIME= -DTIMESTAMP= -funwind-tables -fPIC -pipe -B../../third_party/binutils/Linux_x64/Release/bin -pthread -fcolor-diagnostics -Xclang -mllvm -Xclang -instcombine-lower-dbg-declare=0 -no-canonical-prefixes -m64 -march=x86-64 -fno-omit-frame-pointer -g1 -gline-tables-only -gcolumn-info -fno-omit-frame-pointer -fsanitize=type -fsanitize-blacklist=../../tools/tysan/blacklist.txt -fvisibility=hidden -Wheader-hygiene -Wstring-conversion -Wtautological-overlap-compare -Werror -Wall -Wno-unused-variable -Wno-missing-field-initializers -Wno-unused-parameter -Wno-c++11-narrowing -Wno-covered-switch-default -Wno-unneeded-internal-declaration -Wno-inconsistent-missing-override -Wno-undefined-var-template -Wno-nonportable-include-path -Wno-address-of-packed-member -Wno-unused-lambda-capture -Wno-user-defined-warnings -Wno-enum-compare-switch -Wno-tautological-unsigned-zero-compare -Wno-null-pointer-arithmetic -Wno-tautological-unsigned-enum-zero-compare -Wno-reorder -Wno-unused-function -Wno-unused-local-typedefs -Wno-unused-private-field -Wno-sign-compare -Wno-unused-result -O2 -fno-ident -fdata-sections -ffunction-sections -std=gnu++14 -fno-rtti -nostdinc++ -isystem../../buildtools/third_party/libc++/trunk/include -isystem../../buildtools/third_party/libc++abi/trunk/include --sysroot=../../build/linux/debian_jessie_amd64-sysroot -fno-exceptions -fvisibility-inlines-hidden -c ../../third_party/tcmalloc/chromium/src/internal_logging.cc -o obj/base/allocator/tcmalloc/internal_logging.o

Instruction does not dominate all uses!

%15 = load i64, i64* @__tysan_app_memory_mask
%1 = and i64 %0, %15

Instruction does not dominate all uses!

%16 = load i64, i64* @__tysan_shadow_memory_address
%3 = add i64 %2, %16

Instruction does not dominate all uses!

%15 = load i64, i64* @__tysan_app_memory_mask
%6 = and i64 %5, %15

Instruction does not dominate all uses!

%16 = load i64, i64* @__tysan_shadow_memory_address
%8 = add i64 %7, %16

Instruction does not dominate all uses!

%15 = load i64, i64* @__tysan_app_memory_mask
%11 = and i64 %10, %15

Instruction does not dominate all uses!

%16 = load i64, i64* @__tysan_shadow_memory_address
%13 = add i64 %12, %16

can provide more info on request... for now I'm just disabling tcmalloc and seeing how much further I can get...

CJ-Johnson commandeered this revision.Jan 2 2020, 8:10 AM
CJ-Johnson added a reviewer: hfinkel.
CJ-Johnson added a subscriber: CJ-Johnson.

After discussing things with Hal, I'm going to take over these diffs and try to update them to the new pass manager :)

jgorbe added a subscriber: jgorbe.Jun 18 2020, 6:29 PM
MTC added a subscriber: MTC.Aug 16 2021, 7:38 PM
fhahn updated this revision to Diff 418963.Mar 29 2022, 12:54 PM
fhahn edited the summary of this revision. (Show Details)
fhahn added a subscriber: fhahn.

I rebased the patch, but I wasn't able to test it yet as it would need porting to Darwin.

Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptMar 29 2022, 12:54 PM
Herald added a subscriber: Restricted Project. · View Herald Transcript
fhahn updated this revision to Diff 435512.Jun 9 2022, 5:32 AM

Fix cmake scaffolding, support Darwin, only set type info in interceptors after tysan got initialized.

With the current version of the patch stack, all tests expcept 2 pass on macOS X86.

fhahn updated this revision to Diff 440265.Jun 27 2022, 9:27 AM

Add some additional tests, rename .cc files -> .cpp

fhahn updated this revision to Diff 469271.Oct 20 2022, 10:11 AM

Rebase on top of current main, all tests now pass on X86 macOS

Enna1 added a comment.Nov 14 2022, 7:22 PM

Rebase on top of current main, all tests now pass on X86 macOS

Hi @fhahn, I send https://reviews.llvm.org/D137414 which fixed the build and tests on X86 Linux.

fhahn commandeered this revision.Nov 15 2022, 3:35 PM
fhahn added a reviewer: CJ-Johnson.

Commandeering after the recent updates to make review + follow-ups easier.

Matt added a subscriber: Matt.Dec 7 2022, 6:28 PM
fhahn updated this revision to Diff 502608.Mar 6 2023, 5:58 AM

rebased and removed MIPS and PPC from compiler-rt/lib/tysan/tysan_platform.h as it has not been tested.

I was able to build the LLVM and Clang components but not the Runtime part on top of commit 2708869801ae00f4681f6b2d9d69b25b3fce26b6:

[761/1099] /usr/lib/ccache/bin/x86_64-pc-linux-gnu-clang++  -I/var/tmp/portage/sys-libs/compiler-rt-sanitizers-17.0.0_pre20230304/work/compiler-rt/lib/tysan/..  -DNDEBUG -
O2 -pipe -march=native -fdiagnostics-color=always -frecord-gcc-switches -Wreturn-type -ggdb3 -Wall -Wno-unused-parameter -std=c++17 -fcolor-diagnostics -m64 -MD -MT lib/ty
san/CMakeFiles/RTTysan_dynamic.x86_64.dir/tysan_interceptors.cpp.o -MF lib/tysan/CMakeFiles/RTTysan_dynamic.x86_64.dir/tysan_interceptors.cpp.o.d -o lib/tysan/CMakeFiles/R
TTysan_dynamic.x86_64.dir/tysan_interceptors.cpp.o -c /var/tmp/portage/sys-libs/compiler-rt-sanitizers-17.0.0_pre20230304/work/compiler-rt/lib/tysan/tysan_interceptors.cpp
FAILED: lib/tysan/CMakeFiles/RTTysan_dynamic.x86_64.dir/tysan_interceptors.cpp.o 
/usr/lib/ccache/bin/x86_64-pc-linux-gnu-clang++  -I/var/tmp/portage/sys-libs/compiler-rt-sanitizers-17.0.0_pre20230304/work/compiler-rt/lib/tysan/..  -DNDEBUG -O2 -pipe -m
arch=native -fdiagnostics-color=always -frecord-gcc-switches -Wreturn-type -ggdb3 -Wall -Wno-unused-parameter -std=c++17 -fcolor-diagnostics -m64 -MD -MT lib/tysan/CMakeFi
les/RTTysan_dynamic.x86_64.dir/tysan_interceptors.cpp.o -MF lib/tysan/CMakeFiles/RTTysan_dynamic.x86_64.dir/tysan_interceptors.cpp.o.d -o lib/tysan/CMakeFiles/RTTysan_dyna
mic.x86_64.dir/tysan_interceptors.cpp.o -c /var/tmp/portage/sys-libs/compiler-rt-sanitizers-17.0.0_pre20230304/work/compiler-rt/lib/tysan/tysan_interceptors.cpp
/var/tmp/portage/sys-libs/compiler-rt-sanitizers-17.0.0_pre20230304/work/compiler-rt/lib/tysan/tysan_interceptors.cpp:207:22: error: use of undeclared identifier 'mmap64'; did you mean 'mmap'?
  INTERCEPT_FUNCTION(mmap64);
                     ^~~~~~
                     mmap
/var/tmp/portage/sys-libs/compiler-rt-sanitizers-17.0.0_pre20230304/work/compiler-rt/lib/tysan/../interception/interception.h:278:71: note: expanded from macro 'INTERCEPT_FUNCTION'
# define INTERCEPT_FUNCTION(func) INTERCEPT_FUNCTION_LINUX_OR_FREEBSD(func)
                                                                      ^
/var/tmp/portage/sys-libs/compiler-rt-sanitizers-17.0.0_pre20230304/work/compiler-rt/lib/tysan/../interception/interception_linux.h:35:35: note: expanded from macro 'INTERCEPT_FUNCTION_LINUX_OR_FREEBSD'
      (::__interception::uptr) & (func),          \
                                  ^
/var/tmp/portage/sys-libs/compiler-rt-sanitizers-17.0.0_pre20230304/work/compiler-rt/lib/tysan/tysan_interceptors.cpp:79:21: note: 'mmap' declared here
INTERCEPTOR(void *, mmap, void *addr, SIZE_T length, int prot, int flags,
                    ^
/var/tmp/portage/sys-libs/compiler-rt-sanitizers-17.0.0_pre20230304/work/compiler-rt/lib/tysan/tysan_interceptors.cpp:207:3: error: no member named 'real_mmap64' in namespace '__interception'; did you mean 'real_mmap'?
  INTERCEPT_FUNCTION(mmap64);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~
/var/tmp/portage/sys-libs/compiler-rt-sanitizers-17.0.0_pre20230304/work/compiler-rt/lib/tysan/../interception/interception.h:278:35: note: expanded from macro 'INTERCEPT_FUNCTION'
# define INTERCEPT_FUNCTION(func) INTERCEPT_FUNCTION_LINUX_OR_FREEBSD(func)
                                  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/var/tmp/portage/sys-libs/compiler-rt-sanitizers-17.0.0_pre20230304/work/compiler-rt/lib/tysan/../interception/interception_linux.h:34:36: note: expanded from macro 'INTER

Happy to do this via email or similar if it's fine with you as it might be easier.

Full log at http://sprunge.us/gfnzM5.

Thanks @Enna1, I missed that, it works! TySan also works on a trivial example I've tested. Now trying it on real world programs, woo!

Enna1 added a comment.Mar 7 2023, 3:13 AM

Hi @fhahn , I have tested tysan locally, check-tysan passed on both Linux x86_64 and Linux AArch64.

But there is a small issue when build llvm compiler-rt with COMPILER_RT_DEBUG=ON and run check-tysan:
SanitizerTool: CHECK failed: tysan.cpp:122 "((0)) != (0)" (0x0, 0x0) (tid=288531)

Also if there is any thing I could help landing tysan, I’d be happy to volunteer :)

compiler-rt/lib/tysan/tysan.cpp
70 ↗(On Diff #502608)

%zu

122 ↗(On Diff #502608)

This line will be executed when TDA points to __tysan_v1_Simple_20C_2fC_2b_2b_20TBAA.
I suggest ether remove this DCHECK(0) or replace it with

if (Verbosity())
  Report("Reach root type descriptor\n");
173 ↗(On Diff #502608)

%llu

fhahn updated this revision to Diff 504259.Mar 10 2023, 12:34 PM
fhahn marked 3 inline comments as done.

Address comments, remove ifdef code for ppc and adjust shadow-memory used on ARM64 macOS.

compiler-rt/lib/tysan/tysan.cpp
70 ↗(On Diff #502608)

Should be fixed, thanks!

122 ↗(On Diff #502608)

Thanks for checking, I updated the code to remove the DCHECK and added a comment.

173 ↗(On Diff #502608)

Updated. thanks!

phi removed a subscriber: phi.Mar 11 2023, 2:36 PM

Also able to confirm the runtime tysan tests pass on linux arm64 and x64, and that I need https://reviews.llvm.org/D137414 to work around the build issue @thesamesam reported. Similar to the LLVM part of this, do you need a rubber stamp lgtm before submitting?