This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
cmake/
2
config-ix.cmake
-
lib/
-
CMakeLists.txt
-
tbaasan/
-
CMakeLists.txt
4
tbaasan.h
-
tbaasan.cc
-
tbaasan.syms.extra
-
tbaasan_flags.inc
2
tbaasan_interceptors.cc
-
tbaasan_platform.h
-
test/
-
CMakeLists.txt
-
tbaasan/
-
CMakeLists.txt
-
anon-ns.cpp
-
anon-same-struct.c
-
anon-struct.c
-
basic.c
-
char-memcpy.c
-
int-long.c
-
lit.cfg
-
lit.site.cfg.in
-
ptr-float.c
-
struct-offset.c
-
struct.c
-
union-wr-wr.c

Differential D32197

[TySan] A Type Sanitizer (Runtime Library)
Needs ReviewPublic

Authored by fhahn on Apr 18 2017, 3:58 PM.

Download Raw Diff

Details

Reviewers

kcc
chandlerc
kubamracek
• dberlin
hfinkel
CJ-Johnson

Summary

This patch introduces the runtime components of a type sanitizer: a sanitizer for type-based aliasing violations.

C/C++ have type-based aliasing rules, and LLVM's optimizer can exploit these given TBAA metadata added by Clang. Roughly, a pointer of given type cannot be used to access an object of a different type (with, of course, certain exceptions). Unfortunately, there's a lot of code in the wild that violates these rules (e.g. for type punning), and such code often must be built with -fno-strict-aliasing. Performance is often sacrificed as a result. Part of the problem is the difficulty of finding TBAA violations. Hopefully, this sanitizer will help.

For each TBAA type-access descriptor, encoded in LLVM's IR using metadata, the corresponding instrumentation pass generates descriptor tables. Thus, for each type (and access descriptor), we have a unique pointer representation. Excepting anonymous-namespace types, these tables are comdat, so the pointer values should be unique across the program. The descriptors refer to other descriptors to form a type aliasing tree (just like LLVM's TBAA metadata does). The instrumentation handles the "fast path" (where the types match exactly and no partial-overlaps are detected), and defers to the runtime to handle all of the more-complicated cases. The runtime, of course, is also responsible for reporting errors when those are detected.

The runtime uses essentially the same shadow memory region as tsan, and we use 8 bytes of shadow memory, the size of the pointer to the type descriptor, for every byte of accessed data in the program. The value 0 is used to represent an unknown type. The value -1 is used to represent an interior byte (a byte that is part of a type, but not the first byte). The instrumentation first checks for an exact match between the type of the current access and the type for that address recorded in the shadow memory. If it matches, it then checks the shadow for the remainder of the bytes in the type to make sure that they're all -1. If not, we call the runtime. If the exact match fails, we next check if the value is 0 (i.e. unknown). If it is, then we check the shadow for the remainder of the byes in the type (to make sure they're all 0). If they're not, we call the runtime. We then set the shadow for the access address and set the shadow for the remaining bytes in the type to -1 (i.e. marking them as interior bytes). If the type indicated by the shadow memory for the access address is neither an exact match nor 0, we call the runtime.

The instrumentation pass inserts calls to the memset intrinsic to set the memory updated by memset, memcpy, and memmove, as well as allocas/byval (and for lifetime.start/end) to reset the shadow memory to reflect that the type is now unknown. The runtime intercepts memset, memcpy, etc. to perform the same function for the library calls.

The runtime essentially repeats these checks, but uses the full TBAA algorithm, just as the compiler does, to determine when two types are permitted to alias. In a situation where access overlap has occurred and aliasing is not permitted, an error is generated.

Clang's TBAA representation currently has a problem representing unions, as demonstrated by the one XFAIL'd test. We'll update the TBAA representation to fix this, and at the same time, update the sanitizer.

As a note, this implementation does not use the compressed shadow-memory scheme discussed previously (http://lists.llvm.org/pipermail/llvm-dev/2017-April/111766.html). That scheme would not handle the struct-path (i.e. structure offset) information that our TBAA represents. I expect we'll want to further work on compressing the shadow-memory representation, but I think it makes sense to do that as follow-up work.

Diff Detail

Event Timeline

hfinkel created this revision.Apr 18 2017, 3:58 PM

Herald added subscribers: mgorny, kubamracek, mcrosier, srhines. · View Herald TranscriptApr 18 2017, 3:58 PM

Upload the version of the patch that works on ppc64le.

kubamracek added a reviewer: kubamracek.Apr 18 2017, 4:10 PM

https://reviews.llvm.org/D32198 (LLVM)
https://reviews.llvm.org/D32199 (Clang)

pcc added a subscriber: pcc.Apr 19 2017, 10:05 AM

I just did a very quick first pass. Looks good in general, but I have some nits.

Also: Can you add a test for ATOMIC UPDATE access type?

cmake/config-ix.cmake
500	clang enables it on FreeBSD too. I guess maybe enabling it on FreeBSD in the clang patch was a bug and you'd rather finish work on Linux first? Or do you want to use the bots/keep testing on FreeBSD as you commit the parts, and expect FreeBSD to "just work" too?
lib/tbaasan/tbaasan.h
23	Can you move this include above? When reading, one can't tell if the `using` directives above are required before including the header or not. We can probably just stick them there, if they're really needed (I'd rather they weren't (in general), but given how fundamental the `uptr` and `u16` types are, I don't see having those `using` directives in a header as a problem)
30	Can these go inside `namespace __tbaasan`?
lib/tbaasan/tbaasan_interceptors.cc
198	We're using 8x the memory for shadow memory. Do these make a noticeable difference?

In D32197#730755, @filcab wrote:

I just did a very quick first pass. Looks good in general, but I have some nits.

Thanks!

Also: Can you add a test for ATOMIC UPDATE access type?

I had forgotten about that. I can't add a test yet because Clang doesn't add TBAA on atomic instructions. I need to fix that too.

cmake/config-ix.cmake
500	I suspect it will just work. OTOH, I'll turn it off in Clang for now (pending confirmation that it functions properly).
lib/tbaasan/tbaasan.h
23	Currently, no. The included header assumes those types have been brought in. FWIW, I also generally don't like having using directives in headers, but I agree that it seems reasonable in this case (and essentially every sanitizer has `using __sanitizer::uptr` in a header).
30	Yes. Will do.
lib/tbaasan/tbaasan_interceptors.cc
198	No, but I figured that it can't hurt (I copied these from TSan). Hopefully, one day (in the not-so-distant future), we'll use less memory (if we have the runtime manage type registration, for example, I think we could easily bring this down to 2-3 bytes per byte). I can obviously take this out if you'd prefer.

Updated per review comments.

Rename TBAASanitizer -> TypeSanitizer

Mark the types of globals in a generated ctor-called function (added a runtime test for this). memcpy/memmove should do the corresponding thing for the type shadow data.

kosarev added a subscriber: kosarev.May 11 2017, 3:22 AM

kosarev added inline comments.

test/tysan/union-wr-wr.c
21 ↗	(On Diff #96266)	Note the missing slash before `n`.

phi added a subscriber: phi.May 27 2017, 4:01 AM

hfinkel added inline comments.Aug 14 2017, 6:59 PM

test/tysan/union-wr-wr.c
21 ↗	(On Diff #96266)	Indeed. Fixed.

Rebased.

Rebased (and fixed address mappings for (big-Endian) ppc64).

Rebased.

FWIW - I tried this with chromium base/ and it fails on base/allocator/tcmalloc/internal_logging.cc

FAILED: obj/base/allocator/tcmalloc/internal_logging.o
../../../../llvm/build/bin/clang++ -MMD -MF obj/base/allocator/tcmalloc/internal_logging.o.d -DNO_HEAP_CHECK -DV8_DEPRECATION_WARNINGS -DUSE_UDEV -DUSE_AURA=1 -DUSE_GLIB=1 -DUSE_NSS_CERTS=1 -DUSE_X11=1 -DFULL_SAFE_BROWSING -DSAFE_BROWSING_CSD -DSAFE_BROWSING_DB_LOCAL -DCHROMIUM_BUILD -DFIELDTRIAL_TESTING_ENABLED -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -DCR_CLANG_REVISION=\"315613-1\" -DCOMPONENT_BUILD -DNDEBUG -DNVALGRIND -DDYNAMIC_ANNOTATIONS_ENABLED=0 -DTCMALLOC_DONT_REPLACE_SYSTEM_ALLOC -I../../base/allocator -I../../third_party/tcmalloc/chromium/src/base -I../../third_party/tcmalloc/chromium/src -I../.. -Igen -fno-strict-aliasing --param=ssp-buffer-size=4 -fstack-protector -Wno-builtin-macro-redefined -DDATE= -DTIME= -DTIMESTAMP= -funwind-tables -fPIC -pipe -B../../third_party/binutils/Linux_x64/Release/bin -pthread -fcolor-diagnostics -Xclang -mllvm -Xclang -instcombine-lower-dbg-declare=0 -no-canonical-prefixes -m64 -march=x86-64 -fno-omit-frame-pointer -g1 -gline-tables-only -gcolumn-info -fno-omit-frame-pointer -fsanitize=type -fsanitize-blacklist=../../tools/tysan/blacklist.txt -fvisibility=hidden -Wheader-hygiene -Wstring-conversion -Wtautological-overlap-compare -Werror -Wall -Wno-unused-variable -Wno-missing-field-initializers -Wno-unused-parameter -Wno-c++11-narrowing -Wno-covered-switch-default -Wno-unneeded-internal-declaration -Wno-inconsistent-missing-override -Wno-undefined-var-template -Wno-nonportable-include-path -Wno-address-of-packed-member -Wno-unused-lambda-capture -Wno-user-defined-warnings -Wno-enum-compare-switch -Wno-tautological-unsigned-zero-compare -Wno-null-pointer-arithmetic -Wno-tautological-unsigned-enum-zero-compare -Wno-reorder -Wno-unused-function -Wno-unused-local-typedefs -Wno-unused-private-field -Wno-sign-compare -Wno-unused-result -O2 -fno-ident -fdata-sections -ffunction-sections -std=gnu++14 -fno-rtti -nostdinc++ -isystem../../buildtools/third_party/libc++/trunk/include -isystem../../buildtools/third_party/libc++abi/trunk/include --sysroot=../../build/linux/debian_jessie_amd64-sysroot -fno-exceptions -fvisibility-inlines-hidden -c ../../third_party/tcmalloc/chromium/src/internal_logging.cc -o obj/base/allocator/tcmalloc/internal_logging.o