This is an archive of the discontinued LLVM Phabricator instance.

[1b/3][ASan][compiler-rt] API for annotating objects memory
ClosedPublic

Authored by AdvenamTacet on Aug 23 2022, 6:51 PM.

Details

Reviewers
vitalybuka
Group Reviewers
Restricted Project
Restricted Project
Commits
rGdd1b7b797a11: [1b/3][ASan][compiler-rt] API for annotating objects memory
Summary

This revision is a part of a series of patches extending AddressSanitizer C++ container overflow detection capabilities by adding annotations, similar to those existing in std::vector, to std::string and std::deque collections. These changes allow ASan to detect cases when the instrumented program accesses memory which is internally allocated by the collection but is still not in-use (accesses before or after the stored elements for std::deque, or between the size and capacity bounds for std::string).

The motivation for the research and those changes was a bug, found by Trail of Bits, in a real code where an out-of-bounds read could happen as two strings were compared via a std::equals function that took iter1_begin, iter1_end, iter2_begin iterators (with a custom comparison function). When object iter1 was longer than iter2, read out-of-bounds on iter2 could happen. Container sanitization would detect it.

This revision extends a compiler-rt ASan sanitization API function sanitizer_annotate_contiguous_container used to sanitize/annotate containers like std::vector to support different allocators and situations when granules are shared between objects. Those changes are necessary to support annotating objects' self memory (in contrast to annotating memory allocated by an object) like short std::basic_string (with short string optimization). That also allows use of non-standard memory allocators, as alignment requirement is no longer necessary.

This also updates an API function to verify if a double ended contiguous container is correctly annotated (__sanitizer_verify_contiguous_container).

If you have any questions, please email:
advenam.tacet@trailofbits.com
disconnect3d@trailofbits.com

Diff Detail

Event Timeline

AdvenamTacet created this revision.Aug 23 2022, 6:51 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 23 2022, 6:51 PM
Herald added subscribers: Enna1, dberris. · View Herald Transcript
AdvenamTacet requested review of this revision.Aug 23 2022, 6:51 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 23 2022, 6:51 PM
Herald added a subscriber: Restricted Project. · View Herald Transcript
AdvenamTacet added a reviewer: Restricted Project.Aug 23 2022, 7:14 PM

bool -> int

Removing bool type, because unit tests are failing.

Fixing argument alignment (code formating).

Can you please organize changes into a stack: "Edit related revisions" in Phabricator UI?

compiler-rt/lib/asan/asan_poisoning.cpp
389

We are looking into enabling StackStafety for asan (it's optimization)
which will make __sanitizer_annotate_contiguous_container not very useful for such short storages on the stack.

The problem with StackStafety that it assume that region of accessible memory of the stack object does not change over time, which is not true with __sanitizer_annotate_contiguous_container.

StackStafety will result in false negatives, when shadow set by __sanitizer_annotate_contiguous_container will not be checked.

So maybe investing into "short string" is not worth it?

395

Can we instead update __sanitizer_annotate_contiguous_container and friends to support this case

I guess we just need to load AddressIsPoisoned(beg-1) && AddressIsPoisoned(end) and make sure we do not poisoning them

vitalybuka requested changes to this revision.Aug 26 2022, 4:55 PM
This revision now requires changes to proceed.Aug 26 2022, 4:55 PM
AdvenamTacet requested review of this revision.Aug 29 2022, 6:39 PM

I stacked them, deque changes and string changes are independent.

compiler-rt/lib/asan/asan_poisoning.cpp
389

The motivation for the research and changes I am sending was a bug in real code where an out-of-bounds read could happen as two strings were compared via a std::equals function that took iter1_begin, iter1_end, iter2_begin iterators with a custom comparison function (so if iter1 was longer than iter2, we read out-of-bounds on iter2). While I agree this is a rare scenario, such a problem may still exist and so it may be beneficial to be able to detect it.

If we can't have both - annotating of short strings and StackSafety optimization, maybe we should have an option to enable one or another?

The work for annotating strings (both short and long including also the alternate string layout) has already been made. Removing just one case from it will generate more work.

395

No, it's not possible. Inside __sanitizer_annotate_contiguous_container there is not enough information. __sanitizer_is_annotable is necessary if object wants to poison self memory (not memory allocated by that object, but own memory), both on stack and heap.

Example when it's necessary (when alignment matters) is for example object in an array.

// SHADOW_GRANULARITY == 8
struct X { char buff[7];} // sizeof(X) == 7
X tab[2];

We cannot poison the first object in tab, because we will have to poison the first byte of the second object.

Therefore I believe that it's not possible to include it in __sanitizer_annotate_contiguous_container.

can you please organize all of them into a stack? It supports multiple deps like for 3.
1a -> 2a

\
 --> 3
/

1b -> 2b

vitalybuka added inline comments.Aug 30 2022, 8:32 AM
compiler-rt/lib/asan/asan_poisoning.cpp
395

address_p and size_v are for container?

this info is provided into __sanitizer_annotate_contiguous_container
Simples way is just to have

static bool IsAnotateble();

void __sanitizer_annotate_contiguous_container() {
  // instead of CHECK
   if (!IsAnotateble) return;
}

No, it's not possible. Inside __sanitizer_annotate_contiguous_container there is not enough information. __sanitizer_is_annotable is necessary if object wants to poison self memory (not memory allocated by that object, but own memory), both on stack and heap.

Example when it's necessary (when alignment matters) is for example object in an array.

// SHADOW_GRANULARITY == 8
struct X { char buff[7];} // sizeof(X) == 7
X tab[2];

We cannot poison the first object in tab, because we will have to poison the first byte of the second object.

Therefore I believe that it's not possible to include it in __sanitizer_annotate_contiguous_container.

AdvenamTacet added inline comments.Aug 30 2022, 3:07 PM
compiler-rt/lib/asan/asan_poisoning.cpp
395

I believe, data provided into __sanitizer_annotate_contiguous_container is not sufficient, as there we do not know if memory is a separated buffer for containers content or part of a bigger object.

Let's look at it closer. For that example, SHADOW_GRANULARITY == 8.
Here are __sanitizer_annotate_contiguous_container arguments and the function do not have any other source of information:

void __sanitizer_annotate_contiguous_container(const void *beg_p,
                                               const void *end_p,
                                               const void *old_mid_p,
                                               const void *new_mid_p)

If we have a vector with allocated buffer of size 126 bytes (42 elements of size 3), distance between enp_p and beg_p is 126 (and 126 % 8 != 0). That's ok and we can poison that memory, but inside __sanitizer_annotate_contiguous_container we do not have information if it's allocated buffer or part of a bigger structure.

If object wants to poison self memory, size of the object has to be aligned. So, we cannot poison short string if sizeof(str)==126 bytes, we can annotate long string with external buffer of size 126.

Also, moving that check into __sanitizer_annotate_contiguous_container would have a bad performance effect. In std::vector, my patch for std::deque and long string case, there is only compile-time check if a standard allocator is used, and therefore it's not necessary to confirm that beg_p is aligned, in some scenarios (like short string) there is no way to check it compile time.

vitalybuka added inline comments.Aug 30 2022, 3:30 PM
compiler-rt/lib/asan/asan_poisoning.cpp
395

I am not concerned about alignment check cost, we have function call and memory writes already. This function is quite expensive.
In the case of aligned end_p we continue to work as is.
if not aligned, you can read shadow(end_p) and determine what is just after the buffer
this is slightly inefficient

If we open to extend API, alternative is an another version of sanitizer_annotate_contiguous_container
e.g.
sanitizer_annotate_contiguous_container_unaligned() which never poisons unaligned tail/head granules

with __sanitizer_is_annotable() and sizeof(str)==126 you loose opportunity to check internal full granules.

can you please organize all of them into a stack? It supports multiple deps like for 3.

Patch number 3 ([3/3][ASan] integration test for std::deque and std::basic_string annotations) is not public yet, but I will try to make it public soon. I have to include changes for patch 1a, 1b there.
Then I will stack it as well.

compiler-rt/lib/asan/asan_poisoning.cpp
395

address_p and size_v are for container?

It's for the memory controlled by the programmer. In our case it's address of the object (this) and the size of the object (sizeof(*this)). It may be also used for variables inside the class.

Inside that memory, programmer is responsible to make sure that every variable is accessible.

A good example is a short string with alternate string layout. Metadata byte is at the very end of the object and therefore is always poisoned, we turn off annotations in functions accessing that byte. (Similarly for optimized copy constructors.)
We use __sanitizer_is_annotable to make sure that we won't poison memory outside of the string object, where access may not be controlled by us.

We can use a function like sanitizer_annotate_contiguous_container_unaligned(...) suggested by you, we would just pass the beginning and the end of the object as the beginning and the end of the buffer - it has the same functionality wrapped in one function, I believe. If you prefer that solution, let me know.
However, I still believe that function just for that check is a better choice as it provides a bigger flexibility.

Also, if we go with sanitizer_annotate_contiguous_container_unaligned, should we create similar functions for double_ended containers, for consistency? That will double number of functions.

with __sanitizer_is_annotable() and sizeof(str)==126 you loose opportunity to check internal full granules.

Sorry, I'm not sure if I understand. Do you mean that buffer which we want to poison is aligned (size and address) and it's in a bigger object?
Isn't it the same with __sanitizer_is_annotable and sanitizer_annotate_contiguous_container_unaligned (once we just carefully choose address and size, once we carefully choose two addresses)?

if not aligned, you can read shadow(end_p) and determine what is just after the buffer
this is slightly inefficient

It looks a little bit risky and requires a lot of thinking about edge cases. I'm honestly not sure if I know how to implement it.

vitalybuka added inline comments.Aug 30 2022, 4:59 PM
compiler-rt/lib/asan/asan_poisoning.cpp
395

I am not concerned about alignment check cost, we have function call and memory writes already. This function is quite expensive.
In the case of aligned end_p we continue to work as is.
if not aligned, you can read shadow(end_p) and determine what is just after the buffer
this is slightly inefficient

If we open to extend API, alternative is an another version of sanitizer_annotate_contiguous_container
e.g.
sanitizer_annotate_contiguous_container_unaligned() which never poisons unaligned tail/head granules

with __sanitizer_is_annotable() and sizeof(str)==126 you loose opportunity to check internal full granules.

395

Would prefer to do improve existing sanitizer_annotate_contiguous_container
note that you need to load unaligned shadow tail/head only when we about to poison tail/head granule.
so I am not concerned with perf.

AdvenamTacet added inline comments.Aug 30 2022, 5:54 PM
compiler-rt/lib/asan/asan_poisoning.cpp
395

Would prefer to do improve existing sanitizer_annotate_contiguous_container

I don't know how to do it without a new function in API.
I see the option I suggested or function like:

sanitizer_annotate_contiguous_container_unaligned(...) {
  if(!isAnnotable(...)) return;

  sanitizer_annotate_contiguous_container(...);
}

Possibly checking shadow memory around the buffer may be another solution, but I'm not sure how to implement it.
I'm quite sure that beg_p has to be aligned. If end_p is not aligned, possibly we can solve it with some ifs, but I'm not sure how exactly and what edge cases may happen.
If you have a clear idea, I will appreciate sharing details or implementation.

so I am not concerned with perf.

Ok, I'm not referring to performance in my previous answer.

AdvenamTacet marked an inline comment as not done.Aug 30 2022, 5:54 PM
AdvenamTacet added inline comments.Aug 31 2022, 9:26 AM
compiler-rt/lib/asan/asan_poisoning.cpp
395

I was thinking about it and I think I have a solution:

  • beg_p has to be aligned (otherwise container is not annotated at all),
  • if end_p is aligned, container is normally annotated,
  • if end_p is not aligned, last granule is annotated if and only if all accessible bytes inside that granule are inside the container.

I believe it should work.

But I'm a little bit scared that there is some case I didn't think of. (Is it possible that some implementation break?)
Let me know what you think about it. If you believe that solution is correct and you would accept it, I can implement it instead of a new function in API.

Future improvement (I suggest another patch for it):

  • First granule is not annotated when beg_p is not aligned, but all middle granules are. Therefore container may be partially annotated, even when beg_p is not aligned.
This comment was removed by AdvenamTacet.
Herald added a project: Restricted Project. · View Herald TranscriptSep 19 2022, 6:40 PM
Herald added a reviewer: Restricted Project. · View Herald Transcript

Previous push was designed for a different revision. Sorry for it.

Big change: now annotations work without changing API.

Annotation function is modified to support cases of not
aligned buffers and buffers in bigger structures.

That allows objets to annotate self memory.

AdvenamTacet marked 5 inline comments as done.Sep 21 2022, 7:02 PM
AdvenamTacet added inline comments.
compiler-rt/lib/asan/asan_poisoning.cpp
395

It seems that it works, I implemented it and updated the revision.

is this a bad rebase, I see stuff from D132090?

Correct revision update, previous files are for deque...

Is there any way to automate updates to not make mistakes writing numbers?
To avoid writing arc diff --update <id>?

Correct revision update, previous files are for deque...

Is there any way to automate updates to not make mistakes writing numbers?
To avoid writing arc diff --update <id>?

I usually do "arc diff HEAD^" and it uploads to whatever is in the git commit description

@vitalybuka yes, sorry for that. Now correct files are here. I think I have to change my workflow... now I have to be in a correct repo and write correct revision id in arc diff --update <id>, and clearly it's not easy for me. Sorry for the confusion.

I will try your method, thank you!

vitalybuka requested changes to this revision.EditedSep 22 2022, 10:57 AM

Please update

void TestContainer(size_t capacity, size_t off) {
  char *beg_alloc = new char[capacity + off];
  char *beg = beg_alloc;
  beg += off;

...

int main(int argc, char **argv) {
  int n = argc == 1 ? 128 : atoi(argv[1]);
  for (int i = 0; i <= n; i++)
    for (int j = 0; j < 8; j++)
      TestContainer(i, j);
  TestThrow();
}

As-is the patch fails check-asan

Please update compiler-rt/test/asan/TestCases/contiguous_container_crash.cpp

// RUN: %clangxx_asan -O %s -o %t
// RUN: not %run %t crash 2>&1 | FileCheck --check-prefix=CHECK-CRASH %s
// RUN: not %run %t bad-bounds 2>&1 | FileCheck --check-prefix=CHECK-BAD-BOUNDS %s
// RUN: not %run %t bad-alignment 2>&1 | FileCheck --check-prefix=CHECK-CRASH %s
// RUN: not %run %t bad-alignment-end 2>&1 | FileCheck --check-prefix=CHECK-CRASH %s
// RUN: %env_asan_opts=detect_container_overflow=0 %run %t crash
//
// Test crash due to __sanitizer_annotate_contiguous_container.

#include <assert.h>
#include <string.h>

extern "C" {
void __sanitizer_annotate_contiguous_container(const void *beg, const void *end,
                                               const void *old_mid,
                                               const void *new_mid);
}  // extern "C"

static volatile int one = 1;

int TestCrash() {
  long t[100];
  t[60] = 0;
  __sanitizer_annotate_contiguous_container(&t[0], &t[0] + 100, &t[0] + 100,
                                            &t[0] + 50);
// CHECK-CRASH: AddressSanitizer: container-overflow
// CHECK-CRASH: if you don't care about these errors you may set ASAN_OPTIONS=detect_container_overflow=0
  return (int)t[60 * one];  // Touches the poisoned memory.
}

void BadBounds() {
  long t[100];
// CHECK-BAD-BOUNDS: ERROR: AddressSanitizer: bad parameters to __sanitizer_annotate_contiguous_container
  __sanitizer_annotate_contiguous_container(&t[0], &t[0] + 100, &t[0] + 101,
                                            &t[0] + 50);
}

int BadAlignment() {
  int t[100];
  t[60] = 0;
  __sanitizer_annotate_contiguous_container(&t[1], &t[0] + 100, &t[0] + 100,
                                            &t[1] + 50);
  return (int)t[60 * one];  // Touches the poisoned memory.
}

int BadAlignmentEnd() {
  int t[99];
  t[60] = 0;
  __sanitizer_annotate_contiguous_container(&t[0], &t[0] + 99, &t[0] + 99,
                                            &t[0] + 50);
  return (int)t[60 * one];  // Touches the poisoned memory.
}

int main(int argc, char **argv) {
  assert(argc == 2);
  if (!strcmp(argv[1], "crash"))
    return TestCrash();
  else if (!strcmp(argv[1], "bad-bounds"))
    BadBounds();
  else if (!strcmp(argv[1], "bad-alignment"))
    return BadAlignment();
  else if (!strcmp(argv[1], "bad-alignment-end"))
    return BadAlignmentEnd();
  return 0;
}
compiler-rt/lib/asan/asan_poisoning.cpp
389
void __sanitizer_annotate_contiguous_container(const void *beg_p,
                                               const void *end_p,
                                               const void *old_mid_p,
                                               const void *new_mid_p) {
  if (!flags()->detect_container_overflow) return;
  VPrintf(2, "contiguous_container: %p %p %p %p\n", beg_p, end_p, old_mid_p,
          new_mid_p);
  uptr beg = reinterpret_cast<uptr>(beg_p);
  uptr end = reinterpret_cast<uptr>(end_p);
  uptr old_mid = reinterpret_cast<uptr>(old_mid_p);
  uptr new_mid = reinterpret_cast<uptr>(new_mid_p);
  uptr granularity = ASAN_SHADOW_GRANULARITY;
  if (!(beg <= old_mid && beg <= new_mid && old_mid <= end && new_mid <= end)) {
    GET_STACK_TRACE_FATAL_HERE;
    ReportBadParamsToAnnotateContiguousContainer(beg, end, old_mid, new_mid,
                                                 &stack);
  }
  CHECK_LE(end - beg,
           FIRST_32_SECOND_64(1UL << 30, 1ULL << 40)); // Sanity check.

  if (old_mid == new_mid)
    return;  // Nothing to do here.

  // Handle misaligned end and cut it off.
  if (UNLIKELY(!AddrIsAlignedByGranularity(end))) {
    uptr end_down = RoundDownTo(end, granularity);
    // Either new or old mid must be in the granule to affect it.
    if (new_mid > end_down) {
      if (AddressIsPoisoned(end)) {
        *(u8 *)MemToShadow(end_down) = static_cast<u8>(new_mid - end_down);
      } else {
        // Something after the container - don't touch.
      }
    } else if (old_mid > end_down) {
      if (AddressIsPoisoned(end)) {
        *(u8 *)MemToShadow(end_down) = kAsanContiguousContainerOOBMagic;
      } else {
        // Something after the container - don't touch.
      }
    }

    if (beg >= end_down)
      return;  // Same granule.

    old_mid = Min(end_down, old_mid);
    new_mid = Min(end_down, new_mid);
  }

  // Handle misaligned begin and cut it off.
  if (UNLIKELY(!AddrIsAlignedByGranularity(beg))) {
    uptr beg_up = RoundUpTo(beg, granularity);
    uptr beg_down = RoundDownTo(beg, granularity);
    // As soon as we add first byte into container we will not be able to
    // determine the state of the byte before the container. So we assume it's
    // always unpoison.

    // Either new or old mid must be in the granule to affect it.
    if (new_mid < beg_up) {
      *(u8 *)MemToShadow(beg_down) = static_cast<u8>(new_mid - beg_down);
    } else if (old_mid < beg_up) {
      *(u8 *)MemToShadow(beg_down) = 0;
    }

    old_mid = Max(beg_up, old_mid);
    new_mid = Max(beg_up, new_mid);
  }

  if (old_mid == new_mid)
    return;

  uptr a = RoundDownTo(Min(old_mid, new_mid), granularity);
  uptr c = RoundUpTo(Max(old_mid, new_mid), granularity);
  uptr d1 = RoundDownTo(old_mid, granularity);
  // uptr d2 = RoundUpTo(old_mid, granularity);
  // Currently we should be in this state:
  // [a, d1) is good, [d2, c) is bad, [d1, d2) is partially good.
  // Make a quick sanity check that we are indeed in this state.
  //
  // FIXME: Two of these three checks are disabled until we fix
  // https://github.com/google/sanitizers/issues/258.
  // if (d1 != d2)
  //  CHECK_EQ(*(u8*)MemToShadow(d1), old_mid - d1);
  if (a + granularity <= d1)
    CHECK_EQ(*(u8*)MemToShadow(a), 0);
  // if (d2 + granularity <= c && c <= end)
  //   CHECK_EQ(*(u8 *)MemToShadow(c - granularity),
  //            kAsanContiguousContainerOOBMagic);

  uptr b1 = RoundDownTo(new_mid, granularity);
  uptr b2 = RoundUpTo(new_mid, granularity);
  // New state:
  // [a, b1) is good, [b2, c) is bad, [b1, b2) is partially good.
  PoisonShadow(a, b1 - a, 0);
  PoisonShadow(b2, c - b2, kAsanContiguousContainerOOBMagic);
  if (b1 != b2) {
    CHECK_EQ(b2 - b1, granularity);
    *(u8*)MemToShadow(b1) = static_cast<u8>(new_mid - b1);
  }
}
This revision now requires changes to proceed.Sep 22 2022, 10:57 AM
AdvenamTacet edited the summary of this revision. (Show Details)Oct 25 2022, 6:23 PM
AdvenamTacet marked an inline comment as done.

Suggestions from the code review implemented.

  • Updated annotation function to support non-aligned beginning and end.
  • Implemented tests for non-aligned buffers in compiler-rt/test/asan/TestCases/contiguous_container
    • objects before buffer with different sizes
    • objects after buffer with different sizes
    • objects accessible and poisoned
  • Updated and fixed compiler-rt/test/asan/TestCases/contiguous_container_crash.cpp as requested.
AdvenamTacet retitled this revision from [1b/3][compiler-rt][ASan] API for annotating objects memory to [1b/3][ASan][compiler-rt] API for annotating objects memory.Oct 26 2022, 6:31 AM

Simplify test

vitalybuka accepted this revision.Oct 27 2022, 11:28 PM
This revision was not accepted when it landed; it landed in state Needs Review.Oct 27 2022, 11:29 PM
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
vitalybuka removed 1 blocking reviewer(s): Restricted Project.Oct 27 2022, 11:32 PM