Page MenuHomePhabricator

[1a/3][ASan][compiler-rt] API for double ended containers
ClosedPublic

Authored by AdvenamTacet on Aug 17 2022, 6:05 PM.

Details

Summary

This revision is a part of a series of patches extending AddressSanitizer C++ container overflow detection capabilities by adding annotations, similar to those existing in std::vector, to std::string and std::deque collections. These changes allow ASan to detect cases when the instrumented program accesses memory which is internally allocated by the collection but is still not in-use (accesses before or after the stored elements for std::deque, or between the size and capacity bounds for std::string).

The motivation for the research and those changes was a bug, found by Trail of Bits, in a real code where an out-of-bounds read could happen as two strings were compared via a std::equals function that took iter1_begin, iter1_end, iter2_begin iterators (with a custom comparison function). When object iter1 was longer than iter2, read out-of-bounds on iter2 could happen. Container sanitization would detect it.

This revision adds a new compiler-rt ASan sanitization API function sanitizer_annotate_double_ended_contiguous_container necessary to sanitize/annotate double ended contiguous containers. Note that that function annotates a single contiguous memory buffer (for example the std::deque's internal chunk). Such containers have the beginning of allocated memory block, beginning of the container in-use data, end of the container's in-use data and the end of the allocated memory block.
This also adds a new API function to verify if a double ended contiguous container is correctly annotated (__sanitizer_verify_double_ended_contiguous_container).

Since we do not modify the ASan's shadow memory encoding values, the capability of sanitizing/annotating a prefix of the internal contiguous memory buffer is limited – up to SHADOW_GRANULARITY-1 bytes may not be poisoned before the container's in-use data. This can cause false negatives (situations when ASan will not detect memory corruption in those areas).

On the other hand, API function interfaces are designed to work even if this caveat would not exist. Therefore implementations using those functions will poison every byte correctly, if only ASan
(and compiler-rt) is extended to support it. In other words, if ASan was modified to support annotating/poisoning of objects lying on addresses unaligned to SHADOW_GRANULARITY (so e.g. prefixes of those blocks), which would require changing its shadow memory encoding, this would not require any changes in the libcxx std::string/deque code which is added in further commits of this patch series.

If you have any questions, please email:
advenam.tacet@trailofbits.com
disconnect3d@trailofbits.com

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
vitalybuka added inline comments.Aug 24 2022, 1:09 PM
compiler-rt/include/sanitizer/common_interface_defs.h
186–190

maybe just a single function?

void __sanitizer_annotate_de_contiguous_container(

const void *beg,
const void *end,
const void *old_con_beg,
const void *old_con_end,
const void *new_con_beg,
const void *new_con_end);
188

inconsistency
here order "new, old"
and below is "old, new" (I would prefer if we stick with existing this one)

214

Would be nice to find something better than _de_ in the name, double_ended ?

compiler-rt/lib/asan/asan_poisoning.cpp
479

maybe
beg_p -> storage_beg_p
end_p -> storage_end_p

500

this check and blow is hard to read, can we improve this
maybe

(beg <= con_end_p && con_end_p <= end) &&
(beg <= new_con_beg && new_con_beg <= con_end_p) &&
(beg <= old_con_beg_p && old_con_beg_p <= con_end_p)

512–513

instead of Min/max, probably better to simplify and sink them into corresponding branches below.

551

IsAligned

579

same with if restructuring.

vitalybuka added inline comments.Aug 24 2022, 1:16 PM
compiler-rt/lib/asan/asan_poisoning.cpp
385–390

to my taste this is CHECK() case, but no need to change in this patch
can simplify this in later (on in before) patch

vitalybuka requested changes to this revision.Aug 26 2022, 4:56 PM
This revision now requires changes to proceed.Aug 26 2022, 4:56 PM
AdvenamTacet marked 7 inline comments as done.

Changes based on the review.
Next update will contain:

  • test in compiler-rt/test/asan/TestCases/contiguous_container.cpp
  • argument order change
AdvenamTacet planned changes to this revision.Aug 29 2022, 8:15 PM
AdvenamTacet added inline comments.
compiler-rt/include/sanitizer/common_interface_defs.h
186–190

That was my first idea, but after some thinking I decided against it.

  • I see almost zero situations when one may want to change the beginning and the end at the same time, so there is no point.
  • It may be very small, but it may have a bad impact on performance.
  • It's less important, but it would be harder to remember the correct order and some mistakes may not be noticed instantly.
// Intended:
__sanitizer_annotate_de_contiguous_container(0x8, 0x1000, 0x10, 0x11, 0x99, 0x100);
// Mistake:
__sanitizer_annotate_de_contiguous_container(0x8, 0x1000, 0x10, 0x99, 0x11, 0x100);

I feel that it just complicates usage of those functions and brings no benefits, it's easier to just see _front and _back instead of checking what two arguments are the same.
One less function in API seems nice, but I suggest two.

Do you know any examples when changing the beginning and the end at the same time may be helpful?
Do you see any benefits, which I don't see?

188

I understand why suggested order may be confusing, so I will change it, but it is consistent with poisoning. (If we move beginning forward, we have to poison, if we move end forward, we have to unpoison). But I understand why having new/old in the same order also makes sense.

Maybe the best order is

void __sanitizer_annotate_de_contiguous_container_front(
                                                        const void *storage_beg,
                                                        const void *storage_end,
                                                        const void *old_con_beg,
                                                        const void *con_end,
                                                        const void *new_con_beg);

and in __sanitizer_annotate_de_contiguous_container_back:

const void *storage_beg,
const void *storage_end, 
const void *con_beg,
const void *old_con_end,
const void *new_con_end);

Instead of con_beg at the end?

What do you think?

Btw. I see additional _p in names, I will remove it with next update.

214

I didn't want to create very long names, but it's probably a good idea. I changed it.

compiler-rt/lib/asan/asan_poisoning.cpp
385–390

That function is not added by me, therefore I suggest leaving it for another patch. For consistency, I suggest leaving similar code in new functions and possibly changing them in a future patch.

But, I'm not sure how it's possible to use CHECK here, without decreasing feedback for users.

479

I changed it. I also changed _con_ -> _container_.

AdvenamTacet marked an inline comment as done.

Order of arguments changed, new order is:

  • storage_beg
  • storage_end
  • old_container_beg
  • old_container_end
  • new_value (based on function beg/end)
ldionne added a subscriber: ldionne.Sep 7 2022, 1:46 PM

Thanks a lot for the patch series. I've been independently considering changes to our so-called "debug mode" recently where std::vector & friends would have special iterators that keep track of the size of the container they point into, and can trap when dereferenced at an out-of-bounds address. This could be achieved by storing the bounds of the vector's allocation at the beginning of the allocation itself, and then those bounds-aware iterators would basically have a pointer within the allocation, and a pointer to the header containing that metadata. This would have the benefit that iterators have access to the bounds information as long as they are not invalidated by reallocating the vector. But the vector object itself (not the contiguous memory + header it points to) could move around without impacting the iterators.

Okay, so this is mostly unrelated to this patch because it concerns only contiguous containers so far. However, since this patch expands the intersection between the library and AddressSanitizer, perhaps it is worth discussing the pros and cons of each approach. In particular, the typical slowdown for using ASAN is documented as roughly 2x. The goal I was aiming for with the design I drafted above would be to have a smaller performance impact than ASAN, with the goal of hopefully being able to even turn it on in production in some scenarios.

Having obviously done a lot of work and thinking in the intersection of the library and sanitizers, do you have thoughts about this?

Hey @ldionne, thank you for your interest in my patches. I think your proposition is fundamentally different from ASan.

  • ASan is not designed to work on production and as far as I know, using it may increase attack surface.
  • With ASan, almost every memory access is instrumented, while with "smart iterators" only accesses with those iterators are checked.

My idea to evaluate if there is a point in implementing "smart iterators" is estimating the percentage of container overflows with iterators and raw pointers.
It may be possible with the oss-fuzz project tracker (https://bugs.chromium.org/p/oss-fuzz/issues/list?q=container%20overflow&can=1).
Also, [almost] every memory bug detected by "smart iterator" should be detected by ASan as well (if the container is annotated).
But "smart iterators" may work faster and in more situations.

However, I did not think about that kind of patch and therefore I do not have an opinion.

the typical slowdown for using ASAN is documented as roughly 2x

As you mentioned performance, I want to point out that my changes shouldn't noticeably change it. My code creates additional overhead only for functions modifying deques/strings. Every memory access to those containers is instrumented in the same way with and without my changes, rest of the program is not modified.

AdvenamTacet marked an inline comment as done.

Update to API: only one new function added instead of two.

Now there is:
void __sanitizer_annotate_double_ended_contiguous_container(

const void *storage_beg_p, const void *storage_end_p,
const void *old_container_beg_p, const void *old_container_end_p,
const void *new_container_beg_p, const void *new_container_end_p)

A new test (for double ended containers) is added to
compiler-rt/test/asan/TestCases/contiguous_container.cpp

AdvenamTacet added inline comments.Sep 19 2022, 6:50 PM
compiler-rt/include/sanitizer/common_interface_defs.h
186–190

I implemented a single function, order of arguments is same as suggested by you.
At the moment, it does use functions _front and _back as auxiliary functions, but API is extended only by one function. Also, Error classes / printing functions are updated to support new design.

I hope it's ok and potential improvements may be done in future patches. The API is extended only by one function.

vitalybuka added inline comments.Sep 21 2022, 3:36 PM
compiler-rt/include/sanitizer/common_interface_defs.h
229

looks too long, please clang format

230

new line

compiler-rt/lib/asan/asan_errors.h
346

-_front ?

libcxx/include/__config
883 ↗(On Diff #461436)

this should be in a separate patch, probably D132092

however we need to support the case when we build libcxx with older clang/compiler-rt when this function is not available
maybe you land D132090 and after some time, if it's not reverted
you land libcxx patch with _LIBCPP_CLANG_VER check?

how about gcc? it uses the same runtime

vitalybuka added inline comments.Sep 21 2022, 3:41 PM
libcxx/include/__config
883 ↗(On Diff #461436)

Note: I assume that compiler-rt/clang version mismatch is possible, but is not required to support. I assume that libcxx clang mismatch is required to be supported.

AdvenamTacet marked an inline comment as done.Sep 21 2022, 4:39 PM
AdvenamTacet added inline comments.
libcxx/include/__config
883 ↗(On Diff #461436)

this should be in a separate patch, probably D132092

Right, I'm going to move it to D132092.

however we need to support the case when we build libcxx with older clang/compiler-rt

We do use checks for clang version there (_LIBCPP_CLANG_VER >= 16000) and I am going to guard that as well. It does work with old versions of llvm.

how about gcc? it uses the same runtime

I do not know about using libc++ with gcc, I think we can wait with supporting annotations for deque there. If I understand correctly, there should be no problems generated by that change in that scenario, just no annotations for deque.

If you ask about libstdc++, annotations for that implementation have to be created. We do have working PoC, but we are focused on upstreaming those changes for llvm at the moment. I'm not sure if we are going to work further on libstdc++, but we may consider making them public "as is".

maybe you land D132090 and after some time, if it's not reverted

Landing D132090 and D132522 (I'm about to make an update there) before libcxx patches is my plan, but I want to land D132092 and D132769 soon after to avoid further changes in std::deque or std::basic_string implementations.

AdvenamTacet marked an inline comment as done.Sep 21 2022, 4:40 PM
AdvenamTacet marked 2 inline comments as done.

Clang format and moving libcxx part.

AdvenamTacet marked 2 inline comments as done.

Fixing naming and formatting.

Hey, I believe I fixed all issues from the code review. Is there anything else to change? What else should I do before upstreaming? I would be very happy to close it, to fully focus on revisions depending on it.

looks good in general, I will make another pass for details by tomorrow

compiler-rt/test/asan/TestCases/contiguous_container.cpp
82

we don't need version check as this code goes together with asan_poisoning.cpp and should not depend on compiler

88

I don't see TestDoubleEndedContainer is called

AdvenamTacet marked 2 inline comments as done.

Call to TestDoubleEndedContainer is now made.

Unnecessary check removed.

Annotations are fixed to pass the test and __sanitizer_verify_double_ended_contiguous_container
is modified to work with empty containers when beg != con_beg.

Locally, the test is failing on TestThrow, part not modified by that patch.

a.out: contiguous_container.cpp:112: void TestThrow(): Assertion `!__asan_address_is_poisoned(x + 14)' failed.
Aborted

Code there:

void TestThrow() {
  char x[32];
  __sanitizer_annotate_contiguous_container(x, x + 32, x + 32, x + 14);
  assert(!__asan_address_is_poisoned(x + 13));
  assert(__asan_address_is_poisoned(x + 14));
  ThrowAndCatch();
  assert(!__asan_address_is_poisoned(x + 13));
  assert(!__asan_address_is_poisoned(x + 14)); // Line 112

I'm going to look at it closer, but TestDoubleEndedContainer passed, therefore I believe
fixed implementation is working in every case (and not only for deque).

AdvenamTacet edited the summary of this revision. (Show Details)Sep 22 2022, 12:17 PM
vitalybuka requested changes to this revision.Sep 22 2022, 12:53 PM

Please confirm that you have no committer access.
If so after the next your update, I will cleanup nits, if any, accept and land the patch.

Please make sure that check-sanitizer and check-asan pass.

compiler-rt/lib/asan/asan_poisoning.cpp
478

both non interface "static void"

480

we don't use it old_container_end_p, new_container_end_p here,
only for reporting

lets's make non-interface functions all uptr params, remove unneded ones and move the following block into interface function

// Unchecked argument requirements:
// During unpoisoning memory of empty container (before first element is
// added):
// - old_container_end_p == old_container_beg_p
// During poisoning after last element was removed:
// - new_container_beg_p == container_end_p
if (!flags()->detect_container_overflow)
  return;
VPrintf(2, "de_contiguous_container (front): %p %p %p %p %p\n", storage_beg_p,
        storage_end_p, new_container_beg_p, old_container_beg_p,
        old_container_end_p);
uptr storage_beg = reinterpret_cast<uptr>(storage_beg_p);
uptr storage_end = reinterpret_cast<uptr>(storage_end_p);
uptr old_container_beg =
    reinterpret_cast<uptr>(old_container_beg_p);  // old container beginning
uptr old_container_end =
    reinterpret_cast<uptr>(old_container_end_p);  // container ending
uptr new_container_beg =
    reinterpret_cast<uptr>(new_container_beg_p);  // new container beginning
uptr new_container_end =
    reinterpret_cast<uptr>(new_container_end_p);  // new container ending

uptr granularity = ASAN_SHADOW_GRANULARITY;
if (!((storage_beg <= new_container_beg &&
       new_container_beg <= storage_end) &&
      (storage_beg <= old_container_beg &&
       old_container_beg <= storage_end) &&
      (old_container_beg <= old_container_end &&
       old_container_end <= storage_end) &&
      IsAligned(storage_beg, granularity) &&
      old_container_end_p == new_container_end_p)) {
  GET_STACK_TRACE_FATAL_HERE;
  ReportBadParamsToAnnotateDoubleEndedContiguousContainer(
      storage_beg, storage_end, old_container_beg, old_container_end,
      new_container_beg, new_container_end, &stack);
}
CHECK_LE(storage_end - storage_beg,
         FIRST_32_SECOND_64(1UL << 30, 1ULL << 40));  // Sanity check.
542

I assume after removing unneeded vars, we will have only old_container_end -> container_end

560

if either body or condition are multi-line, please wrap the body with {}

631

AddrIsAlignedByGranularity ,everywhere

641

new line after the function

645

we don't support !IsAligned(beg)
but please comment that we don't support !IsAligned(end) which shares the tail with something else

I assume we are not looking for intra object deque?

This revision now requires changes to proceed.Sep 22 2022, 12:53 PM

I guess not strictly insist, but maybe cleanup descriptions a little-bit, here and other reviews.
Usually LLVM commits descriptions are high level overview of what is done, and why it's done.

E.g.

Not needed in the git commit?

Trail of Bits developed this as part of a research project where
we tried to find bugs using the oss-fuzz
(using llvm-14 with our modifications) harnesses.
Now, we want to upstream the llvm-15 port of this work.

Naming does not match, in description please keep only interface function:

This commit adds new compiler-rt ASan sanitization API functions:
sanitizer_annotate_de_contiguous_container_front and
sanitizer_annotate_de_contiguous_container_back, necessary to
sanitize/annotate double ended contiguous containers.

Not needed in the git commit, as known and unchanged?

SHADOW_GRANULARITY is usually 8 – that value describes how
many bytes has a block described by a single value in the
AddressSanitizer's shadow memory and is set during llvm/clang
compilation.

Not needed assuming we have a link to review with the "review stack"?

Structure of our patches:
[1a/3][ASan][compiler-rt] API for double ended containers
[1b/3][ASan][compiler-rt] API for annotating objects memory
[2a/3][ASan][libcxx] std::deque annotations
[2b/3][ASan][libcxx] std::basic_string annotations
[3/3][ASan] integration test for std::deque and std::basic_string annotations

Quoted Text

AdvenamTacet marked 7 inline comments as done.

Fixes:

  • code style (suggested changes)
  • removes additional (non-interface) functions
  • AddrIsAlignedByGranularity used
  • Comment descriptions updated

I do not have commiter acces, I didn't contribute before.
If there is anything else I should do, please, let me know.

AdvenamTacet edited the summary of this revision. (Show Details)Oct 17 2022, 4:04 PM
vitalybuka edited the summary of this revision. (Show Details)

rebase, renames, comments

vitalybuka accepted this revision.Nov 21 2022, 1:59 PM

Something is broken in this implementation see D138459
Also it does not support unaligned container.

I have a draft of more generic implementation sanitizer_annotate_double_ended_contiguous_container and sanitizer_annotate_contiguous_container, so I land the patch as-is and switch to the new implementaion later.

vitalybuka removed a reviewer: Restricted Project.Nov 21 2022, 1:59 PM
This revision is now accepted and ready to land.Nov 21 2022, 1:59 PM
This revision was landed with ongoing or failed builds.Nov 21 2022, 4:46 PM
This revision was automatically updated to reflect the committed changes.

Something is broken in this implementation see D138459

Actually the D138459 is broken and this patch is correct.
But I will simplify, add unaligned case support and improve the test in D138773